Sella Nevo on who's trying to steal frontier AI models, and what they could do with them
80,000 Hours, August 1, 2024
Abstract
Frontier AI models, whose training can cost hundreds of millions of dollars, are valuable targets for malicious actors. The model weights, which embody the culmination of training data, compute, and algorithmic improvements, are a particularly attractive target for theft. The article identifies five categories of actors based on their capabilities and resources, ranging from amateur hackers to nation-state operations. It describes a variety of attack vectors, including the exploitation of software vulnerabilities, human intelligence collection, side-channel attacks, and model extraction. The article recommends seven top-priority security measures for AI companies, including reducing and hardening authorized access, deploying confidential computing, and conducting effective red-teaming exercises. – AI-generated abstract