Nick Joseph on whether Anthropic's AI safety policy is up to the task
80,000 Hours, August 22, 2024
Abstract
Anthropic, OpenAI, and DeepMind have all released policies aimed at mitigating the risks of powerful AI systems. Anthropic’s Responsible Scaling Policy (RSP) defines escalating levels of AI risk and specifies evaluations and precautions for each level. The RSP is designed to align commercial incentives with safety goals by preventing the deployment of dangerous models until adequate safeguards are in place, and it provides for iterative updates and adjustments as the risks associated with AI evolve. However, critics argue that the RSP relies heavily on good-faith interpretation, leaving it vulnerable to companies downplaying risks or prioritizing profit over safety. The RSP also makes commitments that current technology cannot yet fulfil, requiring significant progress in areas like security before they can be met. Despite these concerns, the RSP is considered a step forward in AI risk management, offering a framework for companies to develop and deploy AI responsibly. – AI-generated abstract