Ten people on the inside
AI Alignment Forum, January 27, 2025
Abstract
The article discusses several regimes for AI misalignment risk mitigation, ranging from an ideal “safety case” scenario to more probable, pessimistic situations involving “rushed unreasonable developers.” It focuses on a scenario in which roughly ten safety-concerned individuals inside an AI company, operating with limited political will, budget, and influence, attempt to mitigate risks despite the company’s general disregard for specific misalignment threats. For these insiders, interventions must be cheap, require minimal compute, and impose low compliance overhead so they are not reversed. Proposed actions include gathering evidence of risk, implementing direct safety measures, and conducting alignment research. The author posits that such a group, even with minimal resources, could substantially reduce risk. He therefore highlights the need to plan what safety research and assistance can be supplied from outside the company, and advocates that current research prioritize low-budget safety techniques applicable in these constrained environments. – AI-generated abstract.
