
Plans A, B, C, and D for misalignment risk

Ryan Greenblatt

AI Alignment Forum, October 7, 2025

Abstract

This article outlines a framework of five strategic plans (A-E) for mitigating AI misalignment risk, characterized by decreasing levels of required political will and correspondingly shorter lead times for safety work. Plan A, demanding strong international cooperation, envisions a 10-year period of extensive safety investment and a slow, controlled AI takeoff. Plan B involves 1-3 years of lead time secured by the US government for focused safety efforts. Plan C relies on a leading AI company spending its 2-9 month lead on misalignment work, aiming for a rapid, albeit potentially "janky," AI handoff. Plan D describes a scenario with minimal institutional buy-in, where a small internal team allocates limited compute to risk mitigation, focusing on extracting research and preparing for a plausibly safe handoff. Plan E represents a near-absence of dedicated effort, necessitating a focus on increasing political will. The plans carry estimated AI takeover risks ranging from 7% (Plan A) to 75% (Plan E). Weighing these risk levels against the likelihood of each scenario, the author argues that efforts should primarily target advancing Plans C and D, emphasizing the critical role of AI company personnel and leadership in addressing existential safety. – AI-generated abstract.
