
Planning for Extreme AI Risks

Josh Clymer

AI Alignment Forum, January 29, 2025

Abstract

The work does not contain an abstract; the following AI-generated abstract is provided in its place.

A responsible AI developer, referred to as "Magma," should follow specific prioritization heuristics to minimize extreme AI risks, particularly before achieving meaningful AI software R&D acceleration. These heuristics include scaling AI capabilities aggressively, focusing safety resources on preparation rather than on addressing current risks, and concentrating preparation efforts on raising risk awareness, preparing to elicit safety research from AI systems, and implementing extreme security measures. The analysis explores three potential outcomes: the obsolescence of human researchers through AI automation, a coordinated industry-wide pause in development, or voluntary self-destruction to reduce competitive pressure. After significant AI R&D acceleration is achieved, the choice among these outcomes depends on factors such as the potential for coordination and the developer's capabilities relative to competitors. The proposed framework assumes short timelines to automated AI development, the plausibility of rapid software-only improvement to superhuman AI, and uncertainty about the difficulty of safety. While acknowledging the challenges of planning under uncertainty, early preparation across multiple dimensions, including security, safety distribution, governance, and communication, is deemed necessary given the potentially catastrophic risks of unaligned superhuman AI systems.
