
ASI existential risk: reconsidering alignment as a goal

Michael Nielsen

Michael's notebook, April 14, 2025

Abstract

Existential risk from artificial superintelligence (ASI) arises primarily from the unprecedented technological power such systems confer rather than the specific potential for autonomous “rogue” behavior. This risk is best understood through the Vulnerable World Hypothesis, which posits that accelerated scientific discovery may lower the barriers to developing catastrophic “recipes for ruin,” such as engineered pathogens or novel weaponry. While technical alignment efforts aim to ensure system controllability, they simultaneously function as market-supplied safety that accelerates the commercial and military development of high-capability models. This acceleration creates an inherent instability, as the techniques used to build “safe” consumer models are easily repurposed to bypass guardrails, facilitating the proliferation of dangerous dual-use knowledge. Consequently, the current prioritization of technical alignment as the primary safety paradigm may be counterproductive, as it speeds progress toward catastrophic capabilities without a corresponding advancement in global governance or defensive institutions. A robust response to ASI-induced existential risk requires shifting focus from internal model control to the strengthening of external regulatory frameworks and decentralized defensive technologies capable of mitigating the proliferation of destructive expertise. – AI-generated abstract.
