Worlds where we solve AI alignment on purpose don't look like the world we live in
Effective Altruism Forum, March 19, 2026
Abstract
Current efforts to ensure the safety of superintelligent AI lack the institutional rigor and technical seriousness required to mitigate existential risk, suggesting a probability of human extinction of at least 25% on the present trajectory. Unlike high-stakes engineering precedents such as the Apollo program, contemporary AI development is marked by a severe resource imbalance, with capabilities research receiving roughly 100 times the investment allocated to alignment. Frontier AI labs' models frequently perform poorly on safety evaluations, and the labs themselves lobby against substantive regulation and rely on non-binding commitments that are often retracted during periods of rapid development. Technical approaches to alignment are hindered by fallacious reasoning, such as treating the absence of evidence of model deception as proof of safety. Furthermore, the industry's reliance on nascent AI systems to solve the alignment problem itself indicates a failure of human-led oversight. Organizational incentives compound these risks by systematically marginalizing pessimistic viewpoints and favoring reckless optimism in leadership roles. Averting a catastrophic outcome requires a shift toward safety standards comparable to those of aerospace or cryptography, alongside deeper engagement with technical philosophy. Without such structural changes, any successful alignment of superintelligent systems would result from chance rather than deliberate civilizational effort. – AI-generated abstract.
