Many arguments for AI x-risk are wrong
LessWrong, March 5, 2024
Abstract
The post does not contain an abstract; the following summary is AI-generated.
Many prominent arguments for AI existential risk are based on fundamentally confused ideas and invalid reasoning. The counting argument for deceptively aligned AI systems provides approximately zero evidence that pretraining and reinforcement learning from human feedback will eventually become intrinsically unsafe. Early foundational texts in AI alignment, such as Bostrom’s Superintelligence, made crucial errors about how reinforcement learning works, errors that misdirected subsequent research efforts. In modern reinforcement learning, the “reward” function serves as a tool for controlling parameter updates; it does not, by default, create systems that explicitly seek to maximize reward. While certain existential risks from AI remain concerning (the purposeful creation of agentic systems, the automation of economic decision-making, misuse by state actors, and the centralization of power), there is insufficient evidence that future large language models will autonomously constitute an existential risk without being specifically prompted toward large-scale tasks. The implications for AI governance and regulation are significant: experimental feedback loops can be relied upon more heavily, worst-case interpretability becomes less crucial, and AI systems can be more readily used as tools that execute requested tasks.
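The claim that reward controls parameter updates rather than serving as an explicit goal of the trained system can be made concrete with a minimal policy-gradient sketch. The example below is not from the post; it is a toy REINFORCE loop on a hypothetical three-armed bandit (arm means, learning rate, and variable names are all illustrative assumptions), showing that the reward enters only as a scalar multiplier on the gradient of the log-probability of the sampled action.

```python
# Illustrative sketch (not from the post): in policy-gradient RL, the scalar
# reward only weights parameter updates. The learned artifact is a parameter
# vector shaped by those updates, not a program that represents or pursues
# "reward" as an explicit objective.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])   # hypothetical expected reward per arm
theta = np.zeros(3)                       # policy parameters (softmax logits)
lr = 0.1                                  # assumed learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)            # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)    # observe a noisy scalar reward

    # REINFORCE update: grad log pi(a | theta) = one_hot(a) - probs.
    # The reward r appears only here, as a multiplier on that gradient;
    # nothing in theta stores or "seeks" reward.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi

print("learned action probabilities:", softmax(theta).round(3))
```

Running this drives the policy toward the highest-mean arm, yet the only place reward figures in the process is as the scalar weighting each update step.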
