Practically-a-book review: Yudkowsky contra Ngo on agents

Scott Alexander

Astral Codex Ten, January 18, 2022

Abstract

Eliezer Yudkowsky and Richard Ngo, two well-known figures in the field of AI safety, engage in a lively debate about the nature of artificial intelligence. Yudkowsky argues that a superintelligent AI is likely to emerge in the near future and that such an AI could pose an existential threat to humanity. Ngo counters that focusing on “tool AIs”, capable of performing specific tasks without general intelligence, might be a safer approach. He suggests that AIs can be designed to reason about hypothetical situations without developing agency, thereby mitigating the risks associated with goal-directed behavior. However, Yudkowsky remains skeptical of such approaches, arguing that the distinction between “tool” and “agent” is increasingly blurred as AIs become more sophisticated. The conversation delves into the complex nature of agency and intelligence, drawing parallels between AI and biological systems such as cats, and exploring the intricacies of human morality. Yudkowsky expresses deep concern about the potential for accidentally creating a malevolent AI by pursuing the seemingly harmless goal of developing “really effective minds that do stuff for us.” The debate underscores the challenges inherent in ensuring AI alignment and highlights the need for further research in the field. – AI-generated abstract.
