AGI safety from first principles: goals and agency
AI Alignment Forum, September 29, 2020
Abstract
This article investigates the concept of agency in artificial intelligence, exploring the possibility that advanced AI systems may exhibit goal-directed behavior similar to that of humans. The author distinguishes between the objectives a system is designed to pursue and the goals the agent itself wants to achieve, arguing that existing frameworks for understanding agency (such as expected utility maximization or the intentional stance) fail to capture the nuances of goal-directed behavior. Instead, the author proposes a framework based on six cognitive abilities: self-awareness, planning, consequentialism, scale, coherence, and flexibility. The author then discusses the likelihood of developing highly agentic AI, arguing that the training regime used to produce AI systems will determine the extent to which they acquire these traits. Finally, the author examines how goals can generalize to larger scales and explores the implications of agency for collective AI systems. – AI-generated abstract.
