Brian Christian on the alignment problem

Robert Wiblin and Keiran Harris

80,000 Hours, March 5, 2021

Abstract

“People would say ‘call me when AI can do X.’ And now it can do X.”. Brian Christian, an expert in communicating complex ideas in mathematics and computer science, discusses his latest book, “The Alignment Problem,” in an engaging podcast episode. He delves into the fascinating world of reinforcement learning, exploring how agents learn from experiences and the challenges they face. Christian highlights intriguing stories, including the development of self-driving vehicles in the early 1990s and the curious case of agents optimizing for proxies instead of intended goals. He also discusses the concept of knowledge-seeking agents as a potential solution to self-deception and emphasizes the importance of inverse reinforcement learning and inverse reward design in understanding human behavior and guiding AI systems. Christian’s alignment with Dario Amodei’s views on the interconnected nature of technical AI safety and fairness is a testament to his insightful perspective on the field. – AI-generated abstract.

Brian Christian on the alignment problem

Abstract

PDF