Current work in AI alignment
Effective Altruism Global, April 3, 2020
Abstract
This talk explores the problem of AI alignment, focusing on the concept of intent alignment: ensuring that AI systems are actually trying to do what humans want them to do. The speaker distinguishes between intent alignment and competence, arguing that while AI competence will likely improve over time, alignment is a separate challenge that will persist even as AI becomes more sophisticated. The speaker then discusses approaches to reducing the 'alignment tax,' the cost incurred by insisting on AI systems that are aligned with human values. These approaches include designing algorithms that are inherently easier to align, and finding ways to transform existing algorithms into aligned ones without sacrificing performance. The talk also distinguishes outer alignment (designing objectives that incentivize aligned behavior) from inner alignment (ensuring that the AI actually pursues the intended objectives). The speaker concludes that we need to move beyond the current paradigm of simply training AI on human data and find ways to build AI systems that can directly understand and pursue human values, even where we lack a clear and complete understanding of those values. – AI-generated abstract
