Why AI alignment could be hard with modern deep learning

Ajeya Cotra

Cold Takes, September 21, 2021

Abstract

The deep learning alignment problem is the challenge of ensuring that advanced deep learning models don’t pursue dangerous goals. This article elaborates on the “hiring” analogy to illustrate how alignment could be difficult if deep learning models are more capable than humans. It then explains in more technical detail what the deep learning alignment problem is. Finally, it discusses how difficult the alignment problem may be and how much risk there is from failing to solve it.
