Why AI alignment could be hard with modern deep learning
Cold Takes, September 21, 2021
Abstract
The deep learning alignment problem is the problem of ensuring that advanced deep learning models don’t pursue dangerous goals. This article elaborates on the “hiring” analogy to illustrate how alignment could be difficult if deep learning models become more capable than humans. It then explains in more technical detail what the deep learning alignment problem is. Finally, it discusses how difficult the alignment problem may be and how much risk there is from failing to solve it.
