A framework for thinking about how to make AI go well
LessWrong, April 15, 2020
Abstract
This newsletter issue surveys recent work in AI alignment, organized around frameworks for thinking about how to make AI go well alongside specific research developments. It opens by summarizing a talk by Paul Christiano that decomposes the problem of beneficial AI into a hierarchy of considerations, from competence and alignment to coping with AI’s impact. The issue then highlights work on iterated amplification, including a paper on unsupervised question decomposition for question answering. Research on agent foundations is covered through an orthodox case against utility functions, which advocates instead for subjective utility functions defined over events. A new method, Neuron Shapley, is presented for measuring how much individual neurons contribute to a neural network’s output; it enables a form of model surgery in which neurons responsible for specific behaviors can be selectively removed. The issue also covers AI forecasting, including DeepMind’s Agent57, which surpasses the human baseline on all 57 games in the Atari suite. Finally, it discusses advances in reinforcement learning and deep learning, along with news related to AI safety. – AI-generated abstract.
