My overview of the AI alignment landscape

Neel Nanda

AI Alignment Forum, December 16, 2021

Abstract

When values are reformulated in a new ontology, they may converge or diverge. For example, an egoist who values only their own pleasure and a hedonic utilitarian can end up valuing different aspects of the same underlying phenomenon. Similarly, two deontologists who value non-coercion may reformulate their values differently when considering internal coercion. Over time, ontological improvements can significantly alter our values and may even lead different conceptions of morality to converge or diverge. – AI-generated abstract.
