Clarifying Inner Alignment Terminology

Evan Hubinger

AI Alignment Forum, November 9, 2020

Abstract

The concepts of outer and inner alignment are essential for ensuring that an AI system’s behavior aligns with human values. This article clarifies the various definitions and implications of these concepts. It introduces a diagram illustrating the relationship between different alignment types and provides formal definitions for each term. Additionally, the article addresses frequently asked questions to further elucidate the relationships among the concepts. – AI-generated abstract.
