Is AI Alignment Possible?

Magnus Vinding, December 14, 2018

Abstract

The AI alignment problem, often defined as making advanced AI fulfill human desires, suffers from two key issues. First, human values are not monolithic: substantial, often irreconcilable disagreements exist on fundamental ethical questions, which precludes any single AI goal function that aligns with all human preferences. Second, even when the focus is on a single individual’s preferences, accurately predicting what that person would want in future, potentially novel scenarios is highly problematic. Human value systems are necessarily simplified representations of a complex world and lack the information to offer definitive solutions for all future moral dilemmas. These limitations become more pronounced when values are extrapolated to unfamiliar future contexts. Therefore, rather than aiming for perfect value replication, AI alignment should focus on implementing broadly acceptable safety measures and mechanisms that reflect a compromise between diverse human values, while acknowledging the inherent imprecision of long-term value extrapolation. – AI-generated abstract.
