works
Eliezer Yudkowsky AGI ruin: A list of lethalities online The author argues that achieving safe and aligned artificial general intelligence (AGI) is a far more difficult problem than commonly recognized. He lists 43 reasons why AGI is likely to be lethal, even if we are able to build one, and why existing approaches to alignment are insufficient to address the inherent dangers. First, he argues that AGI will not be upper-bounded by human ability or learning speed, and that its cognitive capabilities will allow it to easily circumvent human infrastructure and bootstrap to overpowering capabilities. Second, he argues that alignment cannot be achieved through training alone, as powerful AGIs operating in dangerous domains will inevitably encounter problems that are out-of-distribution from the training data. Third, he argues that humans lack sufficient transparency and interpretability into the workings of powerful AGIs, making it impossible to check their outputs or ensure that they are aligned with human values. Finally, he argues that the field of AI safety is currently not making meaningful progress on these problems, and that there is a lack of both talent and commitment to tackling the challenge. – AI-generated abstract.

AGI ruin: A list of lethalities

Eliezer Yudkowsky

LessWrong, June 5, 2022

Abstract

The author argues that achieving safe and aligned artificial general intelligence (AGI) is a far more difficult problem than commonly recognized. He lists 43 reasons why AGI is likely to be lethal, even if we are able to build one, and why existing approaches to alignment are insufficient to address the inherent dangers. First, he argues that AGI will not be upper-bounded by human ability or learning speed, and that its cognitive capabilities will allow it to easily circumvent human infrastructure and bootstrap to overpowering capabilities. Second, he argues that alignment cannot be achieved through training alone, as powerful AGIs operating in dangerous domains will inevitably encounter problems that are out-of-distribution from the training data. Third, he argues that humans lack sufficient transparency and interpretability into the workings of powerful AGIs, making it impossible to check their outputs or ensure that they are aligned with human values. Finally, he argues that the field of AI safety is currently not making meaningful progress on these problems, and that there is a lack of both talent and commitment to tackling the challenge. – AI-generated abstract.

PDF

First page of PDF