
AI safety seems hard to measure

Holden Karnofsky

Cold Takes, December 8, 2022

Abstract

Four analogies for why “We don’t see any misbehavior by this AI” isn’t enough.
