ML systems will have weird failure modes

Jacob Steinhardt

Bounded Regret, January 26, 2022

Abstract

Previously, I’ve argued that future ML systems might exhibit unfamiliar, emergent capabilities, and that thought experiments provide one approach towards predicting these capabilities and their consequences. In this post I’ll describe a particular thought experiment in detail. We’ll see that taking thought experiments seriously often surfaces future risks

ML systems will have weird failure modes

Abstract

PDF