ML systems will have weird failure modes
Bounded Regret, January 26, 2022
Abstract
Previously, I’ve argued that future ML systems might exhibit unfamiliar, emergent capabilities, and that thought experiments provide one approach towards predicting these capabilities and their consequences. In this post I’ll describe a particular thought experiment in detail. We’ll see that taking thought experiments seriously often surfaces future risks
