AI safety without goal-directed behavior

Rohin Shah

AI Alignment Forum, January 7, 2019

Abstract

The belief that superintelligent artificial intelligence (AI) must be goal-directed, that is, maximizing some utility function over time, has dominated the field because rational agents have historically been modeled as expected utility maximizers. However, it is plausible to consider agents that are not goal-directed and that act without an explicitly specified utility function. While economic efficiency arguments may favor goal-directed AI, concerns about goal-directed behavior warrant seeking alternatives. The paper proposes potential models, including goal-conditioned policies with common sense, corrigible AI, and comprehensive AI services, and stresses the need to shift away from the model of optimizing a single goal. It concludes by suggesting that moving our understanding away from goal-directed optimization could lead to AI systems more closely aligned with our intentions, rather than systems that strictly adhere to what we prescribe. – AI-generated abstract.
