Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics
San Francisco, CA, 2010
Abstract
The field of machine ethics seeks methods to ensure that future intelligent machines will act in ways beneficial to human beings. Machine ethics is relevant to a wide range of possible artificial agents, but becomes especially difficult and especially important when the agents in question have at least human-level intelligence. This paper describes a solution, originally proposed by Yudkowsky (2004), to the problem of what goals to give such agents: rather than attempt to explicitly program in any specific normative theory (a project which would face numerous philosophical and immediate ethical difficulties), we should implement a system to discover what goals we would, upon reflection, want such agents to have. We discuss the motivations for and details of this approach, comparing it to other suggested methods for creating ‘artificial moral agents’ (Wallach & Collin 2007), and describe underspecified and uncertain areas for further research.
