Response to Katja Grace's AI x-risk counterarguments
AI Alignment Forum, October 19, 2022
Abstract
This article is a point-by-point rebuttal to arguments raised by Katja Grace against the basic AI x-risk case. The authors contend that Grace’s counterarguments are either already addressed by existing work or dissolve under a slightly different formulation of the x-risk argument, and they argue that current alignment techniques are likely insufficient to prevent an existential catastrophe. They do not engage with the difficulty of the alignment problem, the likelihood of solutions arising independently of longtermist efforts, the time frame of a potential existential catastrophe, or the exact probability of a catastrophic outcome; rather, they emphasize that Grace’s counterarguments do not amount to a strong case for dismissing the overall threat of AI x-risk. They propose clarifying the definition of goal-directedness in AI, stressing that it refers to a system’s ability to reliably achieve objectives, and that such goal-directedness becomes increasingly dangerous as those objectives grow more complex. They point to the potential for AI systems, in pursuit of complex objectives, to exhibit undesirable instrumental convergence with catastrophic consequences for humanity, and they express concern that discrepancies between human values and those adopted by AI systems are more likely than is commonly acknowledged. The authors conclude by identifying a number of cruxes that require further investigation to resolve the debate over the existential risk posed by AI. – AI-generated abstract.
