A tale of 2.5 orthogonality theses
Effective Altruism Forum, May 1, 2022
Abstract
You can summarise this whole post as ‘we shouldn’t confuse theoretical possibility with likelihood, let alone with near-certainty’. I’m concerned that EA AI-risk advocates tend to equivocate between two or even three different forms of the orthogonality thesis in a motte-and-bailey argument, and that this is encouraged by misleading language in the two seminal papers. The motte (the trivially defensible position) is the claim that it is theoretically possible to pair almost any set of motivations with high intelligence, and that AI will therefore not necessarily be benign or human-friendly. The inner bailey (a nontrivial but plausible position that gets conflated with the motte) is the claim that there’s a substantial chance that AI will be unfriendly and non-benign, and that caution is wise until we can be very confident it won’t be. The outer bailey (a still less defensible position that gets conflated with both) is the claim that we should expect little or no relationship between intelligence and motivations, and therefore that an AI turning out aligned with human values is extremely unlikely. This switcheroo inflates the apparent probability of hostile AI, and so might be leading us to overstate the priority of AI safety work.
