Impossibility and Uncertainty Theorems in AI Value Alignment

About this paper

Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function), Peter Eckersley. Submitted to the ArXiV on December 31, 2018.


Ethical considerations in artificial intelligence applications have arguably been present since the birth of the field, if not earlier. Karel Čapek wrote R.U.R., the play that introduced both the word “robot” to our lexicon, and the question of whether such creations would want to work. Isaac Asimov and Mary Shelley both asked how a society with “natural” and “artificial” people would function.

Getting Trollied

More recently, as we’ve been able to create things that we can both identify as solving problems and market under the term Artificial Intelligence, the question of AI ethics has reappeared with different specialisations. Should we apply AI techniques to particular fields of endeavour at all, such as autonomous weaponry? Who should be responsible for a decision made, for an example, by an autonomous vehicle? How should such a vehicle make a decision when faced by the classic trolley problem?

Eckersley opens this paper with an aside, important to those who consider AI as an engineering discipline (which is not a clear statement in itself): autonomous vehicle research is a long, long way away from even contemplating the trolley problem. Currently, reducing the uncertainty with which obstacles are detected and identified from noisy input signals is a much more pressing issue.

However, human drivers are better at object identification, so may be sufficiently advanced to address the trolley problem should it arise in piloting a vehicle. And this is really an important point in thinking philosophically about how AIs should act: we already have intelligent actors, and already think about how they act.


Indeed, we already have artificial intelligences, including society-level AIs. A government bureaucracy; an enterprise; a policy document; all of these are abstractions of human intelligence behind machines, rules, and assumptions of how to work in given contexts. Any time that someone relies on external input to make a decision, whether that input is a checklist, a company handbook, a ready reckoner tool or a complex software-built decision support tool, you could replace that tool with an AI system with no change in the ethics of the situation.

Anywhere you could consider an AI making choices that are suboptimal to the welfare of the affected people, you could imagine those choices being suggested by the checklist or ready reckoner. None of this is to minimise the impact or applicability of AI ethics, quite the contrary: if we are only considering these problems now because we think AI has become useful, we are late to the party. Very late: what was the Soviet Gosplan other than a big decision support system trying to optimise a society along utilitarian lines?

The early sections of this paper summarise earlier proofs that it is impossible to define an objective utility function that will optimise for multiple, conflicting output dimensions. Not because it is hard to discover or compute the function, but because an acceptable trade-off may not exist. The proofs take this form:

  • imagine a guess at a solution, s.
  • there is a rule r1 which, applied to s, yields a better solution, s1.
  • there is a rule r2 which, applied to s1, yields a better solution, s2.
  • there is a rule rN which, applied to s(n-1), yields a better solution, s.

In other words, the different states that represent solutions cannot be “totally ordered” from “best” to “worst”; a cycle exists such that the question “which is the best” is paradoxical. While that has clear impact on AIs that use objective utility functions to choose solutions to their inputs, the wider implication is that no utilitarian system can objectively choose the “best” arrangement for welfare.

If you’re building the multivac from Isaac Asimov’s Franchise, then I’m going to have to disappoint you now: it’s impossible. But so is having a room full of people with slide rules and 383 different folders containing next year’s production targets. It’s not building an AI to optimise for the utility function that cannot be done; it’s finding a maximum on the utility function.

Finding a Workaround

A few techniques for addressing this problem are discussed and rejected: we could ignore the problem on the basis that our AIs are not currently important enough to have huge impact on social welfare. But AI is being used in welfare systems, in criminal justice systems, in financial management. And even where AI is not used, corporations and bureaucracies are already being applied, and these will have their decision support tools, which will eventually include AI.

Similarly, simplifying the problem is possible, but undesirable. You could define some kind of moral exchange rate to reduce the dimensionality of your problem: this much wealth in retirement is worth this much poverty today; this much social equality can be traded for this much value of statistical life. Or you could accept one or more of the undesirable properties of a solution as axiomatic; we can solve for inequality if we accept poverty, perhaps. Neither of these are ethically unambiguous.

Embracing Uncertainty

Ultimately, Eckersley concludes that the only feasible approach is to relax the requirement for objective, total ordering of the possible outcomes. One way is to introduce partial ordering of at least two of the comparison dimensions: to go from saying that a solution is { better than, worse than, the same as } another along some axis to saying it is { better than, worse than, the same as, incomparable to } the other solution. While this breaks the deadlock it’s difficult to work with, and domain-specific interpretations of partial ordering will be needed. And now we’re back at the trolley problem: there are multiple outcomes to choose from, we don’t know (because it’s impossible to say) which is better, and yet a decision must be taken. Which?

A second model replaces the definite ordering of solutions with probabilistic ordering. Rather than the impossible statement “outcome B is better than outcome A”, the pair of statements “outcome B has probability p of being better than outcome A” and “outcome A has probability (1-p) of being better than outcome B” are given. Now a system can find the outcome with the highest likelihood of being best, or pick from some with sufficient likelihood of being best, even though it is still impossible to find the best. There will always be some chance of violating a constraint, but those should at least be free of systematic bias.


If your AI strategy is based on the idea that your domain can be “solved numerically”, by finding some utility function or by training an AI to approximate the utility function, then you need to ask yourself whether this is true without introducing some biases or unethical choices into the solutions. Consider whether your system should find what is “best” and do it, or find actions that are “good enough” and either choose between them, or present them for human supervision.