Punching Utilitarians in the Face

A fun game for avowed non-utilitarians is to invent increasingly exotic thought experiments to demonstrate the sheer absurdity of utilitarianism. Consider this bit from Tyler’s recent interview with SBF:

COWEN: Should a Benthamite be risk-neutral with regard to social welfare?

BANKMAN-FRIED: Yes, that I feel very strongly about.

COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?

BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.

COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?

Pretty damning! It sure sounds pretty naive to just take any bet with positive expected value. Or from a more academic context, here is FTX Foundation CEO Nick Beckstead alongside Teruji Thomas:

On your deathbed, God brings good news… he’ll give you a ticket that can be handed to the reaper, good for an additional year of happy life on Earth.

As you celebrate, the devil appears and asks “Won’t you accept a small risk to get something vastly better? Trade that ticket for this one: it’s good for 10 years of happy life, but with probability 0.999.”

You accept… but then the devil asks again… “Trade that ticket for this one: it is good for 100 years of happy life–10 times as long–with probability 0.9992–just 0.1% lower.”

An hour later, you’ve made 50,000 trades… You find yourself with a ticket for 1050,000 years of happy life that only works with probability 0.99950,000, less than one chance in 1021

Predictably, you die that very night.

And it’s not just risk! There are damning scenarios downright disproving utilitarianism around every corner. Joe Carlsmith:

Suppose that oops: actually, red’s payout is just a single, barely-conscious, slightly-happy lizard, floating for eternity in space. For a sufficiently utilitarian-ish infinite fanatic, it makes no difference. Burn the Utopia. Torture the kittens.

…in the land of the infinite, the bullet-biting utilitarian train runs out of track…

It’s looking quite bad for utilitarianism at this point. But of course, one man’s modus ponens is another man’s modus tollens, and so I submit to you that actually, it is the thought experiments which are damned by all this.

I take the case for “common sense ethics” seriously, meaning that a correct ethical system should, for the most part, advocate for things in a way that lines up with what people actually feel and believe is right.

But if your entire argument against utilitarianism is based on ginormous numbers, tiny probabilities, literal eternities and other such nonsense, you are no longer on the side of moral intuitionism. Rather, your arguments are wildly unintuitive, your “thought experiments” literally unimaginable, and each “intuition pump” overtly designed to take advantage of known cognitive failures.

The real problem isn’t even that these scenarios are too exotic, it’s that coming up with them is trivial, and thus proves nothing. Consider, with apologies to Derek Parfit:

Suppose that I am driving at midnight through some desert. My car breaks down. You are a stranger, and the only other driver near. I manage to stop you, and I ask for help.

As you are against utilitarianism, you have committed to the following doctrine: when a stranger asks for help at midnight in the desert, you will give them the help they need free of charge. Unless they are a utilitarian, in which case you will punch them in the face, light them on fire, and commit to spending the rest of your life sabotaging shipments of anti-malarial betnets.

Here is a case without any outlandish numbers in which being a utilitarian does not result in the best outcome. And yet clearly, it proves nothing at all about utilitarianism!

Look, I know this all sounds silly, but it is no sillier than Newcomb’s Paradox. As a brief reminder:

The player is given a choice between taking only box B, or taking both boxes A and B.

  • Box A is transparent and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor.

If the predictor has predicted that the player will take only box B, then box B contains $1,000,000. If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.

Again, this initial looks pretty damning for standard decision theory …except that you can generate a similar “experiment” to argue against anything you don’t like. In fact, you can generate far worse ones! Consider:

The player is given a choice between only taking box B, or taking both boxes A and B.

  • Box A is transparent and always contains $1,000.
  • Box B is opaque, and its content has already been set by the predictor.

If the predictor has predicted the player acts in accordance with Theory I Like, Box B contains $1,000,000. If the predictor has predicted the player acts in accordance with Theory I Don’t Like, then box B contains a quadrillion negative QALYs.

The problem isn’t that decision theory is wrong, it’s that the setup has been designed to punish people who behave a certain way. And so it’s meaningless because we can trivially generate analogous setups that punish any arbitrary group of people, thus “disproving” their belief system, or normative theory, or whatever it is you’re trying to argue against… while at the same time providing no actual evidence one way or another.

Does this mean thought experiments are all useless and we just have to do moral philosophy entirely a priori? Not at all. But there are two particular cases where these fail, and a suspiciously large number of the popular experiments fall into at least one of them:

  1. The “moral intuition” is clearly not generated by reliable intuitions because it abuses:
    a. Incomprehensibly large or small numbers
    b. Known cognitive biases
    c. Wildly unintuitive premises
  2. The “moral intuition” proves too much because it can be trivial deployed against any arbitrary theory

In contrast, the best thought experiments are less like clubs beating you over the head, and more like poetry that highlights a playful tension between conflicting reasons. In this vein, Philippa Foot’s Trolley Problems are so lovely because they elegantly guide you around the contours of your own values. They allow you to parse out various objections, to better understand which particular aspects of an action make it objectionable, and play your own judgements against each other in a way that generates humility, thoughtfulness and comprehension.

So I love thought experiments. And I deeply appreciate the way make-believe scenarios can teach us about the real world. I just don’t care for getting punched in the face.


Nicolaus Bernoulli, Joe Carlsmith, Nick Beckstead, Teruji Thomas, Derek Parfit, Tyler Cowen, and Robert Nozick are all perfectly fine people and good moral philosophers.

I am also not a moral philosopher myself, and it’s likely that I’m missing something important.

Having said that, I will do the public service of risking embarassment to make my bullet biting explicit:

  • I take the St. Petersburg gamble, and accept that a 0.5n probability of 2nx value is positive-EV.
  • I also take the devil’s deal.
  • I simply don’t believe that infinities exist, and even though 0 isn’t a probability, I reject the probabilistic argument that any possibility of infinity allows them to dominate all EV calculations. I just don’t think the argument is coherent, at least not in the formulations I’ve seen.
  • Similarly, once you introduce a “reliable predictor”, everything goes out the window and the money is the least of your concern. But granting the premise, fine, I One Box.

EDIT: I didn’t discuss it here, but the original desert dillema just involves you being a selfish person who can’t lie, and then man refusing to help you because he knows you won’t actually reward him. This doesn’t fall into either of the “bad thought experiment” heuristics I outlined above, and is in fact, a seemingly reasonable scenario.

But I don’t think the lesson is “selfishness is always self-defeating”, I think the lesson is “if you’re unable to lie, having a policy of acting selfish is probably the wrong way to implement your selfish aims.” And so you should rationally determine to act irrationally (with respect to your short-term aims), but this is really no different than any other short-term/long-term tradeoff.

Parfit’s point, by the way, was a more abstract thing about the fact that some “policies” can be self-defeating, and that this results in some theoretically interesting claims. Which is good and clevel, but for our purposes my point is that the “Argument from getting punched in the face by an AI that hates your policy in particular” does a good job of demonstrating this doesn’t prove anything about any given policy in particular.