Punching Utilitarians in the Face

A fun game for avowed non-utilitarians is to invent increasingly exotic thought experiments to demonstrate the sheer absurdity of utilitarianism. Consider this bit from Tyler’s recent interview with SBF:

COWEN: Should a Benthamite be risk-neutral with regard to social welfare?

BANKMAN-FRIED: Yes, that I feel very strongly about.

COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?

BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.

COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?
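To make the arithmetic behind Cowen's question concrete, here is a minimal sketch. It assumes each round is independent, that a win exactly doubles what is at stake, and that a loss wipes it out. The function name and the choice to normalize one Earth to 1.0 are illustrative, not anything from the interview.

```python
def repeated_double_or_nothing(rounds, p_win=0.51):
    """Return (survival probability, expected value) after `rounds` plays.

    Assumes independent rounds: with probability p_win the stake doubles,
    otherwise everything is lost. The starting stake is normalized to 1.0.
    """
    survival = p_win ** rounds
    # The only non-zero outcome is winning every round, which pays 2 ** rounds.
    expected_value = (2 * p_win) ** rounds
    return survival, expected_value


for n in (1, 10, 100):
    survival, ev = repeated_double_or_nothing(n)
    print(f"{n:>3} rounds: P(anything left) = {survival:.3g}, expected value = {ev:.3g}")
```

Under these assumptions the expected value rises by 2% per round while the chance that anything is left falls by almost half per round, which is the tension Cowen is pointing at.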

Pretty damning! It sure sounds pretty naive to just take any bet with positive expected value. Or from a more academic context, here is FTX Foundation CEO Nick Beckstead alongside Teruji Thomas:

On your deathbed, God brings good news… he’ll give you a ticket that can be handed to the reaper, good for an additional year of happy life on Earth.

As you celebrate, the devil appears and asks “Won’t you accept a small risk to get something vastly better? Trade that ticket for this one: it’s good for 10 years of happy life, but with probability 0.999.”

You accept… but then the devil asks again… “Trade that ticket for this one: it is good for 100 years of happy life–10 times as long–with probability 0.999^2–just 0.1% lower.”

An hour later, you’ve made 50,000 trades… You find yourself with a ticket for 10^50,000 years of happy life that only works with probability 0.999^50,000, less than one chance in 10^21.

Predictably, you die that very night.
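The numbers in the devil's deal can be checked with a short sketch. It assumes the ticket starts at one year with certainty, and that every trade multiplies the years by 10 and the success probability by 0.999. The function and parameter names are hypothetical.

```python
import math


def devil_ticket(trades, p_per_trade=0.999, factor=10):
    """Return (log10 of the years promised, probability the ticket works).

    Assumes each trade multiplies the years by `factor` and the chance
    that the ticket works by `p_per_trade`.
    """
    log10_years = trades * math.log10(factor)
    # Sum logs rather than multiplying tens of thousands of small factors.
    probability = math.exp(trades * math.log(p_per_trade))
    return log10_years, probability


log10_years, probability = devil_ticket(50_000)
print(f"years promised: 10^{log10_years:.0f}")
print(f"chance the ticket works: {probability:.3g}")
```

On these assumptions the chance of the final ticket working does come out below one in 10^21, while the promised payoff is so large that the naive expected value keeps growing with every trade.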

And it’s not just risk! There are damning scenarios downright disproving utilitarianism around every corner. Joe Carlsmith:

Suppose that oops: actually, red’s payout is just a single, barely-conscious, slightly-happy lizard, floating for eternity in space. For a sufficiently utilitarian-ish infinite fanatic, it makes no difference. Burn the Utopia. Torture the kittens.

…in the land of the infinite, the bullet-biting utilitarian train runs out of track…

It’s looking quite bad for utilitarianism at this point. But of course, one man’s modus ponens is another man’s modus tollens, and so I submit to you that actually, it is the thought experiments which are damned by all this.

I take the case for “common sense ethics” seriously, meaning that a correct ethical system should, for the most part, advocate for things in a way that lines up with what people actually feel and believe is right.

But if your entire argument against utilitarianism is based on ginormous numbers, tiny probabilities, literal eternities and other such nonsense, you are no longer on the side of moral intuitionism. Rather, your arguments are wildly unintuitive, your “thought experiments” literally unimaginable, and each “intuition pump” overtly designed to take advantage of known cognitive failures.

The real problem isn’t even that these scenarios are too exotic, it’s that coming up with them is trivial, and thus proves nothing. Consider, with apologies to Derek Parfit:

Suppose that I am driving at midnight through some desert. My car breaks down. You are a stranger, and the only other driver near. I manage to stop you, and I ask for help.

As you are against utilitarianism, you have committed to the following doctrine: when a stranger asks for help at midnight in the desert, you will give them the help they need free of charge. Unless they are a utilitarian, in which case you will punch them in the face, light them on fire, and commit to spending the rest of your life sabotaging shipments of anti-malarial bednets.

Here is a case without any outlandish numbers in which being a utilitarian does not result in the best outcome. And yet clearly, it proves nothing at all about utilitarianism!

Look, I know this all sounds silly, but it is no sillier than Newcomb’s Paradox. As a brief reminder:

The player is given a choice between taking only box B, or taking both boxes A and B.

  • Box A is transparent and always contains a visible $1,000.
  • Box B is opaque, and its content has already been set by the predictor.

If the predictor has predicted that the player will take only box B, then box B contains $1,000,000. If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.

Again, this initially looks pretty damning for standard decision theory …except that you can generate a similar “experiment” to argue against anything you don’t like. In fact, you can generate far worse ones! Consider:

The player is given a choice between only taking box B, or taking both boxes A and B.

  • Box A is transparent and always contains $1,000.
  • Box B is opaque, and its content has already been set by the predictor.

If the predictor has predicted the player acts in accordance with Theory I Like, Box B contains $1,000,000. If the predictor has predicted the player acts in accordance with Theory I Don’t Like, then box B contains a quadrillion negative QALYs.

The problem isn’t that decision theory is wrong, it’s that the setup has been designed to punish people who behave a certain way. And so it’s meaningless because we can trivially generate analogous setups that punish any arbitrary group of people, thus “disproving” their belief system, or normative theory, or whatever it is you’re trying to argue against… while at the same time providing no actual evidence one way or another.

Does this mean thought experiments are all useless and we just have to do moral philosophy entirely a priori? Not at all. But there are two particular cases where these fail, and a suspiciously large number of the popular experiments fall into at least one of them:

  1. The “moral intuition” is clearly not generated by reliable intuitions because it abuses:
    a. Incomprehensibly large or small numbers
    b. Known cognitive biases
    c. Wildly unintuitive premises
  2. The “moral intuition” proves too much because it can be trivially deployed against any arbitrary theory

In contrast, the best thought experiments are less like clubs beating you over the head, and more like poetry that highlights a playful tension between conflicting reasons. In this vein, Philippa Foot’s Trolley Problems are so lovely because they elegantly guide you around the contours of your own values. They allow you to parse out various objections, to better understand which particular aspects of an action make it objectionable, and play your own judgements against each other in a way that generates humility, thoughtfulness and comprehension.

So I love thought experiments. And I deeply appreciate the way make-believe scenarios can teach us about the real world. I just don’t care for getting punched in the face.


Nicolaus Bernoulli, Joe Carlsmith, Nick Beckstead, Teruji Thomas, Derek Parfit, Tyler Cowen, and Robert Nozick are all perfectly fine people and good moral philosophers.

I am also not a moral philosopher myself, and it’s likely that I’m missing something important.

Having said that, I will do the public service of risking embarrassment to make my bullet biting explicit:

  • I take the St. Petersburg gamble, and accept that a 0.5^n probability of 2^n x value is positive-EV.
  • I also take the devil’s deal.
  • I simply don’t believe that infinities exist, and even though 0 isn’t a probability, I reject the probabilistic argument that any possibility of infinity allows it to dominate all EV calculations. I just don’t think the argument is coherent, at least not in the formulations I’ve seen.
  • Similarly, once you introduce a “reliable predictor”, everything goes out the window and the money is the least of your concern. But granting the premise, fine, I One Box.

EDIT: I didn’t discuss it here, but the original desert dilemma just involves you being a selfish person who can’t lie, and the man refusing to help you because he knows you won’t actually reward him. This doesn’t fall into either of the “bad thought experiment” heuristics I outlined above, and is, in fact, a seemingly reasonable scenario.

But I don’t think the lesson is “selfishness is always self-defeating”, I think the lesson is “if you’re unable to lie, having a policy of acting selfish is probably the wrong way to implement your selfish aims.” And so you should rationally determine to act irrationally (with respect to your short-term aims), but this is really no different than any other short-term/long-term tradeoff.

Parfit’s point, by the way, was a more abstract thing about the fact that some “policies” can be self-defeating, and that this results in some theoretically interesting claims. Which is good and clever, but for our purposes my point is that the “Argument from getting punched in the face by an AI that hates your policy in particular” does a good job of demonstrating that this doesn’t prove anything about any given policy in particular.

Statistical Theodicy

[Contains spoilers for Unsong.]

Since time immemorial, people have asked how evil can exist in a world created by an omnipotent and benevolent God[1].

And since time immemorial, God has remained silent.

In his absence, Scott Alexander provides an elegant solution. God created every net-positive universe. He created the perfect one, the almost perfect ones, the less perfect ones… all the way down to our universe which is full of misery and suffering, but still, on balance, worthy of existence. As Scott explains on God’s behalf:



This is a good start, but it only kicks the can down the road. Why are we at the edge of the garden? Even if we buy that all universes except the single flawless one will contain some evil, it sure seems like our universe contains an awful lot of it. Is that just by chance?

The easy answer is selection bias, and the anthropic principle in particular[2]. Were we in the flawless universe, we would not bother asking about evil, since no such thing would exist[3]. So given that we’re asking at all, it’s because we’re in a universe with evil, and thus the question provides its own answer.

But again, we’re not talking about dust specks or platypuses or other minor oddities, we’re talking about the unfathomably abhorrent evils of our universe. Given that there is a wide range of universes with sufficient evil to provoke questions, it still seems peculiar that we ended up in one so far along the spectrum.

Maybe theodicy is possible in all flawed universes, but more common in the worse ones? I don’t think so. Were we to exist in a universe where precisely one person suffered immensely, would that not be even more troubling than our own? It is at least possible to dismiss the evils in our universe as the product of chaos. It would be far stranger to live in a world that is nearly perfect but still contains evil. As Dostoyevsky once asked:

answer me: imagine that you yourself are building the edifice of human destiny with the object of making people happy in the finale, of giving them peace and rest at last, but for that you must inevitably and unavoidably torture just one tiny creature, that same child who was beating her chest with her little fist, and raise your edifice on the foundation of her unrequited tears—would you agree to be the architect on such conditions?

A better answer comes to us from information theory and entropy. Namely: there are simply more disordered states than there are ordered ones.

Consider the perfect universe as described by a bit-string of length n. The universes with one flaw are thus described by the same bit-string, but with a single error. Omitting duplicates, there are n one-error universes:

Next we get to two-error universes, and the possibilities explode rapidly. Formally, there are n! / (2! (n-2)!) two-error universes, and in general, n! / (k! (n-k)!) k-error universes. Or defined recursively, the number of k-error universes is equal to the number of (k-1)-error universes multiplied by (n-k+1)/k. For large n and small k, this grows at least exponentially, since each step multiplies the count by a factor far larger than 1.
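A quick way to get a feel for these counts is to compute them. The sketch below uses Python's built-in binomial coefficient. The bit-string length of 1000 is an arbitrary illustrative choice, since the post does not fix a value, and the function name is hypothetical.

```python
from math import comb


def k_error_universes(n, k):
    """Count length-n bit-strings that differ from the perfect one in exactly k bits."""
    return comb(n, k)


n = 1000  # illustrative length only; not a value taken from the post
counts = [k_error_universes(n, k) for k in range(6)]
for k, count in enumerate(counts):
    line = f"{k} errors: {count:,} universes"
    if k > 0:
        # Growth factor relative to the previous error level.
        line += f" ({count / counts[k - 1]:,.1f}x the level before)"
    print(line)
```

Even at this modest length, each added error multiplies the number of universes by a factor in the hundreds for the first few steps.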

Taking this to its logical conclusion, we get that the space of possible universes looks less like a neat circular garden we happen to be towards the edge of, and more like a very very skewed distribution in which nearly all universes that exist are really flawed:

That’s all to say: As you add flaws to the perfect universe, the number of possible universes expands really quickly, such that if you are being randomly placed in a universe, the bulk of the probability lands on the set of maximally flawed universes that are still net-positive. And this fact is sufficient to explain the problem of evil without having to resort to weird appeals to free will or the necessity of evil.
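This claim can be sanity-checked numerically. The sketch below assumes every bit-string is equally likely, that a universe is net-positive exactly when it has at most `cutoff` errors, and that the cutoff is small relative to the string length. All three are modeling assumptions for illustration, as are the specific numbers and the function name.

```python
from math import comb


def share_at_cutoff(n, cutoff):
    """Fraction of universes with at most `cutoff` errors that have exactly `cutoff`.

    Assumes a uniform draw over all length-n bit-strings with no more than
    `cutoff` flipped bits.
    """
    total = sum(comb(n, k) for k in range(cutoff + 1))
    return comb(n, cutoff) / total


# Illustrative values: 1000 bits, net-positive up to 50 errors.
print(f"share in the most flawed net-positive tier: {share_at_cutoff(1000, 50):.3f}")
```

With these illustrative numbers the large majority of net-positive universes sit in the single most flawed tier. The share shrinks as the cutoff approaches half the string length.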


Astute readers will notice that the binomial coefficient does not grow forever. Once k passes n/2, the function begins to contract. Just as there is only one perfect universe, there is only one maximally flawed universe. And even before the function contracts, it begins to slow, violating the assumption that the vast majority of possible existences cluster right around the worst possible net-good universe.

There are two ways to avoid these inconvenient aspects of our model.

The first is simply to suggest that the net-good cutoff occurs prior to the reversal:

This is a reasonable assumption if you consider goodness to be fragile, and evil to be born from chaos. The maximally likely universe is the one with no structure at all, whose configuration is purely random, and thus has no godly design. Hoping that it comes out net-good is like sending a tornado into a supermarket and hoping a decent meal comes out the other side.

Our second option is to claim that “introducing flaws” to a perfect universe is best modeled not as corrupting individual bits, but through some other process that grows strictly exponentially. Consider elsewhere in Unsong where Scott describes his own information theory-inspired theology:

God is one bit. The bit ‘1’… it’s easy to represent nothingness. That’s just the bit ‘0’. God is the opposite of that. Complete fullness. Perfection in every respect.

Rather than corrupting that single bit, flaws are introduced by appending new bits onto the end. It doesn’t even matter what they are, since anything other than God itself introduces an imperfection:

In this case, the number of possible universes simply increases exponentially as a function of the number of errors, again making it overwhelmingly likely that you are amongst the worst possible net-positive universes. Formally, there’s a ~50% chance we’re in the most flawed set of universes, a 25% chance we’re in the second most flawed set, and so on.
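The 50% and 25% figures follow from simple counting, sketched below. The sketch assumes a universe with k appended bits can be any of 2^k bit-strings, that every universe is equally likely, and that there is some maximum number of appended bits a net-positive universe can carry. The value 20 is arbitrary and the function name is hypothetical.

```python
def share_by_flaw_level(max_extra_bits):
    """Share of universes at each flaw level, most flawed first.

    Assumes 2 ** k equally likely universes with k appended bits, for
    k from 0 up to `max_extra_bits`.
    """
    counts = [2 ** k for k in range(max_extra_bits + 1)]
    total = sum(counts)
    return [count / total for count in reversed(counts)]


shares = share_by_flaw_level(20)  # 20 is an arbitrary illustrative cutoff
print([round(share, 3) for share in shares[:3]])
```

The first three shares come out at roughly one half, one quarter, and one eighth, and the pattern does not depend much on where the cutoff sits.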

Addendum 2

Another possible answer is that we’re not far along the spectrum at all. Horrific as it may be to contemplate, maybe our universe is not bad at all, but merely average.

To be specific, not average amongst all universes that could exist, but merely amongst the ones that are good on balance. The implication is that you could double the amount of suffering in our world to get to a universe that is net-neutral: exactly as good as it is bad.

It is tempting to dismiss this outright. There is already so much evil that doubling it would seem to obviously render the universe net-negative. The Holocaust as we experienced it was sufficient to make many lose faith altogether; the idea of a tragedy of double its magnitude existing in a merely net-neutral universe feels ludicrous[4].

Even ignoring the impossibility of summing up human welfare to figure out where we fall on the spectrum of net-positive universes, this entire line of argument seems to fail since the value of a universe is determined by its entire timeline including the future, not merely the history up until the current moment. So wherever your intuitions stand now about the balance of good and evil in our world, this is all just the prelude to a much longer history, and we can’t reasonably expect our experience thus far to be representative.

But wait, if human history to date was net-negative, but our future will be glorious and good, couldn’t God just create the universe starting now and then implant memories, star dust, fossils, etc., to make it seem like the universe had gone on for much longer?

Come to think of it, what makes you convinced that he didn’t?


[1] Since one man’s modus ponens is another man’s modus tollens, we might also ask: how can God exist in a world that contains evil?

[2] You know something has gone wrong when anthropics is the easy answer.

[3] Really, if we were in the flawless universe, we would not be “beings” at all in the sense you and I understand the term, nor actually capable of asking questions. Scott again:


[4] Is it even more ludicrous for us to draw the line between 6 million and 12 million? First, this whole thing is predicated on us teetering on the edge of losing faith anyway. Second, everyone has their breaking point. Scott again:

He told me it didn’t work that way. Everyone’s willing to dismiss the evil they’ve already heard about. It’s become stale. It’s abstract. People who say they’ve engaged with the philosophical idea of evil encounter evil on their own, and then suddenly everything changes. He gave the example of all of the Jewish scholars who lost their faith during the Holocaust. How, they asked, could God allow six million of their countrymen to perish like that?

But read the Bible! Somebody counted up all the people God killed in the Bible, and they got 2.8 million. It wasn’t even for good reasons! He kills three thousand people for worshipping the Golden Calf. He kills two hundred fifty people for rebelling against Moses’ leadership. He kills fourteen thousand seven hundred people for complaining that He was killing too many people, I swear it’s in there, check Numbers 16:41! What right do we have to lose faith when we see the Holocaust? “Oh, sure, God killed 2.8 million people, that makes perfect sense, but surely He would never let SIX million die, that would just be too awful to contemplate?” It’s like – what?

The lesson I learned is that everybody has their breaking point, the point where they stop being able to accept things for philosophical reasons and start kicking and screaming.


Corporate Culture is the Final Holdout of Mainstream America

At the Atlantic, Derek Thompson wants to push back against the anti-work narrative. Contrary to what you may have heard, he insists that Americans are, in general, quite content to work.

In his view, Americans do want to work, they’re satisfied with the jobs they have, and the recent increase in quitting reflects not an increase in Marxist sentiment, but rather an increase in opportunity. People don’t quit because they hate capitalism, they quit to take better jobs.

Just looking at top level indicators, there’s good evidence for this. Since the start of the pandemic, quits are way up, but the Labor Force Participation Rate (LFPR) seems on track to recover:

Monthly Nonfarm Quit Rate. Source: FRED

Labor Force Participation Rate. Source: FRED

Those are the only two graphs you really need to make Thompson’s point. People are quitting their jobs, but they’re not quitting the workforce. So the Great Resignation is, as Thompson puts it, the “Great Job Switcheroo”.

…Except that Thompson doesn’t use those graphs. Instead he relies on a mishmash of poorly chosen and even more poorly interpreted metrics.

Let’s start with his argument that Americans are satisfied at work:

From 2018 to 2021—after an economic crisis, mass layoffs, and a surge in unemployment—the share of very or moderately satisfied workers fell from about 88 percent to … about 84 percent. These numbers aren’t outliers. They’re part of a boring tradition of American workers telling pollsters that they aren’t drowning in a sea of misery.

First, note that there is some selection bias happening here. The satisfaction numbers Thompson uses are only drawn from people who have full-time or part-time jobs. So if someone is so disgruntled that they leave the workforce entirely, that actually pushes job satisfaction numbers up.

Second, the 4 percentage point drop might not feel like an outlier, but it is a serious departure from historical norms. It’s the lowest the metric has been since 1984, making it the second lowest satisfaction rate on record. And just looking at recent years, it is a fairly clear departure from the norm:

Data from the General Social Survey. Source for charts.

It’s worse than even that chart would suggest. Since we’re debating the existence of employees quitting out of burnout or resentment, we should really be looking at the other end of the spectrum: the rate of respondents reporting “very dissatisfied”. Here, the rate was at 5%, over twice the 2018 rate of just 2.46%. That is a serious departure, and on a metric more important for explaining an increase in quit rate.

Similarly, this is the worst the metric has been since 1984 when we hit 5.5% “very dissatisfied”, and the second worst report on record since the survey started collecting data in 1973.

You might feel that even with big swings compared to historical averages, the absolute numbers just aren’t that big, and this still doesn’t feel like a huge shift towards worker resentment. But aggregated across the entire national population, a 4 percentage point drop is equivalent to 6.6 million people newly dissatisfied [1], constituting a fairly substantial cohort.
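The 6.6 million figure can be reproduced from the inputs given in footnote [1]. This is only a sketch of that arithmetic. Like the footnote, it applies the change in satisfaction to the whole labor force, which is a simplification.

```python
# Inputs as stated in footnote [1].
population = 263_000_000      # Civilian Noninstitutional Population
participation_rate = 0.624    # Labor Force Participation Rate
satisfaction_drop = 0.88 - 0.84  # share satisfied, 2018 minus 2021

newly_dissatisfied = population * participation_rate * satisfaction_drop
print(f"{newly_dissatisfied / 1e6:.1f} million")  # prints "6.6 million"
```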

Next, Thompson turns to another leg of the Great Resignation narrative, attempting to debunk the idea that quits are driven by resentment. As he writes:

let’s address this pesky claim that the Great Resignation, or “quitagion,” or whatever is a reflection of job hatred and burnout. The Great Resignation isn’t a dramatic shift in worker sentiment. It’s a dramatic shift in worker opportunity.

…“A greater share of people say they are contemplating quitting than express dissatisfaction with their current job,” wrote Scott Schieman, a sociology professor at the University of Toronto who helped run the survey. Put simply, resignations are rising because people are seeing more job listings, not because they’re feeling more Marxist.

I’m all for relying on the credentials of established experts, so long as you actually represent their work correctly. But follow Thompson’s link, and you’ll find that Schieman explicitly denies this interpretation:

In 2018, about a quarter of respondents said finding another job would be very easy. I asked the same question in my 2021 survey and found that number had actually decreased to around 22%.

This means that worker confidence or optimism about finding a palatable alternative job has not climbed all that much, making it less likely to be a factor in driving the current wave of resignations.

Thompson is right that not all quits are driven by increased dissatisfaction, but that doesn’t mean a large share of them can’t be. And more to the point, it doesn’t actually provide evidence for his “increased opportunity” narrative, which is hard to square with the reality that fewer workers feel they could get another equally good job.

So what’s actually happening? I think The Great Resignation is less about the recent increase in quits and dissatisfaction, and more about the voicing of a long term trend. I mentioned at the beginning that the Labor Force Participation Rate was on track to recover to its pre-pandemic highs, but that’s only half the story. The other and much more important trend is the long term decline in LFPR that’s been going on for decades:

Starting around 1965, we see a steady increase as civil rights, progressive social norms and innovations like birth control enable the entry of more Americans into the professional workforce. There are pronounced effects on LFPR for women, black and hispanic Americans in particular.

But by 2000, we shift course. LFPR for women plateaus just under 60%, ending rapid “catch up” growth. Meanwhile, the entire time, another trend has been steadily pushing LFPR down. For men, LFPR has been dropping as long as we’ve been measuring it, from a high of 87.4% in 1949, down to our current rate of just 68.3%.

Even for women, the rate has decreased modestly since its early 2000s peak, now down to 56.8% from a high of 60.3%.

Again, these are seemingly small changes that correspond to huge demographic shifts. Nearly 2 out of 10 men who would have been working in 1949 are now neither working, nor pursuing work. More speculatively, I’m willing to guess that these are the kinds of people who would have disproportionately shown up on the General Social Survey as “very dissatisfied” at work, meaning that the 5% figure we see today could be artificially lowered by selection effects. Basically, it’s not a good indication of the percent of people who actually dislike work.

I think what we’ve seen lately is a relatively small shift in quits and LFPR, accompanied by a massive cultural change in how acceptable it is to say you hate work. And not in a “water cooler conversation” kind of way, but in an “I literally don’t want to have a job” kind of way.

This isn’t quite “Marxist sentiment” as Thompson describes it, but it’s an important shift in norms all the same. Ten years ago if you said you never wanted to work, people would think you were pathologically lazy. Now you can proudly say that you “don’t have a dream job because I don’t dream of labor”, and it’s not seen as a character flaw, but as the awareness that capitalism is exploitative. So exploitative in fact, that refusing to work is actually a kind of radical resistance, and any ensuing financial troubles actually a kind of noble martyrdom.

In some ways, this is just hippie rhetoric seeing a revival, but that doesn’t mean we should underestimate it. A common refrain is that a genuine counter-culture can’t exist anymore because there’s no longer a coherent mainstream culture to rebel against. This is true for television, and for news and for radio and everything else, but it’s not true for work.

There’s some variation, but not so much that we can’t all laugh at the same Dilbert jokes about bureaucracy, corporate jargon and office politicking. By and large, corporate cultures converge on the same optimally gray morass. It’s the last truly ubiquitous force in American culture.[2]

That means that anti-work is viable as a genuine subculture in a way that nothing has been for decades, and we ought to be prepared.[3]

[1] Civilian Noninstitutional Population of 263m, with a Labor Force Participation Rate of 62.4%, means a shift from 88% satisfaction to 84% satisfaction corresponds to 263m * 0.624 * 0.04 = 6.6 million people.

[2] There’s public school too, but only for children.

[3] It won’t all come in the form of resignation. It will look like playing video games while working from home, working multiple jobs in secret, starting side hustles, retiring early, contracting for Uber, becoming increasingly overeducated, working for DAOs, moving to areas with a lower cost of living, having more roommates, moving back in with your parents, living more frugally, and making your own coffee.