Statistical Theodicy

[Contains spoilers for Unsong.]

Since time immemorial, people have asked how evil can exist in a world created by an omnipotent and benevolent God[1].

And since time immemorial, God has remained silent.

In his absence, Scott Alexander provides an elegant solution. God created every net-positive universe. He created the perfect one, the almost perfect ones, the less perfect ones… all the way down to our universe which is full of misery and suffering, but still, on balance, worthy of existence. As Scott explains on God’s behalf:

I CREATED MYRIADS OF SUCH UNIVERSES. WHEN I HAD EXHAUSTED ALL POSSIBLE UNIVERSES WITH ONE FLAW, I MOVED ON TO UNIVERSES WITH TWO FLAWS, THEN UNIVERSES WITH THREE FLAWS, THEN SO ON, AN ENTIRE GARDEN OF FLAWED UNIVERSES GROWING ALONGSIDE ONE ANOTHER…

YOUR WORLD IS AT THE FARTHEST EDGES OF MY GARDEN… FAR FROM THE BRIGHT CENTER WHERE EVERYTHING IS PERFECT AND SIMPLE. THERE IS A WORLD MADE OF NOTHING BUT BLISS, WITH A GIANT ALEPH IN THE CENTER. THERE IS ANOTHER WORLD MADE OF NOTHING BUT BLISS WITH A GIANT BET IN THE CENTER. AND SO ON, BUT MAKE A MILLION MILLION WORLDS LIKE THOSE, AND YOU START NEEDING TO BECOME MORE CREATIVE.

This is a good start, but it only kicks the can down the road. Why are we at the edge of the garden? Even if we buy that all universes except the single flawless one will contain some evil, it sure seems like our universe contains an awful lot of it. Is that just by chance?


The easy answer is selection bias, and the anthropic principle in particular[2]. Were we in the flawless universe, we would not bother asking about evil, since no such thing would exist[3]. So given that we’re asking at all, it’s because we’re in a universe with evil, and thus the question provides its own answer.

But again, we’re not talking about dust specks or platypuses or other minor oddities, we’re talking about the unfathomably abhorrent evils of our universe. Given that there is a wide range of universes with sufficient evil to provoke questions, it still seems peculiar that we ended up in one so far along the spectrum.

Maybe theodicy is possible in all flawed universes, but more common in the worse ones? I don’t think so. Were we to exist in a universe where precisely one person suffered immensely, would that not be even more troubling than our own? It is at least possible to dismiss the evils in our universe as the product of chaos. It would be far stranger to live in a world that is nearly perfect but still contains evil. As Dostoyevsky once asked:

answer me: imagine that you yourself are building the edifice of human destiny with the object of making people happy in the finale, of giving them peace and rest at last, but for that you must inevitably and unavoidably torture just one tiny creature, that same child who was beating her chest with her little fist, and raise your edifice on the foundation of her unrequited tears—would you agree to be the architect on such conditions?


A better answer comes to us from information theory and entropy. Namely: there are simply more disordered states than there are ordered ones.

Consider the perfect universe as described by a bit-string of length N. The universes with one flaw are thus described by the same bit-string, but with a single error. Omitting duplicates, there are N one-error universes:

Next we get to two-error universes, and the possibilities explode. Formally, there are N! / (2! (N-2)!) two-error universes, and in general, N! / (k! (N-k)!) k-error universes. Or defined recursively, the number of k-error universes is equal to the number of (k-1)-error universes multiplied by (N-k+1)/k. For large N and small k, each additional flaw multiplies the count by an enormous factor, so the number of universes grows explosively.
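To make the growth concrete, here’s a minimal sketch. The universe length N = 1000 is an arbitrary stand-in; any large N tells the same story:

```python
from math import comb

N = 1000  # arbitrary stand-in for the length of a universe's bit-string

# C(N, k) counts the ways to choose which k of the N bits are flipped.
counts = [comb(N, k) for k in range(4)]
print(counts)  # [1, 1000, 499500, 166167000]

# The recursion: going from k-1 errors to k errors multiplies
# the count by (N - k + 1) / k.
assert comb(N, 2) == comb(N, 1) * (N - 1) // 2
assert comb(N, 3) == comb(N, 2) * (N - 2) // 3
```

One perfect universe, a thousand one-flaw universes, half a million two-flaw universes: the garden gets crowded fast as you move away from the center.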

Taking this to its logical conclusion, the space of possible universes looks less like a neat circular garden whose edge we happen to occupy, and more like a very, very skewed distribution in which nearly all universes that exist are deeply flawed:

That’s all to say: as you add flaws to the perfect universe, the number of possible universes expands so quickly that if you are randomly placed in a universe, the bulk of the probability lands on the set of maximally flawed universes that are still net-positive. And this fact alone is sufficient to explain the problem of evil, without resorting to weird appeals to free will or the necessity of evil.

Addendum

Astute readers will notice that the binomial coefficient does not grow forever. As k reaches N/2, the function begins to contract. Just as there is only one perfect universe, there is only one maximally flawed universe. And even before the function contracts, its growth slows, violating the assumption that the vast majority of possible existences cluster right around the worst possible net-good universe.
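A toy illustration of the contraction, using a miniature universe of N = 20 bits:

```python
from math import comb

N = 20  # a toy universe; real ones would be vastly longer
counts = [comb(N, k) for k in range(N + 1)]

# Growth continues while the step ratio (N - k + 1) / k exceeds 1,
# i.e. while k < (N + 1) / 2; beyond that, the counts shrink again.
peak = max(range(N + 1), key=lambda k: counts[k])
print(peak)                  # 10: the counts peak at N/2
print(counts[0], counts[N])  # 1 1: one perfect and one maximally flawed universe
```

The distribution is symmetric: the middle of the flaw spectrum dominates, and both extremes are vanishingly rare.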

There are two ways to avoid these inconvenient aspects of our model.

The first is simply to suggest that the net-good cutoff occurs prior to the reversal:

This is a reasonable assumption if you consider goodness to be fragile, and evil to be born from chaos. The maximally likely universe is the one with no structure at all, whose configuration is purely random, and thus has no godly design. Hoping that it comes out net-good is like sending a tornado into a supermarket and hoping a decent meal comes out the other side.

Our second option is to claim that “introducing flaws” to a perfect universe is best modeled not as corrupting individual bits, but through some other process that grows strictly exponentially. Consider elsewhere in Unsong where Scott describes his own information theory-inspired theology:

God is one bit. The bit ‘1’… it’s easy to represent nothingness. That’s just the bit ‘0’. God is the opposite of that. Complete fullness. Perfection in every respect.

Rather than corrupting that single bit, flaws are introduced by appending new bits onto the end. It doesn’t even matter what they are, since anything other than God itself introduces an imperfection:

In this case, the number of possible universes simply doubles with each additional error, again making it overwhelmingly likely that you are amongst the worst possible net-positive universes. Formally, there’s a ~50% chance we’re in the most flawed set of universes, a 25% chance we’re in the second most flawed set, and so on.
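A quick sketch of that distribution, assuming a hypothetical cutoff of K appended bits beyond which universes stop being net-positive (the exact value of K barely matters):

```python
# Under the appended-bits model, 2**k universes have k bits past the God-bit.
K = 30  # hypothetical net-positive cutoff
tiers = [2**k for k in range(K + 1)]
total = sum(tiers)  # geometric series: 2**(K + 1) - 1

p_worst = tiers[-1] / total   # most flawed tier
p_second = tiers[-2] / total  # second most flawed tier
print(round(p_worst, 2), round(p_second, 2))  # 0.5 0.25
```

Because each tier is as large as all previous tiers combined, a uniform draw lands in the most flawed tier about half the time, no matter where the cutoff sits.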

Addendum 2

Another possible answer is that we’re not far along the spectrum at all. Horrific as it may be to contemplate, maybe our universe is not especially bad, but merely average.

To be specific: not average amongst all universes that could exist, but merely amongst the ones that are good on balance. The implication is that you could double the amount of suffering in our world and get a universe that is net-neutral: exactly as good as it is bad.

It is tempting to dismiss this outright. There is already so much evil that doubling it would seem to obviously render the universe net-negative. The Holocaust as we experienced it was sufficient to make many lose faith altogether; the idea of a tragedy of twice its magnitude existing in a merely net-neutral universe feels ludicrous.[4]

Even ignoring the impossibility of summing up human welfare to figure out where we fall on the spectrum of net-positive universes, this entire line of argument seems to fail since the value of a universe is determined by its entire timeline including the future, not merely the history up until the current moment. So wherever your intuitions stand now about the balance of good and evil in our world, this is all just the prelude to a much longer history, and we can’t reasonably expect our experience thus far to be representative.

But wait: if human history to date was net-negative, but our future will be glorious and good, couldn’t God just create the universe starting now and then implant memories, stardust, fossils, etc., to make it seem like the universe had gone on for much longer?

Come to think of it, what makes you convinced that he didn’t?

Footnotes

[1] Since one man’s modus ponens is another man’s modus tollens, we might also ask: how can God exist in a world that contains evil?

[2] You know something has gone wrong when anthropics is the easy answer.

[3] Really, if we were in the flawless universe, we would not be “beings” at all in the sense you and I understand the term, nor actually capable of asking questions. Scott again:

IN THAT UNIVERSE, THERE IS NO SPACE, FOR SPACE TAKES THE FORM OF SEPARATION FROM THINGS YOU DESIRE. THERE IS NO TIME, FOR TIME MEANS CHANGE AND DECAY, YET THERE MUST BE NO CHANGE FROM ITS MAXIMALLY BLISSFUL STATE. THE BEINGS WHO INHABIT THIS UNIVERSE ARE WITHOUT BODIES, AND DO NOT HUNGER OR THIRST OR LABOR OR LUST. THEY SIT UPON GOLDEN THRONES AND CONTEMPLATE THE PERFECTION OF ALL THINGS.

[4] Is it even more ludicrous for us to draw the line between 6 million and 12 million? First, this whole thing is predicated on us teetering on the edge of losing faith anyway. Second, everyone has their breaking point. Scott again:

He told me it didn’t work that way. Everyone’s willing to dismiss the evil they’ve already heard about. It’s become stale. It’s abstract. People who say they’ve engaged with the philosophical idea of evil encounter evil on their own, and then suddenly everything changes. He gave the example of all of the Jewish scholars who lost their faith during the Holocaust. How, they asked, could God allow six million of their countrymen to perish like that?

But read the Bible! Somebody counted up all the people God killed in the Bible, and they got 2.8 million. It wasn’t even for good reasons! He kills three thousand people for worshipping the Golden Calf. He kills two hundred fifty people for rebelling against Moses’ leadership. He kills fourteen thousand seven hundred people for complaining that He was killing too many people, I swear it’s in there, check Numbers 16:41! What right do we have to lose faith when we see the Holocaust? “Oh, sure, God killed 2.8 million people, that makes perfect sense, but surely He would never let SIX million die, that would just be too awful to contemplate?” It’s like – what?

The lesson I learned is that everybody has their breaking point, the point where they stop being able to accept things for philosophical reasons and start kicking and screaming.

Coda

Corporate Culture is the Final Holdout of Mainstream America

At the Atlantic, Derek Thompson wants to push back against the anti-work narrative. Contrary to what you may have heard, he insists that Americans are, in general, quite content to work.

In his view, Americans do want to work, they’re satisfied with the jobs they have, and the recent increase in quitting reflects not an increase in Marxist sentiment, but rather an increase in opportunity. People don’t quit because they hate capitalism, they quit to take better jobs.

Just looking at top level indicators, there’s good evidence for this. Since the start of the pandemic, quits are way up, but the Labor Force Participation Rate (LFPR) seems on track to recover:

Monthly Nonfarm Quit Rate. Source: FRED

Labor Force Participation Rate. Source: FRED

Those are the only two graphs you really need to make Thompson’s point. People are quitting their jobs, but they’re not quitting the workforce. So the Great Resignation is, as Thompson puts it, the “Great Job Switcheroo”.

…Except that Thompson doesn’t use those graphs. Instead he relies on a mishmash of poorly chosen and even more poorly interpreted metrics.

Let’s start with his argument that Americans are satisfied at work:

From 2018 to 2021—after an economic crisis, mass layoffs, and a surge in unemployment—the share of very or moderately satisfied workers fell from about 88 percent to … about 84 percent. These numbers aren’t outliers. They’re part of a boring tradition of American workers telling pollsters that they aren’t drowning in a sea of misery.

First, note that there is some selection bias happening here. The satisfaction numbers Thompson uses are only drawn from people who have full-time or part-time jobs. So if someone is so disgruntled that they leave the workforce entirely, that actually pushes job satisfaction numbers up.

Second, the 4% drop might not feel like an outlier, but it is a serious departure from historical norms. It’s the lowest the metric has been since 1984, making it the second lowest satisfaction rate on record. And just looking at recent years, it’s a fairly clear departure from the norm:

Data from the General Social Survey. Source for charts.

It’s worse than even that chart would suggest. Since we’re debating whether employees are quitting out of burnout or resentment, we should really be looking at the other end of the spectrum: the rate of respondents reporting “very dissatisfied”. Here, the rate was 5%, over twice the 2018 rate of just 2.46%. That is a serious departure, and on a metric more important for explaining an increase in quit rate.

Similarly, this is the worst the metric has been since 1984, when it hit 5.5% “very dissatisfied”, and the second worst reading on record since the survey started collecting data in 1973.

You might feel that even with big swings compared to historical averages, the absolute numbers just aren’t that big, and this still doesn’t feel like a huge shift towards worker resentment. But aggregated across the entire national population, a 4% drop is equivalent to 6.6 million people newly dissatisfied [1], a fairly substantial cohort.

Next, Thompson turns to another leg of the Great Resignation narrative, attempting to debunk the idea that quits are driven by resentment. As he writes:

let’s address this pesky claim that the Great Resignation, or “quitagion,” or whatever is a reflection of job hatred and burnout. The Great Resignation isn’t a dramatic shift in worker sentiment. It’s a dramatic shift in worker opportunity.

…“A greater share of people say they are contemplating quitting than express dissatisfaction with their current job,” wrote Scott Schieman, a sociology professor at the University of Toronto who helped run the survey. Put simply, resignations are rising because people are seeing more job listings, not because they’re feeling more Marxist.

I’m all for relying on the credentials of established experts, so long as you actually represent their work correctly. But follow Thompson’s link, and you’ll find that Schieman explicitly denies this interpretation:

In 2018, about a quarter of respondents said finding another job would be very easy. I asked the same question in my 2021 survey and found that number had actually decreased to around 22%.

This means that worker confidence or optimism about finding a palatable alternative job has not climbed all that much, making it less likely to be a factor in driving the current wave of resignations.

Thompson is right that not all quits are driven by increased dissatisfaction, but that doesn’t mean a large share of them can’t be. And more to the point, it doesn’t actually provide evidence for his “increased opportunity” narrative, which is hard to square with the reality that fewer workers feel they could get another equally good job.

So what’s actually happening? I think The Great Resignation is less about the recent increase in quits and dissatisfaction, and more about the voicing of a long term trend. I mentioned at the beginning that the Labor Force Participation Rate was on track to recover to its pre-pandemic highs, but that’s only half the story. The other and much more important trend is the long term decline in LFPR that’s been going on for decades:

Starting around 1965, we see a steady increase as civil rights, progressive social norms, and innovations like birth control enable more Americans to enter the professional workforce. The effects on LFPR are pronounced for women, Black, and Hispanic Americans in particular.

But by 2000, the course shifts. LFPR for women plateaus just under 60%, ending rapid “catch-up” growth. Meanwhile, the entire time, another trend has been steadily pushing LFPR down. For men, LFPR has been dropping for as long as we’ve been measuring it, from a high of 87.4% in 1949 down to our current rate of just 68.3%.

Even for women, the rate has decreased modestly since its early 2000s peak, now down to 56.8% from a high of 60.3%.

Again, these are seemingly small changes that correspond to huge demographic shifts. Nearly 2 out of 10 men who would have been working in 1949 are now neither working nor pursuing work. More speculatively, I’m willing to guess that these are the kinds of people who would have disproportionately shown up on the General Social Survey as “very dissatisfied” at work, meaning that the 5% figure we see today could be artificially lowered by selection effects. Basically, it’s not a good indication of the percent of people who actually dislike work.

I think what we’ve seen lately is a relatively small shift in quits and LFPR, accompanied by a massive cultural change in how acceptable it is to say you hate work. And not in a “water cooler conversation” kind of a way, but in a “I literally don’t want to have a job” kind of way.

This isn’t quite “Marxist sentiment” as Thompson describes it, but it’s an important shift in norms all the same. Ten years ago if you said you never wanted to work, people would think you were pathologically lazy. Now you can proudly say that you “don’t have a dream job because I don’t dream of labor”, and it’s not seen as a character flaw, but as the awareness that capitalism is exploitative. So exploitative in fact, that refusing to work is actually a kind of radical resistance, and any ensuing financial troubles actually a kind of noble martyrdom.

In some ways, this is just hippie rhetoric seeing a revival, but that doesn’t mean we should underestimate it. A common refrain is that a genuine counter-culture can’t exist anymore because there’s no longer a coherent mainstream culture to rebel against. This is true for television, for news, for radio, and for everything else, but it’s not true for work.

There’s some variation, but not so much that we can’t all laugh at the same Dilbert jokes about bureaucracy, corporate jargon and office politicking. By and large, corporate cultures converge on the same optimally gray morass. It’s the last truly ubiquitous force in American culture.[2]

That means that anti-work is viable as a genuine subculture in a way that nothing has been for decades, and we ought to be prepared.[3]

Footnotes
[1] Civilian Noninstitutional Population of 263m, with a Labor Force Participation Rate of 62.4%, means a shift from 88% satisfaction to 84% satisfaction corresponds to 263m * 0.624 * 0.04 = 6.6 million people.

[2] There’s public school too, but only for children.

[3] It won’t all come in the form of resignation. It will look like playing video games while working from home, working multiple jobs in secret, starting side hustles, retiring early, contracting for Uber, becoming increasingly overeducated, working for DAOs, moving to areas with a lower cost of living, having more roommates, moving back in with your parents, living more frugally, and making your own coffee.

San Francisco Shoplifting: Much More Than You Wanted to Know

Some people argue that shoplifters in San Francisco are running rampant. I don’t think this is true.

Instead of a proper introduction, let’s jump straight into refuting the strongest arguments.

Argument 1: “Forget the empirics for a second, and just look at the actual policies. California’s Prop 47 decriminalises shoplifting and means that there are no consequences for criminals. That’s clearly bad right?”

In a Marginal Revolution post about San Francisco’s alleged shoplifting spree, economist Tyler Cowen highlights a popular explanation for criminals’ flagrant disregard of the law. As his linked New York Times article explains:

Five years later, the shoplifting epidemic in San Francisco has only worsened…

The retail executives and police officers emphasized the role of organized crime in the thefts. And they told the supervisors that Proposition 47, the 2014 ballot measure that reclassified nonviolent thefts as misdemeanors if the stolen goods are worth less than $950, had emboldened thieves.

As Tyler concludes: “yes incentives matter”.

Similarly, in an article titled “San Francisco Has Become a Shoplifter’s Paradise”, the Wall Street Journal argues:

Much of this lawlessness can be linked to Proposition 47, a California ballot initiative passed in 2014, under which theft of less than $950 in goods is treated as a nonviolent misdemeanor and rarely prosecuted.

That sounds really bad right? It would be horrible if San Francisco were experiencing an entirely avoidable crime spree due entirely to some ridiculously misguided progressive policy.

…Except that this causal attribution is not even plausibly correct.

Though it’s tempting to frame the law as just another artefact of California’s bleeding-heart progressivism, similar laws are in fact present in all 50 states.

You might object that other states have different financial thresholds for felonies… which would be a good counterargument, except that California is actually on the low end. Every single US state has a minimum threshold for felony theft, and 38 of them are higher than California’s. Texas notably has a threshold of $2,500, which means you can steal 2.6 times as much as you could in California before the theft counts as a felony. Talk about incentives! [1]

Or from Pew, here’s a useful map of the thresholds across the US, highlighting California’s unusually stringent standards:

So not only is the law not a San Francisco matter, or a Chesa Boudin matter, it’s not even a California matter! And even if it were, a study from Pew suggests it wouldn’t have a significant impact on larceny rates anyway.

Further, as the sources correctly note, the policy was implemented in California in 2014. So even on the basis of their own limited facts, it doesn’t make any sense to blame recent alleged waves of shoplifting on either the new DA or a law that’s been in place for several preceding years.

Look: this is a complex issue, and some of the following claims will be complex and nuanced.

But this first one is easy. It’s not even arguable. It’s just poor causal attribution, poor reasoning, and an abject failure to do even the bare minimum of background research before reporting in prestige media outlets.

Argument 2: “San Francisco has a shoplifting surge, but it’s not reported because people know the police won’t respond, and the new DA won’t press charges.”

This is a compelling view, and one that puts us into a seductively militant position. What’s the point of arguing about data if we have good reason not to trust it?

Fortunately, there is still an avenue to find good evidence: every neighbouring county.

Imagine you’re an organised criminal in the Bay Area, but not living in San Francisco proper. Perhaps you’re across the Golden Gate Bridge in Sausalito, or across the Bay Bridge in Oakland. One day, San Francisco elects a new District Attorney who’s famously soft on crime and refuses to prosecute shoplifters. Do you:

  • A) Continue to shoplift in Oakland where you might be caught and punished, or
  • B) Take a 15 minute BART ride to San Francisco where you can shoplift with impunity?

In other words, if San Francisco is truly the “shoplifter’s paradise” its critics claim, we shouldn’t just expect to see a rise in cases locally; we should expect a drop in cases in all of its neighbours. Shoplifters should be virtually swarming across county lines to partake in the unencumbered criminality.

Instead, here’s how shoplifting changed in adjacent counties from 2019 to 2020:

[Decline in shoplifting reports in San Francisco, adjacent counties and California as a whole from 2019 to 2020. Data from the California Department of Justice, compiled in this Google Sheet.]

It’s tricky: cases do fall, but most of that is pandemic effects. Compared to the state as a whole, adjacent counties saw relatively little decrease in shoplifting (with the slim exception of San Mateo). That’s the exact opposite of what you would expect if criminals were following their alleged incentives.

Faced with the San Francisco data alone, you might argue that residents have given up on the DA, so reports are down even as crime is up. But it’s hard to tell a similar story about adjacent counties’ apparent lack of reported decline.

(You might argue that this only proves the problem runs deeper than DA Chesa Boudin, and deeper than San Francisco, and is really about California as a whole being too soft and liberal. But again, think about what the data tells us. It’s not just that Marin County has more or less shoplifting than the state as a whole, it’s that upon a specific intervention (the election of a new DA), shoplifting in Marin County dropped less, indicating a lack of substitution into San Francisco.)

The problem with the “residents don’t bother reporting” view is that it’s simply too compelling. It allows us to ignore data, ignore nuanced argumentation, ignore even the semblance of rigour, all while taking the moral high ground of a “man on the street” aligned with the interests of “everyday people”.

As concerned resident Sam Altman writes:

“The thing I have noticed is when the anecdotes and the data disagree, the anecdotes are usually right. There’s something wrong with the way you are measuring it” -Jeff Bezos

This is how I feel about SF crime data. Almost everyone I know who lives here is experiencing more.

As tempting as it is, we can’t fall prey to this kind of quantitative-nihilism. The problem with anecdotes isn’t just that they’re biased, subject to selection effects, and so on, it’s that everyone has their own experiences, so we can go on disagreeing with each other, sometimes violently, until the end of time.

In contrast, if you don’t like my data, it’s here for you to independently investigate, criticise, reproduce, etc. It’s not objective, and it’s not a perfect science, but we can at least have a conversation rooted in some common understanding of what evidence is out there, and we can make shared epistemic progress.

So the lesson here is that San Francisco doesn’t seem to be much softer on crime than the rest of California, but the meta-lesson is that when one data source is known to be poor, you don’t have to give up; you just have to find other angles of attack.

Argument 3: “We know San Francisco has a shoplifting problem because stores are closing. That can’t be faked and wouldn’t happen otherwise, so it’s strong evidence that the city has a real problem.”

Some staunch anti-capitalists might say “Walgreens is closing? Good. Screw corporate profits.” [2]

Let me clarify that this is not my view. As an armchair urbanist like the rest of you, I think that dense urban walkable neighbourhoods are critical to thriving communities, and though a small local business might be preferable to a chain pharmacy, the latter is still strongly preferable to nothing. When stores close, communities lose access to groceries, over the counter medicine, prescription pickup, Covid vaccinations and much more.

So if store closures are undeniable and tragic, why do I disagree? Because you’re still ignoring base rates.

Yes, Walgreens announced the closure of 5 stores throughout the city. I’ve heard this cited over and over again. What I haven’t seen is discussion of how that compares to their historical average closure rate, the average closure rate of comparable stores in comparable cities, and so on. What we’re asking is not “did Walgreens close stores”, but “are these closures unusual enough to serve as evidence of a broader societal ill?”

According to the SF Chronicle, there were 17 Walgreens closures in the 5 years leading up to May 2021, and from the Wall Street Journal, another 5 announced in the latter half of 2021, bringing the total to 22.

So ignoring the most recent year, Walgreens was closing at an annual rate of 3.4 stores. Including 2021, the overall average was 3.7 per year. That makes 5 closures in a single year high, but not extraordinarily so. [3]
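For transparency, the closure-rate arithmetic, using the Chronicle and Journal figures quoted above:

```python
# 17 closures in the 5 years through May 2021, plus 5 more announced in late 2021.
closures_through_may_2021 = 17
closures_late_2021 = 5

rate_excluding_2021 = closures_through_may_2021 / 5  # annual rate, prior 5 years
rate_including_2021 = (closures_through_may_2021 + closures_late_2021) / 6
print(rate_excluding_2021, round(rate_including_2021, 1))  # 3.4 3.7
```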

We can also compare SF Walgreens to NYC Duane Reade, a similar outlet owned by the same parent company. Industry data shows that their number of locations plummeted from 317 in 2019 to just 253 in 2020, a decline of about 20%. By comparison, there are still 45 Walgreens left in San Francisco, making their 2021 decline a much more modest 10%.

Third, we’ll look at how the 2021 Walgreens closures compare to their broader financial outlook. Their stock price is down 39% since 2016, compared to a store closure rate in SF of 33% in the same period. I’m not suggesting store footprints should necessarily be linearly related to stock prices, just that given the existence of a general decline in the chain’s finances, store closures don’t really merit additional explanation or blame.

For an overall comparison, here are some rates to consider:

The 5 store closures are certainly notable, but taken in context, don’t suggest an extreme deviation from what we might expect.

What I’m really saying is: store closures are bad, but they are sufficiently well explained by existing non-shoplifting factors, including Walgreens’s general decline, and of course, the global pandemic and subsequent economic shocks that hit in this same time period.

Rather than naively inferring from stock price, it’s worth taking a closer look at the chain’s financial ills. On a January 6th call with investors, the Walgreens CFO detailed how the chain’s shrink rate–loss of inventory due to shoplifting, employee theft, and other causes–has hit 3.25%, up from 2% a decade ago. The phenomenon hasn’t been unique to San Francisco. As shown in data collected from the National Retail Security Survey and charted by SF Chronicle, shrinkage rates are up across the country:

We can further investigate the claim by looking at individual Walgreens locations. Though some of the five stores with announced closures did see a substantial rise in shoplifting in 2020, SF Chronicle reports that the 2550 Ocean Ave. location reported a mere 3 cases throughout the entire year, similar to the 4 cases it reported in 2019. The 4645 Mission St. location reported just 3 cases of shoplifting in 2021 (its closure was announced in mid-October).

Further doubts are cast on the shoplifting-driven closure narrative due to a 2019 SEC Filing in which Walgreens announced their plan to close “approximately 200 locations in the United States” as part of their “Transformational Cost Management Program” designed to “achieve increased cost efficiencies”.

On Twitter, Mike Solana sarcastically quips that the profit-driven motive for closing stores is absurd since “everyone knows the fewer stores you have, the more money you make. ‘going out of business’ is just another nefarious capitalist plot.”

But in a world transformed by ecommerce, pandemics and labour shortages… that is actually true. It’s not exactly conspiratorial to think that closing unprofitable stores is a good way to raise profits.

That’s precisely what they explained in their 2019 SEC filing, and even how, up until recently, Walgreens explained specific chain closures. From a February 2020 article, a spokesperson explains that the 730 Market St. store closure was “necessary to cut costs”, and part of the “transformational cost management program to accelerate the ongoing transformation of our business, enable investments in key areas and to become a more efficient enterprise.”

Finally, we can zoom in on the 2019–2020 period to examine the effects of a new DA, looking at the 8 stores closed over that span. Even with addresses in hand, it’s hard to find news coverage of specific closures, but I’ve done my best to source dates for each of these locations:

So of the 8 stores closed over the 2 year period, 4 were in 2019, 2 in 2020, and 2 are ambiguous, but have their final Yelp review in 2019. Since these reviews are intermittent, we can’t rely on them entirely, but it’s at least suggestive.

Overall, it doesn’t look like Walgreens’ 2021 closure rate requires explanation beyond pandemic effects, their own 2019 cost management program, and broader economic trends. Additionally, it appears their 2020 closure rate was below, or at best on par with, their 2019 closure rate, indicating that the new DA, who took office at the start of 2020, did not accelerate the decline of Walgreens in San Francisco.

Again: I believe that store closures are generally bad. And whether or not it’s a departure from historical and national context, you may feel that another 5 Walgreens closing in 2021 is bad for the city. I agree, but that’s not the question at hand. What we need to understand is whether or not these closures can be blamed on the new DA, “soft-on-crime” policies, or progressive criminal justice reform. And given the evidence presented, the case for that narrative is extremely weak.

Argument 4: But I’ve seen the viral videos!

Look, I’ll admit that these look bad, and that simply appealing to higher faculties is insufficient for dismissing clear, flagrant and frankly shocking instances of shoplifting in a city that just doesn’t seem to care.

You may have seen this video of two women loading cartfuls of what appears to be laundry detergent and other bath products into the back of a van:

Or this one of two men with what can only be described as humorously large bags walking (not even bothering to run) out of a store:

Or this one of a smash and grab inside a mall:

If you haven't guessed, none of these are from San Francisco. The first is in Connecticut, the second in Granada Hills in LA, and the third allegedly from Desert Sky Mall in Phoenix, though I haven't found a real source. Of course there are videos of shoplifters in San Francisco; my point is just that they don't prove anything.

I know this is a stupid form of argumentation, but some of you literally asked for it, and I’m sick of spending dozens of hours on data analysis only to get feedback that it can’t possibly be right because it contradicts some anecdote you heard from a friend.

Let me be clear about how this kind of observational data works. If I say “there's no data proving that sparkling unicorns exist”, and you say “I've literally seen one”, then you are right to trust your observations and distrust my data. All the claim calls for is an existence proof, and as far as your experiences are concerned, you have a perfectly valid one.

On the other hand, if I say “there were thousands of shoplifting cases in San Francisco in 2020 and 2021; I'm just not convinced this represents a sudden increase over previous years”, and you say “But I've seen people shoplifting!!!”, that proves approximately nothing. You could show me hundreds of videos, and I still wouldn't care, because you could have done the same thing in 2019, and you could do the same in dozens of other cities. So sure, it's evidence for something, just not for the question that's actually relevant to this debate.

The past couple years have been difficult. It’s easy to be a doomer, and easy to feel hyper-attuned to things that are going poorly.

Writing for the New York Times, bureau chief Thomas Fuller even admits that he bore witness to flagrant shoplifting in the city as early as 2016. But at the time it felt like a one-off thing, and in 2022 it now feels like part of some bigger narrative. You feel that the DA is too progressive. That the BLM protests were too destructive. That the city is too woke. Whatever. None of these macro-narratives are good reason to believe that your anecdotes are representative of broader trends; if anything, they're reasons to think you're suffering from confirmation bias.

Your ancestors lived in small tribes, and you inherited a brain eager to generalise from small sample sizes. Now you live in a world of 8 billion people, most of them equipped with smartphones and an internet connection. It's easy to find anecdotal evidence, and easy to feel like it's compelling. As Gwern once wrote:

The paradox of news is that by design, the more you read, the less you might know, by accumulating an ever greater arsenal of facts and examples which are (usually) true, but whose interpretation bears ever less resemblance to reality.

That’s the reality of our brave new world. I suggest you either develop defences against it, or get off the internet. Better yet, stop reading the news altogether.

If this was a frustrating section to read, I promise it was equally frustrating to write, and I take no joy in these childish bait-and-switch ploys. But don’t give up yet, the next two are better in tone, importance and rigour.

Argument 5: “Chesa Boudin really is overly progressive. He has naive views about criminal justice, and charging rates have dropped as a result. I don’t have to prove that as consequences fall, criminality goes up.”

From the SF Chronicle analysis, here’s the breakdown of Chesa’s charging rates:

They have indeed dropped for theft and petty theft, and by fairly substantial margins! Petty theft in particular is down from 58% to 35%. So is that it? Despite everything, is Chesa really soft on crime? And whatever the other evidence suggests, wouldn't it be strange for this change not to result in a dramatic surge?

Not so fast. Again, we have to understand the results in context. Here are charging rates for Chesa and the previous DA over the last decade, both overall and for larceny/theft:

There was indeed a sharp drop in 2020, but it was followed by a sharp uptick. For Larceny/Theft, the 2021 rate is still a bit under peak (74% versus 77%), but much higher than the historical average of 61%. His overall 2021 charging rates are even more aggressive, coming in just under the 2018 peak (67% versus 71%), and much higher than the average of 60%.

In both cases, Chesa’s 2021 rates are higher than those in 7 out of the 9 years preceding his tenure.

So if Chesa isn’t particularly light on crime, why the drop in 2020? Chesa’s own defence is that logistics were difficult, and he had to prioritise. From the same SF Chronicle piece:

Boudin maintains that the drop in charging rates for theft is mainly due to the reduced operation of San Francisco’s court system caused by COVID-19 restrictions. The charging rate for both types of theft increased between 2020 and 2021 as the city reopened, the data shows.

“We had clear instructions from courts to delay and defer anything we could delay and defer, [and] from the medical director to drastically reduce the (jail) population,” he said. “In the context of those really difficult decisions, we did make intentional decisions to delay or defer charging low-level nonviolent cases.”[4]

Granting his defence, we're not left asking whether Chesa's charging rates were high enough overall, but whether, given limited capacity to process cases, he allocated resources in a reasonable way. Here were his charging rates across the board:

Here we see that although rates were down for several crimes, they weren't down across the board; they were higher for rape, willful homicide and narcotics cases. The first two are obviously serious crimes worthy of justice system resources, and though drug policy is controversial, the third category seems fair given San Francisco's tragic record of drug overdoses. Overall, it looks like a reasonable allocation of limited resources.

Finally, you might wonder about the specific instances you’ve heard of where Chesa lets off a criminal, only for them to re-offend, sometimes with horrific consequences. This is obviously bad, but it’s not so much a matter of intense debate as a matter of statistical nuance meeting a human interest story.

If an innocent person is arrested, we never get to learn about the counterfactual: what could have happened otherwise. On the other hand, if a guilty person is let free and commits another crime, we’re given reason to feel regret and resent the DA.

I'm not going to wax poetic about liberalism's virtue of requiring high burdens of proof for criminal charges. The point is just that, given its imperfections, any statistical classifier will make two kinds of errors, and given the human context, the false positives are ignored while the false negatives are seared into our collective consciousness.

Argument 6: “Reported cases are actually skyrocketing.”

It might feel bizarre to put this last, but I’m worried it’s a strawman since very few of Chesa’s critics actually seem to be citing the data. In any case, if we just take monthly reports, a disturbing trend does start to emerge:

Reports were pretty stable around 200, plummeted at the start of the pandemic, slowly rebounded, and then suddenly shot up from 193 in August to 391 in September.

That sounds bad, until you take a closer look at where the reports are coming from. Or more specifically, the one store where nearly half of all reports originated:

source: sfgov.org

That’s right, of the 391 reports from September, 155 were from a single location: a Target on Mission and 4th. So what happened? A dramatic crime surge? Not quite, as a spokesperson explains: “The store was simply using a new reporting system.”

What about October and November? Again, it's mostly just that one Target location. In October, in a city of nearly a million people, one Target accounted for 39% of all shoplifting reports. In November, it accounted for 35%. To put that into context: in those three months, that Target reported a total of 465 cases, compared to the 201 cases it had reported in the previous 44 months combined.

To correct for the apparent increase caused by their new reporting system, we can replace their reports from those months with the store's historical average to get a sense of what reports might have looked like:

This is starting to look less like a sudden surge and more like a mundane reversion to the mean. But you’ll notice there’s still a sudden surge in November.

At this point, I was suspicious enough of outliers to investigate further. And sure enough, I found a single Safeway that accounts for the entire spike:

source: sfgov.org (https://data.sfgov.org/d/wg3w-h783/visualization#)

I wasn’t able to find a similar source confirming that this is due to a change in reporting, but when asked about increased reports at the Target location, a police spokesperson confirmed that “the new reporting system was accessible to other businesses”, so it wouldn’t be too surprising if this Safeway adopted the new system in November.

Previously, this location had reported just 1 incident per month, compared to 120 incidents in November. Making a similar adjustment, we’re left with a totally unremarkable trend:

With those two adjustments made, the spike disappears entirely, and we’re left with something totally expected: a sudden drop at the start of the pandemic, followed by a steady reversion to the historical average.
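For the curious, the adjustment described above can be sketched roughly like this (the numbers here are hypothetical stand-ins; the real analysis uses the sfgov incident reports):

```python
import pandas as pd

# Hypothetical monthly shoplifting reports: a city-wide total alongside
# one outlier store that switched reporting systems in September.
df = pd.DataFrame({
    "month": ["2021-07", "2021-08", "2021-09", "2021-10", "2021-11"],
    "citywide": [180, 193, 391, 320, 340],
    "outlier_store": [5, 4, 155, 125, 120],
})

# The store's historical average before the new reporting system.
baseline = df.loc[df["month"] < "2021-09", "outlier_store"].mean()

# Replace the store's post-switch counts with its historical baseline,
# so the city-wide series isn't dominated by a reporting change.
post = df["month"] >= "2021-09"
rest_of_city = df["citywide"] - df["outlier_store"]
df["citywide_adjusted"] = rest_of_city + df["outlier_store"].where(~post, baseline)

print(df[["month", "citywide", "citywide_adjusted"]])
```

The same substitution, applied a second time to the Safeway location, produces the adjusted trend shown above.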

Now look, it’s entirely possible that the new reporting system Target uses is better and more accurate than the previous one. In fact, I think it’s likely that they were undercounting before, and that the new data is closer to ground truth. But an odd property of data analysis is that being selectively more accurate, or more accurate all of a sudden, does not necessarily make your overall picture a better one.

The crucial question at hand is not whether Target suffered 100+ shoplifting cases in a single month, but whether it was an abrupt departure from historical trends. I don’t think the data supports that narrative, but it doesn’t dispel it either. That’s why it’s important to leverage the variety of angles I’ve pursued throughout this piece.

Conclusion

This has been a contentious post, so let me wrap things up with more mundane “good-things-are-good” optimism. Whatever you think of the last couple years, here is the actual macro trend that matters:

And across non-violent crimes more broadly:

And finally, violent crimes:

You're free to have doubts about the veracity of a lot of crime data, but homicide rates are as close as we get to ground truth. They're hard to cover up, the police do investigate, and people do bother reporting them. And as we saw, even in 2020, Chesa Boudin's charging rate for willful homicides was actually above that of his predecessor.

In a variety of ways, across a variety of important indicators, crime per capita has plummeted in San Francisco over the last 30 years.

San Francisco remains a focal point for a number of crises, not because it's unique, but because it's a uniquely good entry point into a host of popular issues. It's a city that gives conservatives reason to shriek in horror and declare that the progressive state has failed, while giving liberals an opportunity to bemoan income inequality, housing shortages, gentrification and so on.

It is tempting to turn the city into a kind of ideological battleground. And in some ways, it should be. If San Francisco fails, it might take many of our brightest technical minds with it.

But San Francisco is also a real city, inhabited by real people. I mean that in a sentimental way, but also in the sense of being in awe at the complexity of the world. Some police data is bad, that’s fine, you can get around it. Don’t trust the SFPD statistics? You can check them against the California DOJ or FBI numbers. Skeptical about a chart? You can use the sfgov data portal to investigate each individual data point.

Scott states the law of rationalist irony: “the smugger you feel about having caught a bias in someone else, the more likely you are falling victim to that bias right now, in whatever way would be most embarrassing.”

And I’m sure it applies here. I’m not a criminologist, or a data journalist, or even a San Francisco resident. I don’t know if I would have voted for Chesa in 2019, or if I would have voted against him in the upcoming recall.

But I am actually interested in figuring out what’s true, and discussing it beyond shallow politics and useless anecdotes.

If I’ve made a mistake here, I would love for you to email me and tell me about it so I can share the correction with others. If there’s data I’m missing or could have used more effectively, I would love to hear about that too. And finally, if you have some compelling macro-narrative, speculative or not, that explains all the downright weirdness in all of this, I am happy to discuss it.


Thanks to Slime Mold Time Mold for comments, and thanks to Maxwell Tabarrok for research assistance with this post.


Footnotes
[1] What's more, only Alaska's threshold is tied to inflation, and some of these laws haven't been updated since 1978 (New Hampshire). So there's a strong argument to be made that the thresholds for classifying theft as a felony should be higher in most US states. That same Pew report cites one of their studies, explaining that “in the 30 states that raised their thresholds between 2000 and 2012, downward trends in property crime or larceny rates, which began in the early 1990s, continued without interruption. These states reported roughly the same average decrease in crime as the 20 states that did not change their theft laws, and the threshold amounts were not correlated with property crime or larceny rates.”

[2] Though I will admit that it’s ironic to see so much attention placed on shoplifting under $950 in merchandise, and so little placed on Walgreens admitting to millions in wage theft in California alone.

[3] That might sound weird, but statistically it’s the norm. If you’re trying to detect outliers, you still include outliers in the standard deviation calculation.
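To make the footnote's point concrete, here's a minimal z-score outlier check (with made-up counts); note the mean and standard deviation are computed over the full sample, outlier included:

```python
import statistics

# Hypothetical monthly report counts, including one extreme value.
counts = [4, 5, 3, 6, 4, 155]

# The outlier stays in the mean/std calculation -- that's the norm for
# simple z-score detection, even though it inflates both statistics.
mean = statistics.mean(counts)
std = statistics.pstdev(counts)

z_scores = [(x - mean) / std for x in counts]
outliers = [x for x, z in zip(counts, z_scores) if abs(z) > 2]
print(outliers)  # [155]
```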

[4] I’m a little hesitant to just take Chesa’s word at face value, and wasn’t able to independently confirm, but adversarial source Delian Asparouhov does admit after talking to Chesa that it’s “interesting to think about wtf do you do when you can’t run a jury trial…”.