### Statistical mechanics of the Roberts court

The Supreme Court of the US (SCOTUS) is a very important policymaking body. It used to be that the Court maintained the veneer of an impartial body resolving legal disputes through the words of the US Constitution ("honest jurisprudence" according to Scalia), but it is, I think, increasingly seen today as a body that establishes policies in its own right. It would be hard to trace this back to a single momentous event, but one certainly important case was Bush v. Gore in 2000, when SCOTUS decided the next US president and, might I mention, voted strictly along ideological lines (Segal & Spaeth 2002).

The Roberts Court has already established its own record in multiple landmark cases: it affirmed crucial provisions of the Affordable Care Act (Obamacare) in King v. Burwell and NFIB v. Sebelius, and it recently affirmed the national right to gay marriage in Obergefell v. Hodges. (Is anyone else having trouble with that first name?) Besides these more recent opinions, the justices have changed (or maybe gutted) laws on campaign finance and voting rights.

It would seem that SCOTUS is a model system for exploring our understanding of collective behavior. Not only do we have excellent data (courtesy of Spaeth and colleagues), but their decisions can have international repercussions, so it's quite relevant to us. A while ago, we published a sort of guide for how to use a maximum entropy model for group behavior on SCOTUS. I thought I would write about what our statistical mechanics model on collective voting behavior might tell us about the Roberts Court.

Our model tells us that the only data we need to understand the average voting behavior are the individual averages and the pairwise correlations between the justices' votes, as I show below.

The fact that a pairwise model is sufficient means that the correlation matrix has all the information that we need in order to reproduce the distribution of votes made by the Court. Even behaviors that seem to correspond to some higher level of coordination, like unanimity or ideological blocs, arise out of these lower-order interactions. On the other hand, we can't do without interactions, because a model with independent justices does not even get close. I think it's worthwhile to mention that this model is derived from maximum entropy, which means that the model embodies a mathematical formulation of Occam's razor--it is the simplest model that captures these patterns; no more, no less.
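To make "pairwise model" concrete, here is a minimal sketch of the standard pairwise maximum entropy (Ising-like) form, where each vote is +1 or -1 and the probability of a full vote is proportional to $$e^{\sum_i h_{\rm i} s_{\rm i} + \sum_{i<j} J_{\rm ij} s_{\rm i} s_{\rm j}}$$. The function name and the toy parameters are my own illustration, not fitted values, and the sign and normalization conventions are an assumption that may differ from the paper's.

```python
import itertools
import numpy as np

def pairwise_maxent_distribution(h, J):
    """Enumerate all +/-1 vote vectors and their probabilities under
    P(s) proportional to exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j)."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # Keep only the upper triangle so that each pair is counted once.
    J_upper = np.triu(J, k=1)
    log_weights = states @ h + np.einsum("ki,ij,kj->k", states, J_upper, states)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(log_weights - log_weights.max())
    return states, weights / weights.sum()

# Toy parameters for 3 voters (made up for illustration only).
h = np.array([0.1, -0.2, 0.0])
J = np.array([[0.0, 0.5, 0.3],
              [0.5, 0.0, 0.4],
              [0.3, 0.4, 0.0]])
states, probs = pairwise_maxent_distribution(h, J)
for s, p in zip(states, probs):
    print(s, round(float(p), 3))
```

With nine justices there are only 2^9 = 512 possible votes, so brute-force enumeration like this is cheap. Setting every coupling to zero recovers the independent-justice model mentioned above.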

The figure below (It's hideous. Sorry!) shows the average vote of each justice, where -1 is always liberal and +1 is always conservative. Clearly, the conservatives tend to vote above the dashed line and liberals below. The mean bias is the parameter $$h_{\rm i}$$ from the model that points at some intrinsic tendency to vote in a particular direction (if you're unfamiliar with the notation, the $$i$$ is a placeholder for the particular justice). As expected, conservative voters tend to have positive mean bias and liberals the opposite. The proper way to read the shown values is to put them into an $$e^{h_{\rm i}}$$, and that is the factor that contributes to the ideological tendency to vote (if it's been a while since you've done math, this factor $$e$$ is just the number 2.718... and we are raising it to the power of the number represented by $$h_{\rm i}$$. Why $$e$$? Unrelated, but it's a rather magical number...). Let's take the peculiar case of Roberts, who tends to vote conservative but has a negative mean bias. This means that when a case shows up, he is a factor of $$e^{|h_{\rm JR}|} = e^{0.16} \sim 1.2$$ more likely to take the liberal rather than the conservative side.

We know that Roberts sometimes makes some surprising decisions that seem against his presumably conservative leanings, and this bias towards the liberal direction encodes that. If we were to take the model literally, we might argue that Roberts is not as conservative as those who appointed him might have hoped, but the mapping from model parameters to real beliefs is not clear.

The other interesting case is Breyer, who has the opposite pattern: his bias is to be $$e^{|h_{\rm SB}|} \sim 1.2$$ times more likely to vote conservative. In the Second Rehnquist Court (1995-2006), Breyer has a slightly conservative bias $$h_{\rm i}$$, so it's interesting that he shows the reverse here. The biases are small in both cases, so it could be a statistical effect, but it could also mean that Breyer sits a bit more to the left relative to the others on this Court.
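As a quick worked number, and only under the assumption that the factor can be read as simple odds with everything else held equal (a simplification, since the couplings between justices also matter), a factor of $$e^{0.16}$$ is a fairly mild tilt:

\begin{align*} e^{0.16} \approx 1.17, \qquad \frac{e^{0.16}}{1 + e^{0.16}} \approx 0.54 \end{align*}

That is roughly a 54:46 split rather than anything close to a sure thing.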

Mean ideological tendencies. Average ranges from -1 (completely liberal) to +1 (completely conservative). The mean bias is the tendency to vote in a particular direction as given by the model, or the local field for each spin $$h_{\rm i}$$.

But we can't stop here, because the individual behaviors of the justices are insufficient to explain the prevalence of bloc formation in votes. SCOTUS is dominated by a strong tendency towards consensus (nearly 50% of votes compared to ~20% for ideological splits), and this behavior emerges from our model only when interactions are included--although our model does not tell us whether the agreement is a result of shared beliefs prior to joining the Court or really from interaction between justices during deliberation. Let's take a look at the correlations.

Breakdown of votes in the majority for the Roberts Court according to the model. SCOTUS is dominated by consensus then by ideological friction.

The way that we would read the matrix of correlations is to first note that the entire matrix is positive; all justices tend to vote with each other. I've organized the justices from left to right and top to bottom from most conservative record to most liberal, and the ideological block structure is obvious. The conservative core is the 4 justices Thomas, Scalia, Alito and Roberts with Kennedy on the fringe. The liberal wing is Sotomayor, Kagan, Ginsburg, and Breyer. Similarly, we found that we could make out clear conservative and liberal blocs from the correlation matrix in the Second Rehnquist Court (1995-2006).

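(left) Average pairwise correlation. Here correlation is not just the average agreement between a pair of justices but the average product of their votes minus the product of their means, $$\langle s_{\rm i} s_{\rm j}\rangle - \langle s_{\rm i}\rangle\langle s_{\rm j}\rangle$$ with $$s_{\rm i}$$ the vote of justice $$i$$, also known as the covariance. (right) Couplings from model.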
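For readers who want to compute this quantity themselves, here is a minimal sketch of the covariance described in the caption. It assumes votes are stored as a cases-by-justices array of +1/-1 values; the array layout, the function name, and the toy data are my own choices for illustration.

```python
import numpy as np

def vote_covariance(votes):
    """Return <s_i s_j> - <s_i><s_j> for a (cases, justices) array of +/-1 votes."""
    votes = np.asarray(votes, dtype=float)
    means = votes.mean(axis=0)                      # <s_i> for each justice
    pair_means = votes.T @ votes / votes.shape[0]   # <s_i s_j> for each pair
    return pair_means - np.outer(means, means)

# Toy record: 4 cases, 3 voters (made-up votes, not real data).
votes = np.array([[ 1,  1, -1],
                  [ 1,  1,  1],
                  [-1, -1, -1],
                  [ 1, -1, -1]])
cov = vote_covariance(votes)
print(np.round(cov, 3))
```

This is the same thing `np.cov(votes.T, bias=True)` returns; writing it out just makes the "minus the product of the means" step explicit.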

The corresponding model parameters are the couplings between each pair of justices. You might think of a coupling as a telephone line that links each pair of justices. By connecting all the pairs, we have a network with 36 phone lines. Now, imagine we're an intern sitting at the office of one of these justices, and our justice is being coy about his vote. But we know that if we call any of the other justices, we can get a good idea of how that justice will vote. So we go down the telephone list and poll the other justices. The way our model accounts for this new information is to take each coupling between justices i and j, $$J_{\rm ij}$$, and stick it into an $$e^{\,J_{\rm ij}}$$ that tells us by what factor our justice will be more or less likely to vote with that justice.

The couplings show much more complicated structure. One way to read this is to pick a row (let's start with Thomas) and look at the value of his couplings with everyone else. Unsurprisingly, his coupling with Scalia is the strongest out of all couplings in the matrix. If Scalia votes a particular way, we would expect that Thomas is $$e^{J_{\rm AS,CT}} \sim 3$$ times more likely to vote in that same direction. If we instead know how Ginsburg will vote, we would find that Thomas would be less likely to vote in that same direction by a factor of 0.6. Interestingly, the direction that Kennedy (AK) or Kagan (EK) vote has almost no contribution to how Thomas would vote, a factor of $$\sim e^0 = 1$$ that leaves his vote unchanged.
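The arithmetic behind these factors is easy to check. The sketch below counts the "phone lines" and converts between a coupling and the factor $$e^{J_{\rm ij}}$$ as it is read in this post. The function names are mine, and some of the initials (AS, SA, SS) are my own labels by analogy with the ones used here; the coupling values are simply the logs of the quoted factors, not numbers taken from the fit.

```python
import math
from itertools import combinations

# Initials for the nine justices; AS, SA and SS are assumed labels.
justices = ["CT", "AS", "SA", "JR", "AK", "SB", "EK", "RG", "SS"]

# One "phone line" per unordered pair of justices.
pairs = list(combinations(justices, 2))
print(len(pairs))  # 36

def factor_from_coupling(coupling):
    """Factor by which agreement becomes more (>1) or less (<1) likely."""
    return math.exp(coupling)

def coupling_from_factor(factor):
    """Inverse reading: the coupling that would give the quoted factor."""
    return math.log(factor)

print(round(coupling_from_factor(3.0), 2))  # 1.1, a factor-of-3 boost
print(round(coupling_from_factor(0.6), 2))  # -0.51, a factor-of-0.6 penalty
print(factor_from_coupling(0.0))            # 1.0, no information either way
```

So a factor of 3 corresponds to a coupling a bit above 1, and a coupling near zero means the other justice's vote tells us essentially nothing.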

We can again do the same exercise with everyone else, so let me note a few interesting ones.

• Knowing Roberts' vote means Scalia is 3 times more likely to vote in the same direction, but 3 times less likely to vote in the direction given by Breyer. In fact, his couplings show strong variance and are strong with liberals Ginsburg (RG) and Kagan (EK) but not Breyer (SB) who is slightly conservative.
• Kennedy is the only one with no strong negative couplings with anyone. His only negative coupling with Thomas is nearly 0, meaning that Thomas' vote is not meaningful for knowing how Kennedy will behave. This is indicative of the unique position that Kennedy holds on the Court--that somehow he spans both ideological sides despite the existence of competing factions.
• Besides Kennedy, Roberts (JR) also dominates the center of the Court with strong couplings with Scalia, Alito, Kennedy, and Breyer.
• This list is certainly not exhaustive. Feel free to leave a comment if you think there's anything of particular interest.

Another thing we can get from the model is the set of prototypical votes that characterize the system (energy minima in the 9-dimensional energy landscape). Reminiscent of the Second Rehnquist Court, we find that the three votes are the unanimous vote, the ideological divide, and Scalia and Thomas against the rest. These are shown below, where black and white denote votes in opposing directions (say, to affirm or deny the case at hand). From our model, we can look at the probabilities that different justices will break ranks from these votes.

The prototypical votes. Unanimous, ideological, and conservative core. These appear naturally as important features in the data as local maxima in probability and in our model as local minima in energy.
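To illustrate what "local minima in energy" means, here is a minimal sketch that scans every vote and keeps the ones where no single justice changing sides lowers the energy. It assumes the usual convention $$E = -\sum_i h_{\rm i} s_{\rm i} - \sum_{i<j} J_{\rm ij} s_{\rm i} s_{\rm j}$$ with symmetric couplings; the toy couplings (two blocs of two voters) are invented to make the point and are not the fitted SCOTUS values.

```python
import itertools
import numpy as np

def energy(s, h, J):
    """E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j (J symmetric, zero diagonal)."""
    return -(h @ s) - 0.5 * (s @ J @ s)

def is_local_minimum(s, h, J):
    """True if flipping any single vote does not lower the energy."""
    e0 = energy(s, h, J)
    for i in range(len(s)):
        flipped = s.copy()
        flipped[i] *= -1
        if energy(flipped, h, J) < e0:
            return False
    return True

def local_minima(h, J):
    """Brute-force scan over all 2^n votes; fine for n = 9."""
    n = len(h)
    candidates = (np.array(t) for t in itertools.product([-1, 1], repeat=n))
    return [s for s in candidates if is_local_minimum(s, h, J)]

# Toy example: voters 0,1 form one bloc and 2,3 another (made-up values).
h = np.zeros(4)
J = np.array([[0.0, 1.0, 0.2, 0.2],
              [1.0, 0.0, 0.2, 0.2],
              [0.2, 0.2, 0.0, 1.0],
              [0.2, 0.2, 1.0, 0.0]])
minima = local_minima(h, J)
for s in minima:
    print(s, round(float(energy(s, h, J)), 2))
```

Even in this toy case the minima are the two unanimous votes and the two bloc splits, which is the same qualitative picture as the unanimous and ideological votes above.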

Let's start with the unanimous vote. Both the unanimous conservative and liberal votes are very strongly attracting. For both votes, it takes a large push for any single justice to break ranks. In physics, we typically think of this "push" in terms of an energy cost required to move us from one configuration to another, so let me use this language. With the conservative vote, the member who is easiest to push out is Alito. Using the change in energy $$\Delta E$$ required to push him to disagree, we can compare the probability of the Court being unanimous with the probability of Alito disagreeing.

\begin{align*} \frac{1}{1+e^{\Delta E}} = 0.05 \end{align*}

Nearly a 20:1 ratio! For the unanimous liberal decision, we find that the costs of either Ginsburg or Sotomayor breaking ranks are very similar and lead to about 0.06. So, in this sense, the unanimous liberal decision is slightly less firm than the unanimous conservative position.

The story is very different for the ideological vote. The energy difference between Kennedy voting with the conservatives or with the liberals is small, $$\Delta E \approx 0.1$$, meaning that if we had to pick which side Kennedy would choose, the ratio would be roughly 1:1, or 0.475:0.525 to be precise. Because it takes so little energy to tip Kennedy in one direction or another, it will be hard to predict the side to which Kennedy will go.
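These numbers come from comparing just two configurations whose probabilities are proportional to $$e^{-E}$$, so the costlier one gets a share of $$1/(1+e^{\Delta E})$$. Here is a small sketch of that conversion in both directions; the function names are mine, and the energy gap printed for the 0.05 case is simply what the formula implies, not a value read off the fitted model.

```python
import math

def prob_of_costlier_vote(delta_e):
    """Share of the higher-energy configuration when only two are compared,
    assuming probabilities proportional to exp(-E)."""
    return 1.0 / (1.0 + math.exp(delta_e))

def delta_e_from_prob(p):
    """Energy gap implied by a given share for the costlier configuration."""
    return math.log((1.0 - p) / p)

# Kennedy in the ideological vote: a tiny gap gives a near coin flip.
print(round(prob_of_costlier_vote(0.1), 3))   # 0.475

# A 0.05 share for a single dissent implies a gap of ln(19).
print(round(delta_e_from_prob(0.05), 2))      # 2.94
```

The contrast between a gap near 3 and a gap near 0.1 is the whole story: the unanimous votes are deep wells, while Kennedy sits almost exactly on the ridge between the two ideological sides.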

The cost of pushing a justice away from a vote reflects something about the underlying coupling network. This means that we might use this model to probe what happens if a justice were able to tug on these couplings to influence the votes of the others. Let's imagine our justice calling the others on the telephone lines to push them in a particular direction. Of course, our couplings are symmetric, so the receiving justice will push back just as hard, but let's set it up so that our calling justice has decided not to budge before calling. What impact does this have? (If you're curious, this is the $$\Gamma_{\rm i}$$ quantity we define in our paper.)

We might contrast this sort of interaction strength with a more general measure of influence: the predictive power that a justice's vote has over the final outcome of the vote, technically the "mutual information" between the justice's vote and the majority vote, but we can informally call this the correlation. We plot these two quantities against each other below.

Types of influence against each other. The x-axis is the "correlation" between the vote of the justice and the vote of the majority, or the mutual information in bits. The y-axis is the influence measured through the local couplings of a justice.
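For a sense of what the x-axis measures, here is a minimal sketch of the mutual information, in bits, between one voter's vote and the majority outcome. It uses a simple plug-in estimate from observed frequencies on made-up votes; whether the paper computes this from the data or from the model distribution is not something I am asserting here, and the function names are my own.

```python
import numpy as np

def mutual_information_bits(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired samples."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return float(mi)

def majority_vote(votes):
    """Sign of the summed +/-1 votes; no ties with an odd number of voters."""
    return np.sign(votes.sum(axis=1))

# Toy record: 4 cases, 3 voters (invented for illustration).
votes = np.array([[ 1,  1, -1],
                  [ 1,  1,  1],
                  [-1, -1, -1],
                  [ 1, -1, -1]])
majority = majority_vote(votes)
for i in range(votes.shape[1]):
    print(i, round(mutual_information_bits(votes[:, i], majority), 3))
```

A voter who always sides with the majority in a balanced record carries a full bit, and a voter whose vote is unrelated to the outcome carries none.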

The clear outlier is Kennedy who seems both influential in his network and in his ability to predict the final vote of the Court. Next up is Roberts. Ginsburg is left far behind everyone else. She is also the most senior member of the liberal wing (seniority is dominated by the conservatives: Scalia, Kennedy, Thomas, then Ginsburg). Perhaps for a related reason, she seems to be isolated. Remember that she is not tightly coupled to the dominant conservative wing, but she's also more likely to diverge from the unanimous liberal decision.

So we can use this model to make some interesting observations about the justices and their relationships on SCOTUS! There is more we can go into here, but I'll point you to our paper. Before I end, I'd like to dwell on the description of this model as "maximum entropy." This is a method of building a model that agrees with the data and incorporates a mathematical formulation of Occam's razor. Maximum entropy, or maxent, is not always the easiest way to build a model, but it is a powerful tool, especially in science, where we hope that our contributions stand up to scrutiny when tested against data and where we should be as concise and as explicit about our assumptions as possible.