We in the UK recently had our (roughly) four-yearly night of the brilliant Professor Sir John Curtice helping us make sense of the incoming results of the 2024 UK General Election. The guy is the quintessential quirky British genius. Yet his effortless interpretation of any constituency result to let us know “which way the wind is blowing” is actually a lot more than it seems: he is using a lifetime of experience to solve something in political polling that, strictly speaking, isn’t solvable: there are an infinite number of solutions to political polls. More formally, every poll is one equation with two unknowns. It’s like if I tell you x + y = 8 and ask you to solve for x. Sir John knows enough to know approximately “where y is located” and therefore what x probably is.

But this is an art; it is NOT, strictly speaking, a second dataset that would unequivocally allow you to solve for x and y. I have quoted this “confounding problem” (one equation with two unknowns) for years on various media. Kind people have taken me at my word when I quote references to the peer-reviewed academic pieces proving this. Yet, quite understandably, some have (politely) asked “but what is the intuition?” That is what I’d like to try to provide here.

**What is the confounding problem in polling?**

To answer this, it is necessary to gain a vague understanding of what psephologists like Sir John and many applied statisticians are doing when they see “numbers of votes for each candidate” (we are working at the level of the CONSTITUENCY here) and use fancy statistical models to infer two things:

- How **strong** is the support for each candidate?
- How **certain/consistent** is an individual person’s support for each candidate?

The first is a measure of “how much you identify with a candidate” whilst the second is a measure of “how often you’ll stick with the candidate”. It turns out that each can make or break a candidate, yet even the final vote counts can’t separate these two effects. A candidate might win because they have very strong support, OR they might win because, although having “weak” levels of support, their voters are consistent and reliable compared to the “fly-by-nights” who support the other candidates.

Let’s be clear here: someone like Sir John, via a lifetime of experience, knows that in our silly example equation y is “somewhere around 6” so x must be somewhere around 2. But that’s knowledge based on experience and frankly I’m not confident that his successors will bring such levels of experience to the table.

To really “solve the puzzle” and “break the confound” we really need a second dataset: something that tells us something else about the relationship between x and y. Sir John uses experience but going forward, we really need more data if we’re going to understand who is going to vote for whom and when they’ll switch.

**An example using a very “middle England” constituency**

To illustrate how EVERY statistical package implements a type of regression that inputs the number of votes for each candidate and outputs the “strength of their support” I’ll make up some polling from a hypothetical constituency – let’s call it “England-Mid”.

Here are some hypothetical results from it:

| Party | Vote share |
| --- | --- |
| Conservative | 48% |
| Labour | 29% |
| Reform UK | 10% |
| Liberal Democrat | 7% |
| Green | 6% |

Whilst we can think in terms of “results from 1000 sampled respondents” or some such, in fact the logit model most commonly utilised to analyse data that are discrete rather than continuous (in other words, choosing one option from two or more) actually works at the level of the individual person (in this case the potential voter). So we could think of these percentages as “internal levels of support in the head of a hypothetical voter, who we’ll call Jo”.

This seems strange not only to non-statisticians but also to many statistical experts who are not used to thinking about the “internal brain processes and preferences of the individual human”. In areas like academic marketing, random utility theory is used. This was developed by Thurstone in 1927 and I’m linking to one of the less bad sources. He proposed the idea that a person might have some internal scale. For much of the post-war era, in the context of politics, this was a simple one-dimensional spectrum conceptualised as how “left-right” your outlook was. When a given political party’s manifesto moved to the “peak” of the distribution that described how left/right-wing you were, the chances that you’d vote for them would be maximised. With two parties, centred at different parts of the spectrum, your chances of voting for each could (in theory) be mathematically calculated by varying their manifestos and seeing when you “switched”.

So, whilst I will talk about Jo, the single person, in the interests of making concrete the idea that this paradigm works at the level of the individual voter, you may think of a “bunch of people like Jo” if you wish. Jo, and a bunch of people like her, seem to skew heavily Conservative. However, there’s a lot more going on here.

**These percentages conceal some crucial aspects of Jo’s thinking**

What if I told you that there were two very different explanations for these “percentage insights into Jo’s mind”? Or, worse still, an infinite number of explanations? In fact there are. Anybody who has read the key article describing how the logit model uses percentages to infer a person’s “affinity” (strength of affiliation) with each of two or more discrete options AND how consistently (s)he actually expresses it (whether Jo virtually always votes Conservative or has somewhat “soft” support), or who has read the boring background/bibliography section on the logit and other such models in the manuals for programs like Stata, will know where this is going.

For everyone else, I’ll try to do what I promised and tease out the intuition. To do so we must get our heads around the idea that whilst Jo has some “affinity” scale in her head, on which all parties are placed – each with a mean (average) value – her scale also has a variance for each option: how consistently she would stick with a given party. Whilst it is tempting to think that the 48% value for the Conservatives always represents how often she’d vote Tory, this is not in fact how to interpret that number. It is merely the proportion of “Jo-types” who went Conservative ON THIS OCCASION. The percentage might vary a lot in other universes/on other occasions. (To use what will become a very rapidly dated analogy, Deadpool shoots a LOT of bullets in the MCU, but in other universes the “alternative Deadpools” still shoot a lot – maybe more, maybe less.)

So, let’s delve down into Jo’s “strength of preference” – her “mean” affinity with each party – and her “consistency” – her tendency to stick with a party or get distracted by another.

For this exercise we are going to keep this (old-fashioned) model. I can add all the “other dimensions” that influence people later: “social” position (liberal or conservative) might differ from “economic” position (liberal or conservative); pro/anti Europe, etc. To explain the basic statistical problem we’ll simply think of a single scale measuring what you identify most with – “identification”. In the UK there is a strong chance that the five parties here can be placed on this scale according to how relatively left/right-wing they are, but with BREXIT etc. it should be obvious that I’ve had to simplify somewhat. Indeed, the idea of the “old-school Leftie anti-European” will be crucial in showing why the numbers I’ve just given can be deeply misleading.

**What do pollsters do when interpreting these kinds of numbers?**

To answer this, unfortunately we must take a quick look at what the logit model does, in terms of turning a discrete choice (such as “vote Conservative from the five parties on offer”) into something continuous (inferring a scale on which all five parties have a kind of affinity score that Jo derives from them).

To take this further, suppose Jo and her “clones” have an “affinity” score for each party. It is helpful to think of it as a number on the x-axis of the normal distribution, so it could theoretically go to plus or minus infinity but most of the time will lie in the [-2,+2] range. (NB we can add/subtract a constant without changing anything so I’ve made all my numbers positive). Zero represents indifference (“meh”). Each party has a score (“mean affinity”), let’s call it the “beta” for a political party. So she has a value for beta_Conservative, a beta_Labour, etc.
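That “add/subtract a constant without changing anything” claim can be checked directly. Here is a minimal sketch (the affinity numbers are invented purely for illustration) showing that shifting every affinity score by the same constant leaves the implied choice probabilities untouched:

```python
import math

def choice_probs(affinities):
    """Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j)."""
    exps = [math.exp(v) for v in affinities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical affinity scores for five parties (some negative)
affinities = [1.6, 1.1, 0.2, -0.1, -0.3]
shifted = [v + 2.0 for v in affinities]  # add a constant so all are positive

p1 = choice_probs(affinities)
p2 = choice_probs(shifted)
print(all(abs(a - b) < 1e-12 for a, b in zip(p1, p2)))  # True: shares unchanged
```

The constant cancels in the ratio (exp(V + c) = exp(V)·exp(c) appears in both numerator and denominator), which is exactly why only *differences* in affinity are identified.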

Crucially, as already mentioned, each party also has a variance: in discussing the logit model we tend to think of the inverse of this, the “consistency”, typically labelled lambda in the equation. Thus, if Jo is a “die-hard Conservative” with a small variance, she has high consistency – she virtually always votes Tory, if that is the party she identifies with most. In addition to having a high beta_Conservative, her lambda_Conservative is high. This is NOT always the case. You can label yourself Conservative and think that that is your “home”, but the distribution around your “high beta” may be a gentle hill, not a very peaked mountain: you can be persuaded to switch more easily.
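One way to build intuition for beta versus lambda is a small simulation. In a Thurstone-style random utility sketch (all numbers here are invented for illustration), each “election” Jo draws a noisy utility for each party around its mean affinity (beta), with lambda controlling how tight the noise is, and votes for whichever draw comes out highest:

```python
import random
from collections import Counter

random.seed(42)

def simulate_votes(betas, lam, n_elections=10_000):
    """Each 'election', add noise with spread 1/lam to every party's
    mean affinity (beta) and vote for the highest realised value."""
    wins = Counter()
    for _ in range(n_elections):
        realised = {party: beta + random.gauss(0, 1 / lam)
                    for party, beta in betas.items()}
        wins[max(realised, key=realised.get)] += 1
    return {party: wins[party] / n_elections for party in betas}

betas = {"Con": 1.0, "Lab": 0.6, "Other": 0.0}  # hypothetical mean affinities

die_hard = simulate_votes(betas, lam=5.0)   # small variance = high consistency
floating = simulate_votes(betas, lam=0.5)   # large variance = low consistency

print(die_hard)   # Con wins the vast majority of the time
print(floating)   # same betas, but votes are now spread across the parties
```

Same betas, very different voters: the die-hard version of Jo is essentially locked in, while the low-lambda version wanders between parties even though her “home” hasn’t moved.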

The logit function can’t separate the beta from the lambda. When the computer program tries various numbers to get the set of numbers that are most likely to give the percentages 48, 29, 7, 6, 10, it uses the following expression:

P(vote for party i) = exp(lambda × V_i) / [exp(lambda × V_1) + exp(lambda × V_2) + exp(lambda × V_3) + exp(lambda × V_4) + exp(lambda × V_5)]

In short, the “vote shares” (probabilities) in the table above are the left-hand side. These need to be explained by five “lambda-Vees”. The Vee is the utility (“affinity score”). So if all affinity scores were zero (“meh to every party”) then each probability would be exp(0)=1 divided by the sum of the five exponentials (each being one), so 1/5 = 20%. So that figures. As the Vee (affinity score) increases for a given party, its contribution to the total (the numerator over the denominator) increases.

Crucially, however, the logit outputs lambda times Vee! You can’t separate the two! The equation above assumes lambda is the same across people and options, but I’ve deliberately described a more realistic situation where lambda varies by person and even by PARTY for a given person.
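The confound is easy to demonstrate numerically. In this sketch (the affinity numbers are again invented), doubling every affinity score while halving lambda yields exactly the same predicted vote shares, so no amount of vote data alone can tell the two stories apart:

```python
import math

def logit_shares(lam, vees):
    """P_i = exp(lam * V_i) / sum_j exp(lam * V_j)."""
    exps = [math.exp(lam * v) for v in vees]
    total = sum(exps)
    return [e / total for e in exps]

vees = [1.2, 0.7, -0.3, -0.6, -0.8]  # hypothetical affinity scores

# Two very different stories about the same voter:
strong_but_flaky = logit_shares(0.5, [2 * v for v in vees])  # big betas, low consistency
weak_but_steady = logit_shares(1.0, vees)                    # smaller betas, higher consistency

# ...producing identical vote shares:
print(all(abs(a - b) < 1e-12 for a, b in zip(strong_but_flaky, weak_but_steady)))  # True
```

Only the product lambda × V enters the likelihood, so any rescaling of one that is offset in the other fits the observed percentages equally well: one equation, two unknowns.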

Oops. So what does Stata or any other program do? It sets lambda to one or some other constant. If that makes you worried then you should be. In the table below I simply applied the natural log (the inverse of the exponential) in the equation to give affinity scores, given the “percentages”.

| Party | Affinity score (lambda = 1) |
| --- | --- |
| Conservative | 2.08 |
| Labour | 1.58 |
| Reform UK | 0.51 |
| Liberal Democrat | 0.15 |
| Green | 0.00 |
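Here is roughly what that inversion looks like, using the vote shares from earlier (48, 29, 7, 6, 10 per cent) under the assumption lambda = 1. Taking natural logs of the shares and shifting by a constant to make everything non-negative recovers affinity scores that reproduce the original percentages exactly; the lambda = 1 assumption is doing silent but heavy lifting.

```python
import math

shares = [0.48, 0.29, 0.07, 0.06, 0.10]  # the hypothetical poll percentages

# With lambda fixed at 1, V_i = ln(P_i) + c for any constant c.
raw = [math.log(p) for p in shares]
affinities = [v - min(raw) for v in raw]  # shift so all scores are non-negative

# Push the affinities back through the logit to check we recover the shares:
exps = [math.exp(v) for v in affinities]
total = sum(exps)
recovered = [e / total for e in exps]

print([round(v, 2) for v in affinities])
print([round(p, 2) for p in recovered])  # matches the original shares
```

Nothing in this arithmetic tests whether lambda really is 1; a voter with affinities twice as large and a lambda of one half would have produced the identical numbers.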

So, according to pollsters, we have numbers showing Jo’s level of affinity with each party. As the raw “percentages” suggest, she feels most aligned with the Conservatives, then Labour, with Reform UK, the Lib Dems and the Greens a distant 3rd, 4th and 5th in that order. However, this is a potentially deeply misleading characterisation of her views.

This is what a “dumb statistician” would report, if he or she did not know more about the constituency (like Sir John does).

So… what if I told you Jo could, via different lambdas, be the classic “Blue Wall Conservative” who could have voted either way in the BREXIT referendum and who would only ever countenance Labour, the Conservatives and very possibly the Lib Dems? Or that her data were equally consistent with a resident of the “broken-and-now-rebuilt red wall” in the Midlands who was actually “old-school Labour” and who lent support to the right because she thought Labour were in the pockets of the EU and wanted to kick the establishment in the teeth? Both explanations are possible from these data.

Part 2 will explain how these data are consistent with either of those explanations. However, what is REALLY worrying is that nothing in the data alone can alert you to this. You need additional information in order to properly disentangle Jo’s affinity from her consistency.

**TL;DR**

Essentially, the researcher MUST bring a second set of data to the table. This is ideally another set of survey data, but often a big dose of common sense must suffice. The TL;DR is that every opinion poll is an equation with two unknowns. It is by definition unsolvable. Some skilled people can bring extra information to the table to give some accuracy but it’s about time people pointed out the Emperor is wearing no clothes.

**Bibliography**

Thurstone LL. A law of comparative judgment. Psychological Review, 34, 273-286 (1927).

Louviere JJ. What you don’t know might hurt you: some unresolved issues in the design and analysis of discrete choice experiments. Environmental and Resource Economics, 34, 173-188 (2006).

Hensher DA, Louviere JJ, Swait J. Combining sources of preference data. Journal of Econometrics, 89, 197-221 (1999).

Yatchew A, Griliches Z. Specification error in probit models. The Review of Economics and Statistics, 67(1), 134-139 (1985).

Louviere JJ, Flynn TN, Marley AAJ. Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press (2015).

Marley AAJ. Some probabilistic models of simple choice and ranking. Journal of Mathematical Psychology, 5, 333-357 (1968).

McFadden D. Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed.), Frontiers in Econometrics. Academic Press, New York, 105-142 (1974).

**Part 2 tomorrow**