
COVID-19 variants. Statistical concerns

This piece draws heavily on one published at NakedCapitalism. Pretty much all the references regarding epidemiological explanations and “on the ground” observations are there, so in the interests of brevity (and my own schedule at the moment) I’ll simply give that as the main reference. I’ll add a few notes on other issues, though.


I’ve written before that stated preference (SP) data analysed using logit/probit models (examples of limited dependent variable models, so-called because the outcome isn’t continuous like GDP or blood pressure) are very hard to interpret [a]. Technically, the estimates are consistent with an infinite number of solutions, because each one mixes a mean and a variance. It is incumbent upon the researcher either to collect a second dataset, entirely independent of the first (so we now have two equations to solve for the two unknowns, mean and variance), or to use experience and common sense to settle on the most likely explanation (or a small number of likely ones). This is technically true of revealed preference data (actual observed decisions) too [b], and Covid-19 might be an unfolding, horrific example of a case where we are pursuing the “wrong” interpretation of the observed outcomes.

Background: What’s happened in various “high vaccination” countries so far?

In short, Covid-19 case rates initially dropped through the floor, typically in line with vaccination coverage, then started bouncing back. However, the previously strong link between cases and hospitalisation and death did not reappear. This is consistent with the fact that the vaccines are not “sterilising vaccines”: you can still catch Covid-19, it’s just that the vaccine is (largely) stopping the infection from playing havoc with your body.

Sounds like a step forward? Actually, without widespread adjunct interventions (good mask usage, etc.) to stop the spread in the first place, this is potentially very, very bad. We’ve already seen variants arise. The Delta variant is causing increasing havoc, whilst Lambda is becoming dominant in South America. The Pfizer vaccine (which, thanks to media failures, was often touted as “the bestest evah”) seems particularly ill-equipped to deal with Delta. NC is covering this very well.

Biochemists and their colleagues can give good explanations of WHAT is happening pharmacologically and epidemiologically in producing these variants. But remember the archetypal drunk who lost his keys on the way back from the pub. In the story, he looks for them only under the lamp-post, because that is where the light is, whilst they’re actually on the dark part of the road; if you can’t or won’t look in the right place, of course you won’t find the solution. This is what many experts are doing, and it is why Delta and its successors could keep appearing, at an increasing pace. That, perhaps, is the real story: one with roots in statistics.

What’s the possible statistical issue here?

Consider how medical statisticians (amongst others) typically think about discrete (infected/non-infected, or live/die) outcomes. As in the SP case, a 0/1 outcome cannot give you a separate mean (“the average number of times a human, or a particular subgroup, would get bad Covid-19 for a given level of exposure”) and variance (“the consistency of getting it for this given level of exposure”). If 80% of Covid sufferers at a given exposure level needed hospital care but only 20% do when vaccinated, analysts tend to conclude that the mean (the average tendency to get a bad bout) has gone down.

Suppose the “extreme opposite interpretation” (equally consistent with the observed data) is true. Suppose it’s a variance effect. Then the vaccine is not really, on average, bringing the theoretical average hospitalisation rate down, or not by much anyway. It is simply “pushing what was a high, peaked, thin mountain into a fat, low-altitude hill” in the function relating underlying Covid-19 status to observable key outcomes. Far more people are in the tails, with an emphasis on the “hey, now Covid is no big deal for me” end [c]. The odds of hospitalisation following vaccination go way down. However, if you look at subgroups and you’re experienced, you’ll spot a tell-tale giveaway: the pattern of estimated effects (log odds ratios) across subgroups by vaccination status is VERY SIMILAR TO BEFORE; they have all just (for instance) halved. This is a trick I’ve used on SP data for decades, and it often shows that an intervention has a variance effect. In that case fewer people go to hospital if vaccinated, but their average tendency to get a bad bout is actually unchanged by vaccination (particularly if we add the confounding factor of TIME: Covid is changing FAST).
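A minimal sketch of that subgroup check, assuming a latent-variable binary model in which the software can only recover each true effect divided by the latent standard deviation σ. The subgroup effects are made-up illustrative numbers, not Covid data, and `identified_coefs` is a hypothetical helper name. Under a pure variance effect every subgroup's estimate shrinks by the same factor, so the pattern across subgroups survives intact.

```python
import math

def identified_coefs(true_betas, sigma):
    """Latent model y* = x'beta + e with sd(e) = sigma: the data only
    identify beta / sigma, i.e. the 'beta hats' the software reports."""
    return [b / sigma for b in true_betas]

# Hypothetical latent-scale effects for three illustrative subgroups.
true_betas = [0.4, 1.0, 1.8]

unvaccinated = identified_coefs(true_betas, sigma=1.0)
# Pure variance (scale) effect: same true betas, latent spread doubled.
vaccinated = identified_coefs(true_betas, sigma=2.0)

# Per-subgroup ratio of vaccinated to unvaccinated estimates.
ratios = [v / u for v, u in zip(vaccinated, unvaccinated)]
print(ratios)  # one common shrink factor for every subgroup

# Odds ratios are exp(coefficient), so they move towards 1 rather than halving.
print([round(math.exp(b), 3) for b in unvaccinated])
print([round(math.exp(b), 3) for b in vaccinated])
```

If the per-subgroup ratios differed wildly from one another, that would point towards genuine mean effects rather than a common change in scale.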

This gives the virus an ideal opportunity to mutate quietly, spread and, via natural selection, find a more virulent variant. Coupled with fewer people taking precautions, it increases the chance that a variant emerges that is both “longer incubating” and then potentially “suddenly more lethal”.

So vaccines were a bad thing?

At this point in time I’ll say “NO” [d]. However, in conjunction with bad human behaviour and an inability to think through the statistics, they have led to a complacency that might lead to worse long-term outcomes. The moral of the story is one that sites like NC have been emphasising since the start and which certain official medical and statistical authorities really dropped the ball on right from the get-go.

The vaccines merely bought us time. Time we wasted. Now a long-ignored problem with the logit (or probit) function, the key tool we use to link “discrete cases of disease” to a “function relating underlying Covid-19 status to observed disease status”, might be our undoing. Far fewer people are going to hospital following vaccination (a smaller mean effect in terms of lethality), but a MUCH larger number of people have become juicy petri dishes for the virus to play in (larger variance). We have concentrated way too much on the former. The statistics textbooks tend to stress that explanation.

Trouble is, too few people read the small print at the bottom warning them that their logit/probit estimates could just as easily arise from variances, not means. Assume you observe:

  • An answer of 8 in the non-vaccinated group. You assume mean prevalence = 8 and (inverse of) variance = 1, as the stats program always does: 8*1 = 8.
  • An answer of 4 in the vaccinated group. Wow, vaccination has halved mean prevalence! You conclude this because you ALWAYS ASSUME THE VARIANCE IS 1, so you “must” have got 4 via 4*1, because with the variance fixed at 1 that is the only way to get 4!

Oops. How you SHOULD have “divvied up the mean and inverse of variance” was 8*1 in the non-vaccinated group and 8*(1/2) in the vaccinated group. The treatment effect is in fact unchanged. The inverse of the spread halved; in other words the spread on the latent scale (strictly, its standard deviation) doubled. People got ill less consistently [e].
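The arithmetic above can be checked with a short probit-style sketch. It assumes, purely for illustration, that P(hospitalised) = Φ(μ·β·x), where β is the true effect, μ the inverse of the latent spread and x a hypothetical exposure level; `p_hospital` is a made-up helper, not a fitted model. The “mean story” (β = 4, μ = 1) and the “variance story” (β = 8, μ = 1/2) give identical probabilities at every exposure, which is why 0/1 data alone cannot separate them.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_hospital(x, beta, mu):
    """Illustrative probit: beta is the true effect, mu the inverse scale."""
    return normal_cdf(mu * beta * x)

# Hypothetical exposure levels (arbitrary units).
exposures = [-0.2, 0.0, 0.05, 0.1, 0.2]

# The identified coefficient is beta * mu in both stories: 4*1 equals 8*(1/2).
print(4 * 1, 8 * 0.5)

# "Mean story": vaccination halved the effect, scale assumed fixed at 1.
mean_story = [p_hospital(x, beta=4.0, mu=1.0) for x in exposures]
# "Variance story": effect unchanged at 8, inverse scale halved.
variance_story = [p_hospital(x, beta=8.0, mu=0.5) for x in exposures]

for x, a, b in zip(exposures, mean_story, variance_story):
    print(f"x={x:+.2f}  mean story={a:.4f}  variance story={b:.4f}")
```

Any extra information that pins down μ (a second, independent dataset, for instance) is what breaks the tie between the two stories.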

For someone like me, who used to deal primarily with stated preference data, the worst thing that could happen was that I’d lose the client when model-based predictions went wrong (because I’d made the wrong “split” between means and variances).

The stakes here are much, much bigger. This is the “statistical issue”, a potentially big misinterpretation of Covid-19 data, that really worries people like me.


[a] See my blog; NC also reprinted one of my posts.

[b] This is how and why Daniel McFadden won the so-called Economics Nobel: he predicted demand for BART (the Bay Area Rapid Transit system) in California extraordinarily accurately, before it was even built. He had both stated and revealed preference data on transport usage.

[c] You can’t keep symmetry if you keep squashing the mountain down: the right tail hits 100%. So you “see” a lot more people in the left tail (doing “well”) as a result of vaccination. This produces a mean effect, so vaccination is unlikely to be 100% variance-related; there must be a certain degree of mean effect here. My point is that the “real mean effect of vaccination” is theoretically a lot smaller than the one we observe in the data.

[d] With the “Keynes get-out clause”.

[e] The logit and probit routines basically spit out a vector of “beta hats”, which are actually “true betas MULTIPLIED by AN INVERSE function of the variance on the latent scale”. So when variances go up (which in SP data happens when you get answers from people with lower literacy, etc.), the “beta hats” all DECREASE in absolute magnitude, and the odds ratios move towards 1. Confusingly for non-stats people, we (to make the equation look less intimidating) tend to define a function of the variance, called lambda (not to be confused with the Covid variant) or mu, that MULTIPLIES the “true beta”. Believe me, if you think this was a stupid simplification that leads people to talk at cross-purposes, you are not alone.
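For readers who want the algebra behind footnote [e], here is the standard latent-variable derivation for the logit case (the probit case is the same with Φ in place of Λ and σ as the normal standard deviation). The symbols y*, s and σ are my labels, chosen to match the lambda/mu notation above; Λ(z) = 1/(1 + e^{-z}) is the standard logistic CDF.

```latex
\begin{aligned}
y_i^* &= x_i'\beta + \varepsilon_i, \qquad y_i = \mathbf{1}\left[y_i^* > 0\right],\\
\varepsilon_i &\sim \operatorname{Logistic}(0, s), \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2 = \frac{\pi^2 s^2}{3},\\
\Pr(y_i = 1 \mid x_i) &= \Lambda\!\left(\frac{x_i'\beta}{s}\right) = \Lambda\!\left(\mu\, x_i'\beta\right), \qquad \mu = \frac{1}{s} = \frac{\pi}{\sqrt{3}\,\sigma},\\
\hat\beta &\ \text{estimates}\ \mu\beta .
\end{aligned}
```

Doubling σ halves μ and therefore halves every element of the estimated coefficient vector, while ratios such as β̂_j / β̂_k are unchanged; that is the subgroup "fingerprint" of a pure variance effect.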

UK covid stats confusion

There is a lot of confusion, both in the MSM and on sites like nakedcapitalism, as to what is going on with Covid-19 in the UK. The PHE (daily...ish) stats suggest the number of cases is falling and has been for almost a week now. Hooray? But the (lagging) ONS data don’t show this. Maybe the ONS data will show the same when they “catch up”: they are about 2 weeks behind in terms of data collection, analysis and interpretation. The ONS survey is “better” in that it is closer to a random sample, so bias (in terms of who actually gets tested) is theoretically lower, but never underestimate the interaction between humans (via their psychology) and a supposedly controlled trial. If people don’t want to participate because they suspect the “real result” will annoy them, they’ll find ways to mess with you, the statistical designer.

Perhaps, then, the lower rates of positive results in the PHE data arise simply because lots of people who have been “pinged” by their phone app, and others who strongly suspect they have Covid-19 but aren’t very ill, are “going to ground” and not getting tested (for fear of being “grounded” for 7-14 days). BBC radio the other day told people “if you’re pinged you don’t have to self-isolate but if you’re directly contacted you do”. Is that true? And if so, how many heard it and are following it? I dunno.

At this stage there is simply no robust way to know what is going on. People are impatient, however, and insist on speculating so as to be “first to the punch” with the correct interpretation, rather than waiting 10 days. Some of these people should know better. When someone blatantly speculating with anecdotes (reported here) has only their official qualifications reported, rather than details of the geographic and other factors that might help us put their anecdotal data into context, any good statistician should get suspicious. If you live in a North Norfolk Westminster constituency where the MP (for the ruling Conservative Party) won not only a plurality but a large majority at the last election, I don’t think you live somewhere that is remotely representative of the country. When I know your professional (GEOGRAPHICAL) position; when, via my higher education and friends (living across East Anglia), I know practically every public transport route you might take (if not driving) to get anywhere; and when I know you are NOT encountering “the average Brit”, then I won’t apologise for not paying much attention to your anecdote.

Don’t get me wrong. Anecdotes are SOMETIMES useful. Sometimes an anecdote forms the germ of a theory that ultimately leads to a massive paradigm change. However, if you’re going to quote an anecdote, you should allow someone like me to put it into proper context. I’ll put more stock in an anecdote from someone who lives life “close” to the knife-edge of what is going on (for instance, in reactions to Covid-19), and this inevitably, from what we know already, involves living close to BAME people, people on both sides of the median income, and other such measures that, for better or worse, have come to define “class” in modern Britain. So, here are my personal experiences over the last couple of days, followed by some factors that should enable any competent “data person” to get a “picture” of where I live, how people behave, etc.

What happened today when helping my mother do her weekly shopping:

We went to Sainsbury’s (posher) and then Aldi (less posh, with many more lower-income or lower-“class” shoppers). At Sainsbury’s my mum’s first comment was “why is it not freezing in here like usual?”: the temperature was barely below the 17 degrees Celsius outdoors. I replied, “They have turned the a/c and filters down to minimum to lower costs. Of course this reduces air circulation, so don’t linger around people or areas.” (Nice of you to care so much for our welfare, Sainsbury’s!) I also noted that the percentage of people wearing masks was no more than about 30%. I’m defining mask usage as “proper mask usage”, not using it as a “chin diaper”, as South Park so memorably put it. There was a massive staff shortage, though not across the board. Staff were “encouraging” (quite forcefully) people to use self-serve checkouts rather than the normal human-operated tills. We then went to the second shop.

Aldi was freezing and well ventilated (just like before lockdown ended on the 17th). The percentage of people using masks (and using them correctly) was definitely above 75%. Tills were responsive to demand, so the queues were much shorter than at Sainsbury’s. We “deplorables” (?) were in the majority but were behaving pretty much the same way as before restrictions were lifted.

Shopping across Arnold (one of the main suburbs of Gedling – our Parliamentary constituency, which famously was a key brick in the “red wall” that fell to the Tories in 2019 – and one of the main suburban centres in the “donut” surrounding Nottingham City) saw widespread mask usage. Quite a few people pulled them off when exiting shops, but they did that pre-17th too, so no change there. There was NO visible change in mask usage compared to the lockdown period. Some people wore masks incorrectly, and some pulled them off as soon as they walked down the high street after leaving the supermarket, but the tendency had not changed.

Getting a wider context?

A distasteful comment made by one commenter was that poorer, stupid people are the “offenders”. What I saw today was the exact opposite. In fact, from 2009 to 2015 I lived in the “poshest” postcode in Australia, which also had the highest rate of pertussis (whooping cough) in 2010-2011. (The data are now “conveniently” aggregated into regions; didn’t wanna show up the sitting PM?????) A friend of my age got it. He, like me, had been vaccinated as a child, but unfortunately the vaccination isn’t lifelong. When it was given in the 1970s, they assumed it would eradicate this terrible disease and didn’t count on middle-class wankers who preferred “to see their alternative therapist who did crap with stones”. I’ve said this before. Physicians in primary care in the UK quite like the “less educated” when it comes to public health campaigns, because such people just “follow the rules without question”. You might question whether that’s morally good. But it’s just an observation, and one that makes physicians’ lives less intolerable.

Oh, more observations. Half the people entering Sainsbury’s used the hand sanitiser on their hands and shopping trolley (cart). Close to 100% of Aldi customers did so. Aldi have also still got the perspex separators between checkouts. Sainsbury’s has removed them all.

Another observation regarding how “people might be avoiding pubs”: maybe it’s a SUPPLY issue, not a DEMAND issue? Thought about that? ALL THREE PUBS within walking distance of my house, which had served food (under very strict rules, eating outdoors at distanced benches) during lockdown, ironically STOPPED serving food and REDUCED their intake of drinkers on the 17th. Why? No doubt due to the “pingdemic” and a lack of staff. Only now is food and drink serving capacity beginning to increase again, and very, very gradually.

A note on sampling when quoting an anecdote

A very statistical person like me who has also had a lot of exposure to qualitative research will know how qualitative researchers deal with variability. It is actually quite a clever process, but it must be used with care. They often use “purposive sampling”. The idea is that you must cover “all key groups”, but you don’t aim for representativeness of the population. Thus you get 5 straight whites, 5 gay whites, 5 straight blacks, 5 gay blacks… you get the idea. This way no key group is omitted just because it makes up less than 5% (or some other smallish threshold) of the “wider population”. If you want to quote an anecdote that might hold water nationally, in order to provide a potentially generalisable explanation for an odd result, you should be purposively sampling to see what “each key group is doing”.
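A minimal sketch of the quota logic behind purposive sampling, assuming each person carries a single “key group” label. The group names and population sizes are hypothetical, and `purposive_sample` is a made-up helper that keeps the first five people met in each group; real qualitative researchers choose participants deliberately within each group rather than first-come, first-served. What it shows is that a group making up about 1% of the population still contributes the same five people as the largest group.

```python
from collections import defaultdict

def purposive_sample(people, key, per_group=5):
    """Take up to `per_group` people from every key group, ignoring each
    group's share of the wider population (a quota-style sketch)."""
    chosen = defaultdict(list)
    for person in people:
        group = key(person)
        if len(chosen[group]) < per_group:
            chosen[group].append(person)
    return dict(chosen)

# Hypothetical population of 1,000: group "c" is only about 1% of it.
population = (
    [{"id": i, "group": "a"} for i in range(700)]
    + [{"id": 700 + i, "group": "b"} for i in range(290)]
    + [{"id": 990 + i, "group": "c"} for i in range(10)]
)

sample = purposive_sample(population, key=lambda p: p["group"])
for group, members in sorted(sample.items()):
    print(group, len(members))
```

A simple random sample of 15 from the same population would, more often than not, contain nobody from group "c" at all.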

Obviously, if you live in a place that:

  • Was one of the first to house significant BAME populations
  • Has a wide span of incomes
  • Has a good span of sociodemographics
  • Has a good span of any other health/other factors we know to be relevant to COVID-19

Then you are probably in a better position to make statements about “what is going on” and “where it might lead”. Nottingham is one of the best places in the UK that satisfies the above criteria. Leicester is another, but Nottingham has a (perhaps accidentally) very good electoral “map”: a “unitary authority” (autonomous city) surrounded by a “donut” of district/borough councils with differing degrees of affluence.

So what do I think?

It’s not about ONE factor. It never is. Multivariable analysis will tease out the effects eventually, but in the meantime, having access to the various groups we “would sample via purposive sampling”, I get insights into what different types of Nottinghamian are doing.

  • BAME groups are profoundly suspicious of the vaccine and of COVID itself. This is madness but, given historical experience, more understandable. My cousin’s wife is BAME and refuses the vaccine, thinking it’s a white plot. FFS.
  • Students are know-it-alls with under-developed senses of their own mortality. My nephew (an undergrad student) was just diagnosed with covid. Many students have frankly execrable mathematical ability.
  • Richer people can be similar: they think they know better. If you want my view, read Douglas Adams’s books.


Stop looking for single explanations

Class/income is a nice explanation. Unfortunately the data don’t support you. The wards/areas mentioned are rich, or full of students or BAME residents. Indeed, student arrogance was recognised again just a week ago. BAME groups are often the least vaccinated, despite being at the highest risk. Posh rich people are also often anti-vax. So are students: look at the wards in Nottingham with low vaccination rates, and those of us who live in the Nottm region will recognise these as BAME and/or student wards. Of course, a lot of students are innumerate and think they can lecture me on the topic. Hmmmm. Anyway, rates are falling across wards regardless of “who might gain, Tory or Labour”, which hardly implies massively reduced testing.

If you are going to quote an anecdote to make a major point regarding a CRUCIAL statistic, then you must be prepared to provide a LOT of supplemental information about “your circumstances” so anybody with knowledge can check up on it. Otherwise, how do we know you aren’t a troll spoofing someone, or real but getting a grossly biased picture?

I am well known in terms of where I live and my background and circumstances. When I give an anecdote, it’s really, really easy to establish if I’m “just seeing something local”. I DO NOT CARE IF YOU ARE THE BEST PERSON IN THE WORLD IN A FIELD OF ECONOMICS I RESPECT: IF YOU QUOTE ANECDOTES WITHOUT THE REQUISITE INFO ON YOUR LOCATION AND LOCAL CHARACTERISTICS, YOUR VIEW IS NO BETTER THAN ANYONE ELSE’S.