Tag Archives: covid-19


Oh dear. Here we go again. What personality are you? How the Myers-Briggs test took over the world.

It gets boring shooting down M-B. It’s like shooting fish in a barrel. After all, when M-B compares unfavourably even to a questionnaire (see link to Quartz article about the Big-Five) that states:

Rather than giving an absolute score in each of the Big Five categories, they tell you your percentile in comparison to others within your gender

you know you’re in deep doo-doo. You’re making interpersonal comparisons. Contrary to the Guardian’s quoted criticisms of M-B as unrealistic binary choices, that is NOT its problem. Discrete choices are EXACTLY what you should be getting people to do. It is how you INTERPRET and ANALYSE them that matters. Some tips on judging these types of instrument:

  • Ensure they are based on a sound theoretical model. Schwartz’s List of Values is good because the types appear as segments of a kind of pie chart. Diametrically opposite types are on opposite sides of the circle whilst more similar ones are closer together.
  • If you can’t run a regression to give complete results for ONE person – without drawing on ANY information from ANY other person – then it’s bad.
  • The corollary to the above point is that you must statistically have positive degrees of freedom: more independent datapoints than parameters being estimated. Which means repeated choices. Which leads to:
  • You must get insights into an individual’s consistency (variance). Only in certain controversial areas of life do humans typically exhibit perfect consistency. Generally, kids, older people, people with lower levels of education and/or literacy display higher variances.

The kind of questions these questionnaires should be addressing are ones like “Of the multi-dimensional universes of “types”, which type or mixture of types best describes me, when I’ve been asked to do as many comparisons as possible?”

Even then, even if you get a proper statistical design (e.g. an orthogonal design), two people might look very different in terms of their observed frequencies in agreeing with each statement. Person A has frequencies (estimated probabilities) that are all fairly squashed toward one divided by the size of the choice set: so if you’ve presented pairs, they’ll all be close to 0.5. Person B might have frequencies that are all close to one and zero. If the PATTERN is the same, though, person A and person B are likely the same type of person. It’s just that for some reason person B was more consistent (lower variance) in answering.
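To make the “same pattern, different consistency” idea concrete, here is a minimal sketch. It assumes a plain binary logit and invented numbers: the utility differences, the two scale values and the function names are all hypothetical, chosen only for illustration.

```python
import math

def choice_probs(utility_diffs, scale):
    """Binary logit probability of agreeing with each statement.

    scale is the (assumed) inverse of the error spread: a low scale means
    an inconsistent respondent, a high scale a consistent one.
    """
    return [1.0 / (1.0 + math.exp(-scale * d)) for d in utility_diffs]

def normalised_pattern(probs):
    """Log-odds divided by the largest absolute log-odds, which strips out the scale."""
    log_odds = [math.log(p / (1.0 - p)) for p in probs]
    biggest = max(abs(v) for v in log_odds)
    return [v / biggest for v in log_odds]

# Hypothetical latent utility differences shared by both people (the "type").
utility_diffs = [-2.0, -1.0, 1.0, 2.0]

person_a = choice_probs(utility_diffs, scale=0.2)  # inconsistent: all near 0.5
person_b = choice_probs(utility_diffs, scale=3.0)  # consistent: near 0 or 1

pattern_a = normalised_pattern(person_a)
pattern_b = normalised_pattern(person_b)

print([round(p, 3) for p in person_a])
print([round(p, 3) for p in person_b])
print(pattern_a, pattern_b)  # the two normalised patterns coincide
```

The raw frequencies look nothing alike, yet once the scale is divided out the two patterns are the same. That only works here because each person supplies several repeated choices.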

I never worked on personality questionnaires but I did discuss issues with Geoff Soutar and Julie Lee when they came to work with Louviere many times during my 6 years in Sydney. So I know this stream of work quite well. Schwartz himself decided to “throw away” his old scoring system for the LoV – which necessarily spent many pages trying to net out person-specific heuristics – in favour of Best-Worst Scaling. BWS avoids getting people to use numbers. It uses the most natural way to make a choice, one from a few.

As a final note, this brings me back to a comment I’ve seen on NC by someone who was genuinely trying to be helpful in understanding the logit and probit models. Unfortunately the link was to a Stata working paper I’ve deliberately steered clear of because it all goes wrong in the final two pages.

Those “tricks” to understand means and variances? Dig out your logit/probit data for ONE individual. Can you run them? Unless you’ve been doing a well-designed discrete choice experiment you’re about to ask me “are you out of your mind? Everyone knows you get just a one or a zero for a person”. That, dear reader, is why the writer has not properly thought through this guide.

Predicted probabilities, BIC, etc are, in fact, all still potentially wrong because the likelihood function based on logit/probit models fixes the variance. So even following all the rules you can misinterpret the mean-variance split. You need external information. Which is why the “sterilising/non-sterilising vaccine” info regarding Sars-Cov2 is so crucial. I now can definitively rule out the “means model” – which is exactly what the conventional logit/probit models assume. So their results are wrong by design.



Revelation re Covid

Occasionally you are putting your thoughts into words and realise you finally “get” something. That happened today when explaining why I was suspicious of two papers “explaining” Sars-Cov2 (aka Covid-19) that were linked to by NakedCapitalism.com. NB NC were not “endorsing” these studies, merely putting them out there for discussion and critique. I duly did and had a revelation.

I know whether SARS-COV-2 has primarily mean or variance effects. It is mostly about variances. Which is the nightmare scenario. How did I come to this revelation? Well, as usual, it was by absorbing the wise words and experience of those who are “at the front line”.

Here is the deal.

  • We know none of the vaccines for SARS-COV-2 are sterilising.
  • Thus you “catch” it more than once.
  • We know from breakthrough cases and rapid emergence of variants (that respond at differential rates to existing vaccines) that people don’t follow a binary model [0,1] – be protected through chance/vaccine or get Covid. They can get it 2+ times.
  • Thus we have a logit/probit model with variances – when it comes to a “latent scale of susceptibility to infection” people do not have a “mountain” that is shifted following a bout or a vaccine. The vaccine just flattens the mountain into a gentle hill. Less likely to get horrifically ill but high variance – they can get it multiple times.
  • The papers referred to, as do all the papers I’ve read so far, assume the vaccine effects are ENTIRELY BASED IN MEANS.
  • This is conceptually incompatible with what we know from the vaccines and what their manufacturers state (albeit in small print sometimes) – the vaccines are non-sterilising. They reduce symptom severity but don’t stop you getting SARS-Cov-2 again.


So what will be the final outcome? Basically ANY piece that doesn’t attempt (even in a rudimentary way) to separate, or at least comment on, the mean-variance confound and note that the evidence favours variances is not going to be read by me. It goes into the same class as “papers that try to explain flights via flat earth paradigms”. Garbage. Nice to finally have a good rule that enables me to implement a policy I’ve rarely had enough “concrete data” to support. However, the data and interpretation from the good people at places like NC have “solved” the mean-variance confound for me.

Any paper that quotes risk/odds ratios without discussing variances is trash. I’m not reading or commenting on it. Maybe I’ll print it out for use in the next toilet paper shortage? Full stop.

Covid postscript

Just a few comments to clarify complex ideas regarding “variances” in limited dependent variable models.

When I say “mountains are flattened”, I mean disease severity is reduced among that subgroup who were previously “hit hard”, but the burden of disease is spread more evenly across everyone. So in ten parallel universes, instead of the same 10% of people ALWAYS getting very ill, 90% of people will get somewhat ill. The particular 90% varies in each universe. You personally are no longer “assured” of getting (or not getting) covid. The variance goes up across the population. Though down for a lot of individuals who previously would have had the same outcome 10 out of 10 times.

Identifying people with zero variance is important. These people are deterministic, not probabilistic. They can’t be in a biostats (logit/probit) model. You just describe them qualitatively according to what determines their disease status. Don’t attempt to “include them thinking that they just boost sample sizes and improve precision”. It’s like saying “I’m going to take an average of a bunch of numbers that includes infinity”. Dumb Dumb Dumb. These people, annoying though they are for logit/probit models, are actually useful in policy if you can find WHY they do/don’t get ill.

The KEY factor here is variance heterogeneity. If one group of people, in 100 parallel universes, experience 40 cases, but the SAME 40 people in all 100 universes, then they CANNOT be aggregated with another group of people who, in 100 parallel universes, experience on average 40 cases, but the 40 cases vary immensely across the 100 universes. Any universe from the first hundred has “consistency”. Any universe from the second hundred has extreme inconsistency. Aggregating them can’t be done. It doesn’t, UNLIKE A LINEAR MODEL, just mess with standard errors. It causes BIAS.
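A rough numerical sketch of why pooling groups with different variances biases a logit rather than merely widening the standard errors. Everything here is an assumption made for illustration: a single binary exposure, a zero intercept, two equal-sized groups, one true effect of 2.0 on the latent scale, and scale factors of 1.0 and 0.25. With one binary covariate the large-sample logit coefficient is just the log odds ratio, so no fitting routine is needed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    return math.log(p / (1.0 - p))

TRUE_BETA = 2.0  # hypothetical latent-scale effect, identical in both groups
SCALES = {"consistent": 1.0, "inconsistent": 0.25}  # hypothetical inverse spreads

def outcome_prob(x, scale):
    """P(outcome = 1) for exposure x (0 or 1), assuming a zero intercept."""
    return sigmoid(scale * TRUE_BETA * x)

# What a logit fitted WITHIN each group would report: true beta times that group's scale.
group_estimates = {
    name: log_odds(outcome_prob(1, s)) - log_odds(outcome_prob(0, s))
    for name, s in SCALES.items()
}

# What a logit fitted to the POOLED data would report (equal-sized groups).
pooled_p1 = sum(outcome_prob(1, s) for s in SCALES.values()) / len(SCALES)
pooled_p0 = sum(outcome_prob(0, s) for s in SCALES.values()) / len(SCALES)
pooled_estimate = log_odds(pooled_p1) - log_odds(pooled_p0)

print(group_estimates)
print(pooled_estimate)
```

Under these assumptions the pooled coefficient matches neither the shared true effect nor the simple average of the two group-level estimates, and collecting more data would not fix that. The deterministic group in the text is the limiting case where the scale heads off to infinity.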


COVID-19 variants. Statistical concerns

This piece draws heavily upon a piece published at NakedCapitalism. Pretty much all the references regarding epidemiological explanations and “on the ground” observations are there so in the interests of brevity (and my own schedule at the moment) I’ll simply give that as the main reference. I’ll put a few notes in regarding other issues though.


I’ve written before that stated preference (SP) data using logit/probit models – examples of limited dependent variable models, so-called because the outcome isn’t continuous like GDP or blood pressure – are very hard to interpret [a]. Technically they have an infinite number of solutions. It is incumbent upon the researcher either to collect a second dataset, totally independent in nature of the first (so we now have two equations to solve for the two unknowns – mean and variance) or use experience and common sense to give us the most likely explanation (or a small number of likely ones). This is technically true of revealed preference data (actual observed decisions) too [b] and Covid-19 might be an unfolding horrific example of where we are pursuing the “wrong” interpretation of the observed outcomes.

Background: What’s happened in various “high vaccination” countries so far?

In short, rates of Covid-19 initially dropped through the floor, typically in line with vaccination coverage, then started bouncing back.  However, the large correlation with hospitalisation and death did not re-appear. This is consistent with the fact the vaccines are not “sterilising vaccines” – you can still catch Covid-19, it’s just that the vaccine is (largely) stopping the infection from playing havoc with your body.

Sounds like a step forward? Actually, without widespread adjunct interventions (good mask usage etc) to stop the spread in the first place, this is potentially very very bad. We’ve already seen variants arise. The Delta variant is causing increasing havoc, whilst Lambda is becoming dominant in South America. The Pfizer vaccine – which thanks to media failures was often touted as “the bestest evah” – seems particularly ill-equipped to deal with Delta. NC is covering this very well.

The bio-chemists and colleagues can give good explanations of WHAT is happening pharmacologically and epidemiologically in producing these variants. Our archetypal drunk lost his keys on the way back from the pub. However, just like the story, he’s looking for them only under the lamp-post, whilst they’re actually on the dark part of the road; if you can’t or won’t look in the right place of course you won’t find the solution. This is what many experts are doing and why Delta etc could keep happening and at an increasing pace and perhaps is the real story: one with roots in statistics.

What’s the possible statistical issue here?

Consider how medical statisticians (amongst others) typically think about discrete (infected/non-infected, or live/die) outcomes. As in the SP case the [0,1] outcome is incapable of giving you a separate mean – “average number of times a human – or particular subgroup – would get bad Covid-19 for a given level of exposure” – and variance – “consistency of getting it for this given level of exposure”. If 80% of Covid sufferers at a given exposure level needed hospital care but only 20% do when vaccinated, then analysts tend to think that the average number of people has gone down.

Suppose the “extreme opposite interpretation” (equally consistent with the observed data) is true? Suppose it’s a variance effect? So, the vaccine is not really – on average – bringing the theoretical average hospitalisation rate down. Or not by much anyway. It is simply “pushing what was a high peaked thin mountain into a fat, low altitude hill” in the vaccination function relating underlying Covid-19 status with observable key outcomes. Far more people are in the tails, with an emphasis on the “hey, now Covid is no big deal for me” end [c]. The odds of hospitalisation following vaccination go way down. However, if you look at subgroups, you’ll (if you’re experienced) be spotting a tell-tale giveaway: the pattern of odds ratios across subgroups by vaccination status is VERY SIMILAR TO BEFORE, they have all just (for instance) halved. This is a trick I’ve used in SP data for decades and more often than not it shows that some intervention has a variance effect. Fewer people are going to hospital if vaccinated but their average tendency to get a bad bout is actually unchanged by vaccination (particularly if we add the confounding factor of TIME – Covid is changing FAST).
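The subgroup “giveaway” can be sketched as a simple proportionality check on estimated coefficients (log odds ratios). The subgroup names, the coefficient values, the tolerance and the function names below are all hypothetical, and a real check would also need to allow for sampling error.

```python
def shrink_ratios(before, after):
    """Ratio of each subgroup coefficient after the intervention to the one before."""
    return {name: after[name] / before[name] for name in before}

def looks_like_scale_shift(before, after, tolerance=0.1):
    """True when every coefficient shrank by roughly the same factor."""
    ratios = list(shrink_ratios(before, after).values())
    return max(ratios) - min(ratios) <= tolerance

# Hypothetical log odds ratios of hospitalisation by subgroup.
unvaccinated = {"age_65_plus": 1.6, "diabetes": 0.9, "male": 0.4}

# Same pattern, everything halved: what a pure variance effect would look like.
vaccinated_variance_story = {"age_65_plus": 0.8, "diabetes": 0.45, "male": 0.2}

# Pattern itself has changed: points to something other than a pure scale shift.
vaccinated_other_story = {"age_65_plus": 0.4, "diabetes": 0.9, "male": 0.1}

print(looks_like_scale_shift(unvaccinated, vaccinated_variance_story))
print(looks_like_scale_shift(unvaccinated, vaccinated_other_story))
```

A common shrink factor across subgroups does not prove a variance effect on its own, but it is the pattern that should make you suspicious of a pure “means” reading.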

This provides an ideal opportunity for the virus to quietly mutate, spread and via natural selection, find a variant that is more virulent but which, when coupled with fewer people taking precautions, gives a greater tendency for a variant to emerge that is both “longer incubating” but then potentially “suddenly more lethal”.

So vaccines were a bad thing?

At this point in time I’ll say “NO” [d]. However, in conjunction with bad human behaviour and an inability to think through the statistics, they have led to a complacency that might lead to worse long-term outcomes. The moral of the story is one that sites like NC have been emphasising since the start and which certain official medical and statistical authorities really dropped the ball on right from the get-go.

The vaccines merely bought us time. Time we wasted. Now a long-ignored problem with the logit (or probit) function, being the key tool we use to plug “discrete cases of disease” into a “function relating underlying Covid-19 to observed disease status” might be our undoing. Far fewer people are going to hospital following vaccination (smaller mean effect in terms of lethality) but a MUCH larger number of people have become juicy petri dishes for the virus to play in (larger variance). We have concentrated way too much on the former. The statistics textbooks tend to stress that explanation.

Trouble is, too few people read the small print at the bottom warning them that their logit/probit estimates could just as easily arise from variances, not means. Assume you observe:

  • an answer of 8 in non-vaccinated group. You assume mean prevalence=8 and (inverse of) variance=1, as the stats program always does: 8*1=8.
  • In vaccinated group you see answer of 4. Wow, the mean (prevalence due to vaccination effect) has halved prevalence because you ALWAYS ASSUME THE VARIANCE IS 1. So you “must” have got 4 via 4*1 because that is what you must do to get 4!

Oops. How you SHOULD have “divvied up the mean and inverse of variance” was 8*1 in non-vaccinated group and 8*(1/2) in vaccinated group. You have a treatment effect that is in fact unchanged. The inverse of the variance halved – in other words the variance doubled. People less consistently got ill [e].
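The arithmetic above, written out as a small sketch. The only thing the software reports is the product of the true effect and the inverse-variance factor, so many different splits give the same answer. The function name and the extra third split are illustrative assumptions, not anything a stats package exposes.

```python
def reported_estimate(true_effect, inverse_variance):
    """What the stats program prints: the product, never the two parts separately."""
    return true_effect * inverse_variance

non_vaccinated = reported_estimate(8.0, 1.0)

# Two stories for the vaccinated group that the data alone cannot tell apart.
means_story = reported_estimate(4.0, 1.0)      # effect halved, variance unchanged
variance_story = reported_estimate(8.0, 0.5)   # effect unchanged, variance doubled

# More splits with the same product (hypothetical values, there are infinitely many).
candidate_splits = [(4.0, 1.0), (8.0, 0.5), (16.0, 0.25)]
products = [reported_estimate(effect, inv_var) for effect, inv_var in candidate_splits]

print(non_vaccinated, means_story, variance_story)
print(products)
```

Only outside information (a second independent dataset, or knowledge such as the vaccines being non-sterilising) lets you pick between the splits.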

For someone like me who used to deal primarily with stated preference data the worst thing that could happen was that I’d lose the client when model-based predictions went wrong (because I’d made the wrong “split” between means and variances).

The stakes here are much much bigger. This piece is the “statistical issue” – a potential big misinterpretation of Covid-19 data – which really worries people like me.


[a] See my blog; NC also reprinted one of my posts.

[b] This is how and why Daniel McFadden won the so-called Economics Nobel – he predicted the demand for the BART in California extraordinarily accurately, before it was even built. He had both stated and revealed preference data on transport usage.

[c] You can’t keep symmetry if you keep squashing the mountain down. The right tail hits 100%. So you “see” a lot more people in the left tail (doing “well”) as a result of vaccination. This leads to the mean effect – so vaccination is unlikely to be 100% variance related. There must be a certain degree of mean effect here. My point is that the “real mean effect of vaccination” is theoretically a lot less than we observe from the data.

[d] With the “Keynes get out-clause”.

[e] The  actual logit and probit functions basically spit out a vector of “beta hats” but which are actually “true betas MULTIPLIED by AN INVERSE function of the variance on the latent scale”. So when variances go up – which in SP data happens when you get answers from people with lower literacy etc – then the “beta hats” (and hence odds ratios) all DECREASE in absolute magnitude. In other words, confusingly for non-stats people, we (to make the equation look less intimidating) tend to define a function of the variance (lambda – not to be confused with the Covid one) or mu that is MULTIPLICATIVE with the “true beta”. Believe me if you think this was a stupid simplification that will lead to confusion as people talk at cross-purposes you are not alone.

UK covid stats confusion

There is a lot of confusion, both in the MSM and sites like nakedcapitalism as to what is going on with Covid-19 in the UK. The PHE (daily….ish) stats suggest the number of cases is falling and has been doing so for almost a week now. Hooray?…. But the (lagging) ONS data doesn’t show this. Maybe the ONS data will show the same when it “catches up”. It is about 2 weeks behind in terms of data collection, analysis and interpretation. It is “better” in that it is more of a random sample so bias (in terms of who actually gets tested) is theoretically less, but never under-estimate the interaction between humans (via their psychology) and a supposedly controlled trial. If people don’t want to participate because they suspect the “real result” will annoy them they’ll find ways to mess with you, the statistical designer.

Thus, perhaps the lower rates of positive results in the former are simply because lots of people being “pinged” by their phone app and others who strongly suspect they might have Covid-19 but who aren’t very ill are simply “going to ground” and not getting tested (in fear of being “grounded” for 7-14 days). BBC radio the other day informed people “if you’re pinged you don’t have to self-isolate but if you’re directly contacted you do”. Is that true? And if so, how many heard it and are following it? I dunno.

At this stage there is simply no robust way to know what is going on. People are impatient, however, and insist on speculating so as to “be first to the punch” in getting the interpretation correct, rather than waiting 10 days. Some of these people should know better. When someone blatantly speculating with anecdotes, reported here, only has their official qualifications reported, rather than details of the geographic and other factors that might help us put into context their anecdotal data, then any good statistician should get suspicious. If you live in a North Norfolk Westminster constituency where the MP not only got a plurality but a large majority at the last election (for the ruling Conservative Party) I don’t really think you live somewhere that is remotely representative of the country. When I know your professional (GEOGRAPHICAL) positions, and, via my higher education and friends (living across East Anglia), know practically all public transport routes you might have taken (if not driving) to get to anywhere, and know you are NOT encountering “the average Brit” then I won’t apologise for not paying much attention to your anecdote.

Don’t get me wrong. Anecdotes are SOMETIMES useful. Sometimes an anecdote forms the germ for a theory that ultimately leads to a massive paradigm change. However, if you’re going to quote an anecdote you should be allowing someone like me to put it into proper context. I’ll put more stock into an anecdote from someone who lives life “close” to the knife-edge of what is going on in terms of (for instance) reactions to Covid-19 and this inevitably, from what we know already, involves living close to BAME people, people on both sides of the median income and other such measures that, for better or worse, have come to define “class” in modern Britain. So, here are my personal experiences over the last couple of days, followed by some factors that should enable any competent “data person” to get a “picture” of where I live, how people behave, etc.

What happened today when helping my mother do her weekly shopping:

We went to Sainsbury’s (posher) and then Aldi (less posh – lots more lower income or “class” people). At Sainsbury’s my mum’s first comment was “why is it not freezing in here like usual?” – the temperature was barely below the 17 degrees Celsius outdoors. I replied “they have turned the a/c and filters down to minimum to lower costs. Of course this reduces air circulation so don’t linger around people or areas”. (Nice of you to care so much for our welfare Sainsbury’s!) I also noted that the percentage of people wearing masks was no more than about 30%. I’m defining mask usage as “proper mask usage and not using it as a chin diaper as South Park so memorably put it”. Massive staff shortage but not across the board. Staff were “encouraging” (quite forcefully) people to use self-serve check-outs rather than the normal human-operated-tills. We then went to shop 2.

Aldi was freezing and well ventilated (just like before lockdown ended on 17th). Percentage of people using masks (and correctly) was definitely above 75%. Tills responsive to demand, so the queues were much shorter than Sainsbury’s. Us “deplorables”(?) were in the majority but were behaving pretty much in the same way as before restrictions were eliminated.

Shopping across Arnold (one of the main suburbs of Gedling – our Parliamentary Constituency which famously was a key brick in the “red wall” that fell to the Tories in 2019 – and one of the main suburban centres in the “donut” surrounding Nottingham City) saw widespread mask usage. Quite a few people pulled them off when exiting shops, but they did that pre-17th too so no change there. There was NO visible change in mask usage compared to the lockdown period. Some people wore masks incorrectly, some pulled them off as soon as they walked down the high street having exited the supermarket, but the tendency had not changed.

Getting a wider context?

A distasteful comment made by one was that the poorer stupid people are the “offenders”. What I saw today was the exact opposite. In fact, I’ve lived in the “poshest” postcode in Australia, which also had the highest rate of pertussis (whooping cough) 2010-2011 when I lived there 2009-2015. (Data now “conveniently” aggregated into regions – didn’t wanna show up the sitting PM?????) A friend of my age got it. He, like me, had been vaccinated as a child but unfortunately the vaccination isn’t lifelong and when it was given in the 1970s they assumed it would eradicate this terrible disease and didn’t count on middle class wankers who preferred “to see their alternative therapist who did crap with stones”. I’ve said this before. Physicians in primary care in the UK quite like the “less educated” when it comes to public health campaigns because such people just “follow the rules without question”. You might question whether morally that’s good. But it’s just an observation and one that makes physicians’ lives less intolerable.

Oh, more observations. Half the people entering Sainsbury’s used the hand sanitiser on their hands and shopping trolley (cart). Close to 100% of Aldi customers did so. Aldi have also still got the perspex separators between checkouts. Sainsbury’s has removed them all.

Another observation regarding how “people might be avoiding pubs” – maybe it’s a SUPPLY issue, not a DEMAND issue? Thought about that? ALL THREE PUBS in walking distance of my house, which had served food (under very strict rules, eating outdoors at distanced benches) during lockdown, ironically STOPPED serving food, and REDUCED their intake of drinkers on 17th. Why? No doubt due to the “pingdemic” and a lack of staff. Only now is food and drink serving capacity beginning to increase again, and very very gradually.

A note on sampling when quoting an anecdote

A very statistical person like me who has also had a lot of exposure to qualitative variability will know how the qual people deal with variability. It is actually quite a clever process but must be used with care. They use “purposive sampling” quite often. The idea is that you must cover “all key groups” but you don’t aim for representativeness of the population. Thus you get 5 straight whites, 5 gay whites, 5 straight blacks, 5 gay blacks…….you get the idea. This way no key group is omitted just because in the “wider population” they comprise <5% (or some other smallish threshold). If you want to quote an anecdote that might hold water nationally in order to provide a potential generalisable explanation for an odd result, you should be purposively sampling to see what “each key group is doing”.
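A minimal sketch of purposive (quota-style) sampling, to show how it differs from simply drawing people at random. The group labels, the population sizes, the quota of five and the function name are all invented for illustration.

```python
import random
from collections import defaultdict

def purposive_sample(population, group_of, per_group, rng):
    """Pick up to per_group people from EVERY group, ignoring population shares."""
    by_group = defaultdict(list)
    for person in population:
        by_group[group_of(person)].append(person)
    return {
        group: rng.sample(members, min(per_group, len(members)))
        for group, members in by_group.items()
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 where group "D" is only 2% of people.
group_sizes = {"A": 600, "B": 300, "C": 80, "D": 20}
population = [
    {"id": f"{group}{i}", "group": group}
    for group, size in group_sizes.items()
    for i in range(size)
]

quotas = purposive_sample(population, lambda person: person["group"], 5, rng)

# A simple random sample of the same total size, for comparison.
random_draw = rng.sample(population, 20)
random_counts = defaultdict(int)
for person in random_draw:
    random_counts[person["group"]] += 1

print({group: len(people) for group, people in quotas.items()})
print(dict(random_counts))
```

The purposive draw guarantees five voices from the 2% group; a random draw of twenty would, on average, contain well under one. The price is that the purposive sample says nothing about how common each group is.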

Obviously, if you live in a place that:

  • Was one of the first to house significant BAME populations
  • Has a wide span of incomes
  • Has a good span of sociodemographics
  • Has a good span of any other health/other factors we know to be relevant to COVID-19

Then you probably are in a better position to make statements about “what is going on” and “where it might lead”. Nottingham is one of the best places in the UK satisfying the above criteria. Leicester is another, but Nottingham has a (perhaps by accident) very good electoral “map” in terms of “unitary authority” (autonomous city) surrounded by “donut” of district/borough councils with differing degrees of affluence.

So what do I think?

It’s not about ONE factor. It never is. Multivariable analysis will tease out the effects eventually but in the meantime, having access to various groups we “would sample via purposive sampling” I get insights into what different types of Nottinghamian are doing.

  • BAME groups are profoundly suspicious of the vaccine and COVID itself – this is madness but given historical experience, this is more understandable. My cousin’s wife is BAME and refuses the vaccine, thinking it’s a white plot. FFS.
  • Students are know-it-alls with under-developed senses of their own mortality. My nephew (undergrad student) was just diagnosed with covid. Many have frankly execrable mathematical ability.
  • Richer people can be similar. Think they know better. If you want my view, read Douglas Adams’s books.


Stop looking for single explanations

Class/income is a nice explanation. Unfortunately the data don’t support you. The wards/areas mentioned are rich/full of students or BAME. Indeed student arrogance has been re-recognised just a week ago. BAME groups are often the least vaccinated, despite being at the highest risk. Posh rich people are also often anti-vax. So are students – look at the wards in Nottingham with low vaccination rates – those of us who live in Nottm region recognise these as BAME and/or student wards. Of course a lot of students are mathematically innumerate and think they can lecture me on the topic. Hmmmm. Anyway rates are falling across wards no matter “who might gain – Tory or Labour” – hardly implying massively reduced testing.

IF you are going to quote an anecdote to make a major point regarding a CRUCIAL statistic then you must be prepared to provide a LOT of supplemental information about “your circumstances” so anybody with knowledge can check up on it. Otherwise how do we know you aren’t a troll spoofing someone or you are real but getting a grossly biased picture?

I am well known in terms of where I live and my background and circumstances. When I give an anecdote it’s really really easy to establish if I’m “just seeing something local”. I DO NOT CARE IF YOU ARE THE BEST PERSON IN THE WORLD IN A FIELD OF ECONOMICS I RESPECT – IF YOU QUOTE ANECDOTES WITHOUT THE REQUISITE INFO ON YOUR LOCATION AND LOCAL CHARACTERISTICS YOUR VIEW IS NO BETTER THAN ANYONE ELSE’S.