Oh dear. Here we go again: “What personality are you? How the Myers-Briggs test took over the world.”

It gets boring shooting down M-B. It’s like shooting fish in a barrel. After all, when M-B compares unfavourably even to a questionnaire (see the linked Quartz article about the Big Five) that states:

Rather than giving an absolute score in each of the Big Five categories, they tell you your percentile in comparison to others within your gender

you know you’re in deep doo-doo: you’re making interpersonal comparisons. Contrary to the criticism quoted by the Guardian, that M-B forces unrealistic binary choices, that is NOT its problem. Discrete choices are EXACTLY what you should be getting people to make. It is how you INTERPRET and ANALYSE them that matters. Some tips on judging these types of instrument:

- Ensure they are based on a sound theoretical model. Schwartz’s List of Values is good because the types appear as segments of a kind of pie chart. Diametrically opposite types are on opposite sides of the circle whilst more similar ones are closer together.
- If you can’t run a regression to give complete results for ONE person – without drawing on ANY information from ANY other person – then it’s bad.
- The corollary to the above point is that you must statistically have positive degrees of freedom: more independent datapoints than parameters being estimated. Which means repeated choices. Which leads to:
- You must get insights into an individual’s consistency (variance). Only in certain controversial areas of life do humans typically exhibit perfect consistency. Generally, kids, older people, people with lower levels of education and/or literacy display higher variances.
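A minimal sketch of the “one person, one regression” test, assuming a simple binary logit on attribute differences in repeated paired comparisons. The design, the two “true” weights and the responses are all simulated and hypothetical; nothing here comes from Schwartz’s instrument or any real survey. The function refuses to fit unless there are more choices than parameters.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_individual_logit(X, y, max_iter=100, tol=1e-10):
    """Newton-Raphson MLE of a two-attribute binary logit for ONE respondent.

    X: list of (x1, x2) attribute differences, one per paired comparison (hypothetical design)
    y: list of 0/1 choices made by that same respondent
    Returns (coefficients, residual degrees of freedom).
    """
    n, k = len(X), 2
    dof = n - k
    if dof <= 0:
        raise ValueError(f"need more choices than parameters (n={n}, k={k})")
    b = [0.0, 0.0]
    for _ in range(max_iter):
        g = [0.0, 0.0]                   # score: gradient of the log-likelihood
        info = [[0.0, 0.0], [0.0, 0.0]]  # observed information: minus the Hessian
        for (x1, x2), yi in zip(X, y):
            p = sigmoid(b[0] * x1 + b[1] * x2)
            g[0] += (yi - p) * x1
            g[1] += (yi - p) * x2
            w = p * (1.0 - p)
            info[0][0] += w * x1 * x1
            info[0][1] += w * x1 * x2
            info[1][1] += w * x2 * x2
        info[1][0] = info[0][1]
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        # Newton step: solve info @ step = g with the explicit 2x2 inverse
        s0 = (info[1][1] * g[0] - info[0][1] * g[1]) / det
        s1 = (info[0][0] * g[1] - info[1][0] * g[0]) / det
        b[0] += s0
        b[1] += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b, dof


# Simulated (not real) data for one person: 400 repeated pairs, true weights (0.8, -0.5)
rng = random.Random(42)
X = [(rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])) for _ in range(400)]
y = [1 if rng.random() < sigmoid(0.8 * x1 - 0.5 * x2) else 0 for x1, x2 in X]
coef, dof = fit_individual_logit(X, y)
print(coef, dof)
```

Cut the data down to two pairs and the function raises an error: with two parameters there are no degrees of freedom left, which is exactly the situation of a questionnaire that asks each question once.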

The kinds of questions these questionnaires should be addressing are ones like “Of the multi-dimensional universe of ‘types’, which type or mixture of types best describes me, when I’ve been asked to do as many comparisons as possible?”

Even with a proper statistical design (e.g. an orthogonal design), two people might look very different in terms of their observed frequencies of agreeing with each statement. Person A has frequencies (estimated probabilities) that are all squashed toward one divided by the size of the choice set: so if you’ve presented pairs, they’ll all be close to 0.5. Person B might have frequencies that are all close to one or zero. If the PATTERN is the same, though, person A and person B are likely the same type of person. It’s just that, for some reason, person B was more consistent (lower variance) in answering.
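To make the Person A / Person B point concrete, here is a sketch assuming a binary logit in which each person’s choice probability is the logistic function of a scale factor (the inverse of their error standard deviation) times a shared utility difference. The four utility differences and the two scale values are made up for illustration.

```python
import math


def choice_prob(delta_u, scale):
    """P(agree / choose first option) under a binary logit; scale = 1 / error s.d."""
    return 1.0 / (1.0 + math.exp(-scale * delta_u))


def log_odds(p):
    return math.log(p / (1.0 - p))


def normalised_pattern(probs):
    """Direction of the log-odds vector: the 'pattern', with the scale divided out."""
    lo = [log_odds(p) for p in probs]
    norm = math.sqrt(sum(v * v for v in lo))
    return [v / norm for v in lo]


# Hypothetical utility differences shared by both people (same "type")
pattern = [1.0, -0.5, 2.0, -1.5]

person_a = [choice_prob(d, scale=0.3) for d in pattern]  # inconsistent: all nearer 0.5
person_b = [choice_prob(d, scale=3.0) for d in pattern]  # consistent: nearer 0 or 1

print([round(p, 2) for p in person_a])
print([round(p, 2) for p in person_b])
print(normalised_pattern(person_a))
print(normalised_pattern(person_b))
```

The raw frequencies differ a lot, so comparing them directly would make A and B look like different types. Under this assumed model the direction of the log-odds vector is identical for both; only its length, the scale, differs.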

I never worked on personality questionnaires, but I discussed these issues many times with Geoff Soutar and Julie Lee when they came to work with Louviere during my six years in Sydney. So I know this stream of work quite well. Schwartz himself decided to “throw away” his old scoring system for the LoV – which necessarily spent many pages trying to net out person-specific heuristics – in favour of Best-Worst Scaling. BWS avoids getting people to use numbers. It uses the most natural way to make a choice: picking one from a few.
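For readers who have not met BWS, a common first-pass analysis is simple counting: for each item, the number of times it was picked best minus the number of times it was picked worst, divided by how often it was shown. Below is a sketch for ONE respondent, assuming a hypothetical design in which four Schwartz-style value labels appear in every subset of three; the answers are invented. Counting scores are a descriptive starting point, not a substitute for a full model-based analysis.

```python
from collections import Counter


def best_worst_scores(tasks):
    """Best-minus-worst counting scores for ONE respondent.

    tasks: list of (items_shown, best_pick, worst_pick) tuples.
    Returns {item: (times best - times worst) / times shown}, ranging from -1 to +1.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in tasks:
        if b not in items or w not in items or b == w:
            raise ValueError("best and worst must be two different items from the set shown")
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Hypothetical design: every subset of three from four labels (each label shown 3 times),
# with invented answers from a single person.
tasks = [
    (("security", "hedonism", "benevolence"), "benevolence", "hedonism"),
    (("security", "hedonism", "power"), "security", "power"),
    (("security", "benevolence", "power"), "benevolence", "power"),
    (("hedonism", "benevolence", "power"), "benevolence", "power"),
]
scores = best_worst_scores(tasks)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:12s} {score:+.2f}")
```

This person’s ranking comes out as benevolence, security, hedonism, power, using nothing but their own four answers and no numbers typed in by the respondent.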

As a final note, this brings me back to a comment I’ve seen on NC by someone who was genuinely trying to be helpful in understanding the logit and probit models. Unfortunately the link was to a Stata working paper I’ve deliberately steered clear of because it all goes wrong in the final two pages.

Those “tricks” for understanding means and variances? Dig out your logit/probit data for ONE individual. Can you run the model on it? Unless you’ve been doing a well-designed discrete choice experiment, you’re about to ask me “are you out of your mind? Everyone knows you get just a one or a zero for a person”. That, dear reader, is why I say the writer has not properly thought through this guide.
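Why a single 0/1 per person is hopeless can be shown with the simplest possible logit, an intercept only. This is a sketch under that assumption; the counts are hypothetical.

```python
import math


def loglik_intercept(b, ones, n):
    """Log-likelihood of an intercept-only logit: `ones` choices of 1 out of `n` repeated choices."""
    p = 1.0 / (1.0 + math.exp(-b))
    ll = ones * math.log(p)
    if n - ones > 0:  # skip log(1 - p) when it would be multiplied by zero anyway
        ll += (n - ones) * math.log(1.0 - p)
    return ll


def mle_intercept(ones, n):
    """Closed-form MLE of the intercept, b = log(ones / (n - ones))."""
    if ones == 0 or ones == n:
        return None  # likelihood keeps rising as b -> +/- infinity: no finite estimate
    return math.log(ones / (n - ones))


# ONE person, ONE choice (a single 1): the log-likelihood just keeps climbing towards 0
print([loglik_intercept(b, ones=1, n=1) for b in (1, 5, 10, 20)])
print(mle_intercept(1, 1))   # None

# The same person answering 10 repeated choices, 7 of them 1s (hypothetical)
print(mle_intercept(7, 10))  # log(7/3)
```

With one observation the “best” estimate is always infinitely large; only repeated choices from the same person give a finite answer.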

Predicted probabilities, BIC, etc. are, in fact, all still potentially wrong because the likelihood function based on logit/probit models fixes the variance. So even if you follow all the rules, you can misinterpret the mean-variance split. You need external information. Which is why the “sterilising/non-sterilising vaccine” information regarding SARS-CoV-2 is so crucial. I can now definitively rule out the “means model” – which is exactly what the conventional logit/probit models assume. So their results are wrong by design.
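The variance problem can be seen directly in a probit, assuming the usual latent-variable set-up y* = βx + ε with ε ~ N(0, σ²) and y = 1 when y* > 0. The choice probability depends only on β/σ, so the data cannot tell a large effect with a lot of noise from a small effect with little noise; that is why the software fixes σ = 1 (for the logit, the error variance is fixed at π²/3). The five observations are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_loglik(beta, sigma, data):
    """Log-likelihood of y = 1[beta * x + e > 0] with e ~ N(0, sigma^2)."""
    ll = 0.0
    for x, y in data:
        p = std_normal_cdf(beta * x / sigma)  # only the ratio beta / sigma enters
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


# Hypothetical (x, y) observations
data = [(1.0, 1), (0.5, 0), (-1.0, 0), (2.0, 1), (-0.3, 1)]

small_effect_low_noise = probit_loglik(beta=0.8, sigma=1.0, data=data)
big_effect_high_noise = probit_loglik(beta=1.6, sigma=2.0, data=data)
print(small_effect_low_noise, big_effect_high_noise)  # identical: beta and sigma not separately identified
```

Any pair with the same ratio fits equally well, so fixing σ is a normalisation, not a finding: a reported difference in β between two groups could just as well be a difference in their error variances.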