
Random Utility Theory

 

I have left the field I made my name in during my postdoctoral work – random utility theory. The reasons are not ones I will go into here. However, I thought I’d write a post – maybe my final word on the subject – attempting to explain it and its importance.

 

Let’s split the discussion into a few sections:

  1. What economists (and hence most people) think about how humans make decisions;
  2. How and why a model from psychology developed in the 1920s moves us so much further forward, but suffers from a problem whereby its predictions, under certain circumstances, can “look so much like economics that economics gets a pass”;
  3. How a better understanding of how this psychological theory can and cannot be used could help us immensely.

 

  1. What economists (and hence most people) think about how humans make decisions

 

Economics generally splits into two branches – microeconomics (seeking to understand and explain how individuals make choices) and macroeconomics (seeking to understand how entire economies might work or not work). We are dealing strictly with the former (although where this “carries through” to the latter will be touched upon).

 

In microeconomics, humans are assumed to be what has come to be called “homo economicus” – a somewhat “computer-like” entity that works out what provides the most “utility” (what makes you best off) and then chooses the thing or path that maximises the chance of getting it. EVERY TIME. So if we had a science-fiction “multiverse” of 1000 parallel universes, all identical up to the point of choice, our “key human” makes the SAME choice in all 1000 universes. If you observe something different in one of these universes, it is due to some error in the OBSERVER’s ability to observe everything that matters, NOT due to the observed human. That is the “economics interpretation of Random Utility Theory – RUT”.

 

Psychologists, having observed REAL humans in a variety of contexts for 100 years, have come to a different conclusion. They too observe that our “key human” might do something different in one or more universes. BUT they believe this is due to an inherent property of HUMAN BEHAVIOUR: namely that whilst our “key human” has a mean (average) tendency to “perform action x”, there is also a VARIANCE associated with the behaviour. Thus, although on AVERAGE across our 1000 universes we would certainly see our key human “do x”, there are a number of universes in which he/she does “not x”.
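To make the two pictures concrete, here is a minimal Python sketch of the “1000 universes” thought experiment. The particular utility numbers, the normal noise and the 0.4 “mean tendency” are purely my illustrative assumptions, not part of either theory:

```python
import numpy as np

rng = np.random.default_rng(0)
n_universes = 1000

# "Homo economicus": utilities are fixed, so the SAME option wins in every universe.
utility_x, utility_not_x = 1.0, 0.5
econ_does_x = np.full(n_universes, utility_x > utility_not_x)  # True ("do x") every single time

# Psychology-style RUT: a MEAN tendency towards x, plus VARIANCE around it.
mean_tendency = 0.4                         # average pull towards "do x"
noise = rng.normal(0.0, 1.0, n_universes)   # the variance in behaviour
psych_does_x = (mean_tendency + noise) > 0  # "do x" only when the realised draw is positive

print("Economics:  does x in", int(econ_does_x.sum()), "of 1000 universes")   # always 1000
print("Psychology: does x in", int(psych_does_x.sum()), "of 1000 universes")  # roughly 650, and it varies
```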

 

This variance might be very, very small – this typically happens when “action x” taps into something like an intrinsic attitude (think about views on things like abortion). The variance might be large – think about something our human really isn’t very sure about, typically due to lack of experience (like a new feature of a mobile/cell phone). OR it might be something in between – like the brand of baked beans, where our key human has a definite preferred brand but “random factors” can cause a change in the brand bought for no discernible reason.

 

Why does the “choice of discipline” matter? For that, we’ll go to section 2, which deals with the philosophical underpinnings of the models and why statistics can’t tell us which is “right”.

 

  2. How and why a model from psychology developed in the 1920s moves us so much further forward, but suffers from a problem whereby its predictions, under certain circumstances, can “look so much like economics that economics gets a pass”

 

Here is the basic problem in working out whether the economists or the psychologists are correct: in most cases their predictions (and the maths) are observationally equivalent. In other words, there is no test we can administer that will give result “A” if the human is homo economicus, or result “B” if the human is “homo psychologicus”. The “right” model comes down, a lot of the time, to issues like philosophy and epistemology – how you think about the world. Now, there is a growing body of evidence – based on MRI and other medical research, huge amounts of observation, and work in other fields – suggesting that the psychologists are probably closer to the truth than the economists. Indeed, it is telling that the economists are the ones who keep having to “amend” their theories to allow for problems like “intransitivity” – when I prefer A to B, B to C, but C to A. That SHOULD NOT happen in a well-designed experiment if homo economicus reigns supreme. But the psychological model has no problem with it, because it is PROBABILISTIC – THERE WILL BE OCCASIONS ON WHICH I DO THIS AND THAT’S FINE – WE ARE NOT COMPUTERS.
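As an aside, here is a small sketch of why a probabilistic model shrugs at the occasional observed cycle. The mean utilities and the independent normal noise below are purely illustrative assumptions on my part:

```python
import numpy as np

rng = np.random.default_rng(42)
mean_utility = {"A": 1.0, "B": 0.8, "C": 0.6}  # on AVERAGE: A preferred to B preferred to C

def prefers(x, y):
    """One realised pairwise choice: each option gets its mean utility plus random noise."""
    return (mean_utility[x] + rng.normal()) > (mean_utility[y] + rng.normal())

# How often does a single run of three pairwise choices come out "intransitive"?
trials = 10_000
cycles = sum(
    prefers("A", "B") and prefers("B", "C") and prefers("C", "A")
    for _ in range(trials)
)
print(f"Observed the 'impossible' cycle A>B, B>C, C>A in {cycles / trials:.1%} of runs")
```

Even though A is preferred to B is preferred to C on average, a single run of three pairwise choices comes out as the “impossible” cycle a non-trivial fraction of the time – no paradox, just variance.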

 

L. L. Thurstone developed the “psychology version of RUT” in the 1920s, basically positing that (for example) you have a mean (average) position on a latent (unobserved) scale for something like “political preference – left/right”, but that you also have a variance, so that in (say) 20% of universes you will vote Republican despite the fact that your “average” position is on the Democrat side of the scale.
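Here is a rough sketch of that idea in code. The specific mean, the unit variance, the zero cut-off and the “positive side = Republican” convention are my illustrative assumptions, not Thurstone’s numbers:

```python
from scipy.stats import norm

# Latent left/right scale: negative = Democrat side, positive = Republican side, cut-off at zero.
mean_position = -0.84   # the person's AVERAGE position sits on the Democrat side
std_dev = 1.0           # ...but there is spread (variance) around that average

# Probability that, in any one "universe", the realised position lands on the Republican side.
p_republican = 1 - norm.cdf(0, loc=mean_position, scale=std_dev)
print(f"Votes Republican in about {p_republican:.0%} of universes")  # ~20%
```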

 

McFadden (coming from economics) saw the implications of Thurstone’s work and, using some slightly unconventional “tweaks” to economics (to overcome the problem that we’re not homo economicus, a.k.a. computers), used it to win the so-called economics Nobel prize for successfully predicting demand for a new light rail system that HAD NOT YET BEEN BUILT. However, there remains controversy to this day as to whether his success came more from psychology or more from “tweaking economics” to be like psychology.

 

So how does a more “psychological” mindset (as opposed to a homo economicus one) help us? Can we make better predictions without having to keep “tweaking economics”, as has been happening with increasing frequency via things like “behavioural economics” and its tools such as “nudge theory”?

 

  3. How a better understanding of how this psychological theory can and cannot be used could help us immensely

 

To cut to the chase, the psychology version of RUT means that in 1000 universes we may make a “particular discrete choice” (e.g. “vote Democrat”) 800 times but do otherwise in the other 200 universes. The trouble is we only observe ONE universe. So what do we do? This is where discrete choice modelling comes in. If we can design cunning experiments that:

  • Get the respondent to keep making essentially the same choice a number of times, BUT
  • Don’t LOOK like the same choice (which would alert them to our “subterfuge” and encourage them either to “keep choosing the same” – to “look good” – or to “start switching” – to “be awkward”),

Then we can CORRECTLY estimate “how often our human votes Democrat” (for instance).
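Here is a tiny sketch of that estimation step, assuming the design has worked. The 20 disguised repeats and the person’s true 0.8 probability are made-up numbers for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Suppose the design works: 20 disguised repeats of "essentially the same" choice,
# and the person's true underlying chance of choosing Democrat is 0.8.
true_p = 0.8
n_tasks = 20
choices = rng.random(n_tasks) < true_p        # True = chose "Democrat" in that task

p_hat = choices.mean()                        # the estimated choice probability
se = np.sqrt(p_hat * (1 - p_hat) / n_tasks)   # rough standard error of that estimate
print(f"Estimated P(vote Democrat) = {p_hat:.2f} (± about {1.96 * se:.2f})")
```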

Sounds great, hey?

In theory, if you solve the practical problems mentioned above, you certainly can do this. Unfortunately there is a mathematical problem we simply can’t get past.

Here is the problem, expressed first mathematically, then intuitively:

  • ANY probability observed in repeated experiments cannot tell you the mean and the variance separately: mathematically they are PERFECTLY CONFOUNDED (mixed together – only their ratio is identified). Thus you CANNOT know whether ANY observed choice probability represents a mean effect, a variance effect, or something in between
  • In short, someone choosing “Democrat” might be doing so because of strong affiliation, or because of weak affiliation with the “majority of the distribution” of their support sitting to the left, or something in between

SUPPOSE OUR HUMAN VOTES DEMOCRAT 80% OF THE TIME. UNFORTUNATELY THIS IS NOT ENOUGH TO TELL US THE (1) MEAN AND (2) VARIANCE ON THE “LATENT SCALE OF POLITICAL PERSUASION”.

This insight was proven in a key mathematical-statistical paper in the mid-1980s. In short (a small numerical sketch of the confound follows this list):

  • You might get 800 out of 1000 “vote Democrat” outcomes because the person genuinely believes in 80% of the Democrat manifesto (the “mean” is 80% and the variance is zero);
  • You might get 800 out of 1000 “vote Democrat” outcomes because the person’s genuine belief in the Democrat manifesto (the “mean”) is some other number (70% or 90%), but the variance (lack of certainty about the Democrat candidates) is sufficiently high that we “see” 800 Democrat successes. This is NOT a valid representation of the person’s “position on the latent political scale” – they might be MORE or LESS Democrat than they look – yet UNCERTAINTY (variance) effects cause the actual observed chance of putting a checkmark in the Democrat box to be 80%;
  • You might get 800 out of 1000 “vote Democrat” outcomes because the person is actually much closer to the 50/50 mark (in terms of MEAN position on the politics scale), but a large variance (degree of uncertainty) in THIS election led to a large number of “Democrat votes” in our hypothetical multi-universe elections.
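To see the confound numerically, here is a tiny sketch in a binary probit framing. The latent scale, the zero threshold and the particular mean/variance pairs are all chosen by me for illustration:

```python
from scipy.stats import norm

# Each pair is (mean position on the latent scale, standard deviation).
# Every pair below has the same ratio mean/sd, so every pair produces the SAME
# observed ~80% "vote Democrat" probability: the data cannot tell them apart.
candidates = [(0.84, 1.0), (1.68, 2.0), (0.42, 0.5), (8.4, 10.0)]

for mean, sd in candidates:
    p_democrat = norm.cdf(mean / sd)  # chance the latent draw lands on the Democrat side of zero
    print(f"mean = {mean:5.2f}, sd = {sd:5.2f}  ->  P(vote Democrat) = {p_democrat:.3f}")
```

No amount of re-running the same choice can tell these pairs apart; only changing the design of the choices (or bringing in outside information such as attitudes, which is where this post ends up) can.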

 

So where does that leave us?

In trouble, basically – it means that ANY OBSERVED FREQUENCY OF CHOICE IS CONSISTENT WITH AN INFINITE NUMBER OF EXPLANATIONS, RANGING FROM:

  • Mean effects – the person would “always” go that way
  • Variance effects – the person “happened” to show an 80% level of support but this was largely because their support is so “soft” that the number of universes in which they “go Democrat” could vary dramatically
  • A “mix” – this is the most likely – the person has an inherent “affiliation” with a party but could “swing” to another under the right conditions

 

Clearly, if we run “enough” trials, in which we cunningly change things in ways that we understand are more likely to identify a “mean effect” or a “variance effect”, then we can begin to work out which of the above three worlds we are in. I did this in the 2017 UK General Election. I beat the polling organisations and bookmakers and made a profit – it was small (since I’d never used my model in elections before and wanted proof of concept) – but I showed it could be done.

 

How did I do this? Well, I realised that stats programs rest on an assumption that is often wrong. Here is what happens when a “political affiliation” is turned into a vote, and then a STATS PROGRAM is used to go back the other way:

  • The program uses a distribution (probit or logit) to turn a “% level of support for the Democrats” into a “discrete choice” – a CONTINUOUS outcome is turned into a discrete one – and thereby information is LOST
  • If we go vice versa (as we do in ANY election prediction), we must make assumptions about “how much variation is a mean effect and how much is a variance effect”. Stats programs set the variance equal to one by default (they “normalise” the variance).
  • If the variance is NOT actually the same across groups of voters, then you will predict incorrectly.
  • I, AND YOUGOV, realised that the “variance” (consistency of response) was predictable via attitudes. Thus in the 2017 UK General Election the “alternative” model of YouGov worked well – that model, AND MINE, beat all the main models. I was just sad that YouGov measured attitudes using rating scales (notoriously unreliable). Thus their model didn’t really work in 2019 and will probably get binned entirely – which is a shame, because they’re on the right track.

 

 

So, IF we understand something about the variances (via attitudes), we can make the stats program use the CORRECT variances rather than fixing them at one. Then predictions will be better.
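Here is a minimal sketch of why that matters. It is not a re-creation of my election model: the two voter groups, their common mean and their different variances are all invented purely for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Two groups of voters with the SAME mean position on the latent scale (0.5, i.e. leaning
# towards the party) but very different variances: "committed" vs "soft" supporters.
mean_position = 0.5
group_sd = {"committed (low variance)": 0.5, "soft (high variance)": 2.0}

for name, sd in group_sd.items():
    # What actually happens in this group (simulated):
    draws = mean_position + rng.normal(0.0, sd, 100_000)
    actual_share = (draws > 0).mean()

    # Prediction from a program that knows the correct mean but fixes the variance at one:
    fixed_at_one = norm.cdf(mean_position / 1.0)

    # Prediction once the group's own variance is plugged in instead:
    corrected = norm.cdf(mean_position / sd)

    print(f"{name}: actual {actual_share:.2f} | "
          f"variance fixed at one {fixed_at_one:.2f} | variance corrected {corrected:.2f}")
```

The “variance fixed at one” prediction misses both groups even though the mean is exactly right for each; plugging in each group’s own variance recovers the true vote shares.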

 

I’d love to see this happen, but I’m not sure it will. If you do it right, though, then aggregation (i.e. the macroeconomics) will suddenly predict well.