Category Archives: Economics

happiness redux

There is a piece on happiness up at NakedCapitalism.com today. It is a guest post from VoxEU and unfortunately, though it tries to make valid points, it falls into the usual holes. The key one is that the data all appear to be Likert-based self-reported happiness scales, which in at least two major countries (the US and Australia) have been shown to be deeply misleading. In short, even within these two countries, there are cohort and/or longitudinal effects: the number you state your happiness/life satisfaction to be is heavily dependent upon age (particularly if you are older), independent of (after adjusting for) a huge number of other factors (health, wealth, social empowerment, independence, etc). Moreover this is not “just” the infamous “mid-life dip”: the differences between such measures and the more comprehensive well-being/quality-of-life ones are particularly stark in extreme old age, and have big implications for retirement age, what resources are needed by the very old, etc.

To make comparisons across countries with different cultural backgrounds seems even more hazardous – Likert scales were pretty much discredited on such grounds by 2001:

Baumgartner H, Steenkamp J-BEM. Response styles in marketing research: a cross-national investigation. Journal of Marketing Research. 2001;38(2):143-56.

Steenkamp J-BEM, Baumgartner H. Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research. 1998;25(1):78-90.

Five year age bands showing mean levels (after rescaling) of self-rated happiness versus scored quality of life in Bristol, UK

The above shows that the ICECAP-O measure (based on the discrete choice methods of McFadden, coupled with the Capabilities Approach of Sen, both winners of the Economics “Nobel”) tracks happiness (after both are rescaled onto a 0-1 scale) reasonably well until middle age. In old age people report suspiciously high life satisfaction/happiness scores even when they have a whole host of problems in their lives. We captured these in the ICECAP-O (collected from the same people who gave us life satisfaction scores), as well as in their individual answers to a huge number of questions about these other factors in life. This has been found in the USA too:
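As a toy sketch of the comparison behind the chart (entirely hypothetical numbers; `rescale` and `band_means` are illustrative helpers, not the actual analysis code), rescaling both measures onto 0-1 and averaging within five-year age bands might look like this:

```python
# Toy sketch (hypothetical data): rescale two measures to 0-1 and
# compare mean levels within five-year age bands, as in the Bristol plot.

def rescale(values, lo, hi):
    """Min-max rescale raw scores onto a 0-1 scale."""
    return [(v - lo) / (hi - lo) for v in values]

def band_means(ages, scores, width=5):
    """Mean score per age band, keyed by band start (e.g. 65 for 65-69)."""
    totals = {}
    for age, s in zip(ages, scores):
        band = (age // width) * width
        n, tot = totals.get(band, (0, 0.0))
        totals[band] = (n + 1, tot + s)
    return {band: tot / n for band, (n, tot) in sorted(totals.items())}

# Hypothetical respondents: (age, happiness on 1-7, quality of life on 0-100)
data = [(67, 6, 55.0), (68, 6, 50.0), (72, 7, 40.0), (74, 6, 35.0)]
ages = [d[0] for d in data]
happy = rescale([d[1] for d in data], 1, 7)    # Likert 1-7 -> 0-1
qol = rescale([d[2] for d in data], 0, 100)    # index 0-100 -> 0-1

print(band_means(ages, happy))  # stated happiness stays high with age...
print(band_means(ages, qol))    # ...while scored quality of life falls
```

The point of the real comparison is exactly the divergence this toy data mimics: the two rescaled series track each other until the oldest bands, where they pull apart.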

US life satisfaction

In short, we don’t have a bloody clue what older people are doing when they answer these scales, but they sure aren’t doing the same thing as younger people.

I discussed the contribution of trust toward a broad measure of well-being further in a talk I gave years ago when in Sydney: in Australia it is basically the case that a lack of trust in those in the local community has a pretty huge (11%) detrimental effect on your quality of life in Sydney, but a much smaller, though still significant (5%), effect elsewhere in Australia.

I wish these Likert-based happiness surveys would cease. They really don’t help the field, when much better alternatives are already in routine use.

algorithms are bad mkayy

One of the blogs I follow is Math Babe and she has just published a book on (amongst other things) the problems with big data (which I intend to buy and read as soon as I get the time). The Guardian reprinted some of it, which is great for bringing this to a wider audience.

I left the following comment at her entry which mentions the Guardian article, but I think it might have disappeared into moderation purgatory as my first attempt to post “from an account” was the WordPress.com one which I don’t use (as opposed to this .org). Anyway the gist of what I said was that she is entirely right to lambast the use of automatic rules and algorithms to analyse (for instance) personality data used in recruitment. However, smart companies (1) don’t use psychometric data, they use Best-Worst Scaling which cannot be “gamed” and (2) use human input to interpret the results. Anyway here’s my comment to her blog post…EDIT – the comment appeared, hooray!

****************************************************************************

Hi. Nice article and I intend to get and read the book when things calm down a little in my work life. I just have two comments, one that is entirely in line with what you have said, and one which is a mild critique of your understanding of the personality questionnaires now being used by certain companies.

First, I agree entirely that the “decision rule” to cut down the number of “viable” candidates based on various metrics should not be automated. Awful practice.

Second, and where I would disagree with you, is in the merits of the “discrete choice” based personality statements (so you *have* to agree with one of several not very nice traits). This is not, in fact, psychometrics. It is an application of Thurstone’s *other* big contribution to applied statistics, random utility theory, which is most definitely a theory of the individual subject (unlike psychometrics which uses between-subject differences to make inferences).

I think you may be unaware that if an appropriate statistical design is used to present these (typically best-worst scaling) personality-trait data then the researcher obtains ratio-scaled (probabilistic) inferences which must, by definition, be comparable across people and allow you to separate people on *relative* degrees of (say) the big 5. That is why they can’t be gamed, and why I know of a bank that sailed through the global financial crisis by using these techniques to ensure a robust spread of individuals with differing relative strengths.

If two people genuinely are the same on two less-attractive personality traits then the results will show their relative frequencies of choice to be equal, and those traits will also have competed against other traits elsewhere in the survey (and will probably appear “low down” on the latent scale). So there’s nothing intrinsically “wrong” with a personality survey using these methods (see work by Lee, Soutar and Louviere, who operationalised it for Schwartz’s values survey) – indeed there is lots to commend it over the frankly awful psychometric paradigm of old.

I would simply refer back to my first point (where we agree) and say that the interpretation of the data is an art, not a science, and why people like me get work in interpreting these data. Incidentally and on that subject, I can relate to the own-textbook-buzz, mine came out last year. Smart companies already know how to collect the right data, they just realise they can’t put the results through an algorithm.
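For the curious, the basic best-minus-worst counting that underlies such surveys can be sketched as follows (hypothetical traits and choices; a real analysis would fit a proper choice model on top of these counts rather than stop here):

```python
# Minimal best-worst scaling sketch: each task shows a subset of items,
# and the respondent picks the "best" and "worst" in that subset.
# An item's score is (times chosen best - times chosen worst) / appearances.

from collections import Counter

def bws_scores(tasks):
    """tasks: list of (items_shown, best_pick, worst_pick) tuples."""
    best, worst, shown = Counter(), Counter(), Counter()
    for items, b, w in tasks:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {i: (best[i] - worst[i]) / shown[i] for i in shown}

# Hypothetical personality-trait tasks answered by one respondent
tasks = [
    (("calm", "impulsive", "stubborn"), "calm", "impulsive"),
    (("calm", "stubborn", "anxious"), "calm", "anxious"),
    (("impulsive", "stubborn", "anxious"), "stubborn", "impulsive"),
]
print(bws_scores(tasks))  # calm scores highest, impulsive lowest
```

Because every item competes head-to-head against the others across the design, the resulting frequencies place all items on a common latent scale – which is the property discussed in the comment above.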

adaptive conjoint

I was interviewed for a podcast for MR Realities by Kevin Gray and Dave McCaughan a week or so ago. It went well (bar a technical glitch causing a brief outage in the VOIP call at one point) and apparently the podcast is doing very well compared to others.

One topic raised was adaptive conjoint analysis (ACA). This method seeks to “tweak” the choice sets presented to a respondent based on his/her initial few answers, and thus (the theory goes), “home in” on the trade-offs that matter most to him/her more quickly and efficiently. The trouble is, I don’t like it and don’t think it can work – and the last time I spoke to world design expert Professor John Rose about it, he felt similarly (though our solutions are not identical). There are three reasons I dislike it.

  1. Heckman shared the 2000 Nobel prize with McFadden: sampling on the basis of the dependent variable – the respondent’s observed choices – is perilous and often gives biased results – the long-recognised endogeneity issue.
  2. The second reason is probably more accessible to the average practitioner: suppose the respondent just hasn’t got the hang of the task in the first few questions and unintentionally misleads you about what matters – you may end up asking a load of questions about the “wrong” features.
    You may ask what evidence there is that this is happening. Well, my last major paper as an academic showed that even the typically smallest “standard” design giving you individual-level estimates of all the main feature effects (the Orthogonal Main Effects Plan, or OMEP) can lead you up the garden path (if, as we found, people use heuristics because the task is difficult), so I simply, genuinely don’t understand how asking a smaller number of questions allows me to make robust inferences.
  3. But it gets worse: the third reason I don’t like adaptive designs is that if a friend and I appear to have different preferences according to the model, I don’t know whether we genuinely differ or whether answering different question designs caused the result (estimates are confounded with design). And the other key finding of the paper I just mentioned confirmed a body of evidence showing that people do interact with the design – so you can get a different picture of what I value depending on what kind of design you gave me. Which is very worrying. So I just don’t understand the logic of adaptive conjoint, and I follow Warren Buffett’s mantra – if I don’t understand the product I don’t sell it to my clients.
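A toy simulation of reason 1 (my hypothetical numbers, nothing to do with any real ACA product): if you decide what to follow up based on a respondent's early choices, you are conditioning on the dependent variable, and the follow-up sample is biased:

```python
# Toy illustration (NOT adaptive conjoint itself) of sampling on the
# dependent variable: keeping only respondents whose first answer
# "looked keen" inflates the estimated average preference.

import random

random.seed(1)
TRUE_MEAN = 0.0  # latent tastes are centred on zero by construction
tastes = [random.gauss(TRUE_MEAN, 1.0) for _ in range(10000)]

# Each respondent's first answer: "yes" if latent taste + noise > 0.
first_choice = [t + random.gauss(0.0, 1.0) > 0 for t in tastes]

# Adaptive-style selection: follow up only the early "yes" respondents.
kept = [t for t, yes in zip(tastes, first_choice) if yes]

naive = sum(kept) / len(kept)      # biased upward by the selection
full = sum(tastes) / len(tastes)   # close to the true mean of zero
print(round(full, 2), round(naive, 2))
```

The selected subsample looks systematically keener than the population, even though the selection used nothing but their own noisy answers – which is the Heckman-style endogeneity problem in miniature.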

John Rose and Michiel Bliemer wrote a paper for a conference way back in 2009 debunking the “change the design to fit the individual” idea. Their solution was novel: the design doesn’t vary by individual (so no confounding issue) but it does change for everyone after a set number of questions. It’s a type of Bayesian efficient design, but requiring some heavy lifting to be done during the survey itself that most people would not be able to do.
Though I think it’s a novel solution, I personally would only do this once everyone has (for instance) completed a design (e.g. at least the OMEP) that elicits individual-level estimates; then, after segmentation, you could administer a second complete survey based on those results. Indeed that would solve an issue that has long bugged me – how do you know what priors to use for an individual if you don’t already have DCE results for that individual (since heterogeneity nearly always exists)? But I also have a big dose of scepticism about very efficient designs anyway, given the paper I referenced, and that is a different can of worms I opened 🙂

older aussie well-being

A tweet just alerted me to this report on the well-being of older Australians. I haven’t had time to read it in detail, but a quick skim seems to indicate it did all the “usual” things of checking correlations, doing PCA, etc and then “The final index was then calculated by averaging the five domain index_log”.

Oops.

I cannot help but feel a little frustrated. I gave a talk on this subject 6 years ago in Sydney when working at UTS. Many of my top publications from my time in academia concerned the development and Australian usage of the ICECAP-O instrument (see chapter 12) as a measure of the well-being of (primarily but not just) older people. Advantages it has over the research in the report I’ve just read are the following:

  1. ICECAP-O doesn’t use variables that must be collected from official sources or be part of (say) the SEIFA. The five variables came from extensive qualitative work that established what it is about (say) housing uncertainty that really contributes to/takes away from well-being. We wanted the underlying key conceptual attributes of well-being. So whilst health (for instance) is valued, it is what it gives you in terms of independence, security, enjoyment of activities that really matters.
  2. ICECAP-O is an individual-level, one-A4-page questionnaire. Four response categories per question means 4^5=1024 distinct “states” you could be in, each with its own percentage score. So you can slice and dice the data in far more flexible, disaggregated ways than anything out there so far.
  3. The five domains are NOT simply averaged, nor are the response categories across domains equally valued – e.g. the 2nd-to-top level of “love and friendship” is more highly valued, on average, than the top level of ANY of the other four domains. They don’t all matter equally to older people. There is even a fair degree of heterogeneity AMONG older people as to the relative importance of these, heavily driven by factors such as marital status and gender. We used choice models to find this out and the findings are based on robust, well-tested theoretical models.
  4. You can compare with (say) the SEIFA – we did this with the UK Index of Multiple Deprivation in one of my papers looking at a British city – and get far better insights. So, for instance, measures like the IMD/SEIFA can be misleading when they fail to capture measures of social capital or connectedness.
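Point 3 can be illustrated with a sketch (illustrative numbers only – these are NOT the published ICECAP-O tariff values, which come from the choice-model results): each attribute-level gets its own value, the values need not be equal across domains, and an individual's 0-1 score is the sum over the five attributes:

```python
# Hedged sketch of an ICECAP-style scoring system. The VALUES below are
# MADE UP for illustration; real tariff values are estimated from choice
# models. Each attribute has four levels, valued from worst (index 0)
# to best (index 3), and the top levels sum to 1.0.

VALUES = {
    "attachment": [0.00, 0.10, 0.21, 0.28],  # "love and friendship"
    "security":   [0.00, 0.06, 0.12, 0.18],
    "role":       [0.00, 0.05, 0.10, 0.16],
    "enjoyment":  [0.00, 0.06, 0.12, 0.19],
    "control":    [0.00, 0.06, 0.12, 0.19],
}

def icecap_style_score(state):
    """state: attribute -> chosen level (1-4); returns a 0-1 index."""
    return sum(VALUES[attr][level - 1] for attr, level in state.items())

best = {a: 4 for a in VALUES}
print(icecap_style_score(best))  # full capability scores 1.0
print(4 ** 5)                    # 1024 distinct scoreable states
```

Note how, in this made-up tariff, the second-to-top level of attachment (0.21) outweighs the top level of every other domain – the kind of unequal weighting a simple averaging of domains throws away.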

It’s a shame when disciplines don’t talk to one another. Things could move forward a lot more quickly. And as long as we use SEIFA/IMD type measures in policy, we’re going to be directing resources to the wrong people.

states worse than dead

No, this isn’t another moan by yours truly about how the valuation people deal (in)correctly with states worse than death in health economics valuation exercises (phew).

This tweet interested me. There are all sorts of things you could do with a discrete choice experiment (DCE) to measure the trade-offs such patients make. When at UTS, we did a DCE that did two things, one novel and one not so novel. The first was an attitudinal one that found there are three segments among Australian retired people (our sample was around 1100 in total) when you got them to tell you which statements about life they related to most and least – Best-Worst Scaling. We did something never done before – we fed back to them their own results after the survey, which they could print off, bring to their doctor to discuss, use as the starting point for an end-of-life care plan, etc: results of this form a chapter in the book referenced. Of course the doctors at the sharp end in ICUs had warned us that, thanks to TV programmes, the general public has much higher expectations about the success/acceptability of these dramatic interventions than is true in practice, but you could do the same survey with patients. In fact the bare bones of the survey are still live at the link and you can see how you compare with older Aussies.

The second DCE was (by DCE standards) very very simple, but was done to get a handle on the trade-offs people would make regarding the kinds of interventions in the survey in this Twitter post; unfortunately it won’t give you personalised results.

These types of DCEs should become routine. They can be done on touchscreen tablet PCs etc while the patient is waiting to see the doctor, and they can give personalised results – not aggregated ones like in the bad old days. People like them, and like to know how they compare with others – the older generation love those surveys comparing them to others just as much as the younger “Facebook generations”. C’mon people, this survey is great and very very informative, but we can move forward even further and do it today.

EQ-5D-5L thoughts

Well having a 24 hour sickness bug gave me some opportunities to sleep and think!

Obviously I have collaborators and am not in a position to make the final call alone on whether we go ahead with the design Karin and I put together. But I think we may be almost ready to get programming!

I’m excited by this: it draws on various findings from previous projects I have been involved with: the DCE is not highly efficient but it serves its purpose and, importantly, that of the Case 2 (Profile Case) BWS study. I think we might have had difficulties making this work for the original EQ-5D (3-level version), partly due to issues like the “states that make no sense”, but the edited wording for the 5-level version has helped enormously.

This project certainly won’t provide “the answer” as to whether using BWS can or should be used for valuation. However, if it works, (1) I believe it’ll be a major step forward and (2) I hope the EuroQoL group funds follow-up work.

The general thinking is that I don’t think everyone out there can do a single “all-singing, all-dancing” valuation task; splitting it into two or three (I believe) will ultimately tell us more and give more flexibility. After all, lead-time TTOs are used for states worse than death, so the precedent of more than one task is there. As I mentioned before, even if what we do “works”, there are inevitably issues the Group would have to discuss regarding the use of different valuation techniques etc, which I won’t pre-empt nor under-estimate.

euroqol group funding

Sander Arons, Karin Oudshoorn and I have a EuroQoL funded project: how a *correctly* designed Case 2 best-worst scaling (BWS) study + a DCE can give us a tariff for EQ-5D-5L.

The study will be interesting as we work from first principles: what is the tariff for an INDIVIDUAL person? Then do all the things I’ve been harping on about for 5+ years to get a proper population tariff.

I don’t claim this will turn the world upside down. But I do think it will give some food for thought as to whether the Secretariat wants to consider DCEs for the 5L tariff. I fully recognise that the EuroQoL group has all sorts of constraints to work under (linking with the 3L version, consistency of methods, etc). I’m pragmatic these days. But I think, if our testing results hold true in the main study, we will raise some eyebrows with the results and give the EuroQoL Group the chance to get back in front in terms of methodology.

Interesting times ahead!

research council funding

EDIT 27 July 2016: Just to clarify, this post is not due to any current issues/proposals/projects. It is a piece I have had in mind for a long while now but couldn’t realistically write whilst I was still under funding/contractual obligations. Recent encounters with other funders have (so far, fingers crossed, touch wood) been more positive!

Funding by UK health and EU funding councils has, in my experience, been less than satisfactory. Cuts that seem totally arbitrary have been made. In fact I cannot recall a single discrete choice experiment (DCE) or best-worst scaling (BWS) study since the original ICECAP-O study that has been fully funded.

Universities I worked at pretty much absorbed the costs. Now, this was partly because all the valuation exercises I have been involved in have raised new issues. But more recent examples have been particularly annoying for my collaborators – they have ended up putting in a lot of extra time. It made me feel guilty, thinking “perhaps I should have costed more time”. But the truth is, whatever I asked for in the past, I got cut BIGTIME.

This comes back to issues that I have had close to my heart for a while now, the issues of (1) what exactly “discrete choice modelling” (DCM) is as a discipline and (2) where it sits within health services research (at least for its application in health). I personally consider choice modelling to be an entire discipline and, in health, a subdiscipline of HSR akin to “qualitative research” or “biostatistics”. Now, funded trials would not dream of having a jobbing RA do a fully-fledged qualitative project within a programme grant, nor have an RA do all the biostatistics. So why does DCM get treated as the poor relation? There should ALWAYS be an expert (with 15+ years of experience) in charge of the DCM, with a junior to learn. Because discrete choice experiments (DCEs) are NOT just a branch of health economics or biostatistics. Choice modelling is a discipline in its own right, with a totally different set of statistical skills, economic assumptions and especially psychological knowledge required to do it successfully. It most definitely IS NOT just another preference elicitation method.

Qualitative research quite rightly fought hard to gain acceptance as an equal partner in HSR studies. DCM should too. It’s simply not good enough to have juniors doing DCEs in major clinical trials. Again, would you get an RA without specific training to do all the biostats? No? Then why is it considered acceptable in DCM?

Those of us who spent over 15 years learning our trade find it, frankly, a little insulting that someone fresh from an MSc or PhD should be considered able to run a DCE in a trial. DCE analysis is not purely a science. It’s very difficult to teach. It’s part art. Do it for 10+ years – ACROSS MULTIPLE DISCIPLINES – and then you are just about able to do it properly.

May we have some acknowledgement that what we do is important please?

This isn’t just for the benefit of us “insiders”: the industry loses since the standards of reporting and conduct have not, in my opinion, improved much at all in health. Sooner or later (and sooner if one study I have in mind in a major journal is compared with real preference data) the results of a poor DCE will be comprehensively discredited. Then we all lose. Actually I don’t. Because I do studies predominantly for the private sector who are very sensitive to incorrect results. They commission me because I get them good results. But it’d be a shame if DCM is lost to the public sector. All because nobody wanted to pay for senior people to do things correctly.

I’m told I can be too negative. Fair enough. Yes – standards are a LOT better than 10 years ago. But please remember, whilst the health field has moved on, those fields way ahead of you (marketing/environmental econ/transport econ etc) have also moved on. A lot. You’ve caught up a little. But not by enough. Experience has become doubly important – reading the literature won’t tell you how to do a DCE perfectly. Because of that dirty little secret…..there are an infinite number of solutions to a DCE. If you’ve not analysed 20+ DCEs are you really confident you know what solution to quote to policymakers? Particularly in health where the “tricks” available to solve the problem in other disciplines are not available? I’ll end with a quote:

Harry Callahan: “Uh uh. I know what you’re thinking. “Did he fire six shots or only five?” Well to tell you the truth in all this excitement I kinda lost track myself. But being this is a .44 Magnum, the most powerful handgun in the world and would blow your head clean off, you’ve gotta ask yourself one question: “Do I feel lucky?” Well, do ya, punk?”

You are playing with a loaded gun if you don’t know enough about what solutions make sense and what will blow your reputation up. Do you feel lucky?

Best-Worst Scaling in Voting

My comment to one of the links posted to today’s “Water Cooler” posting at Naked Capitalism. (Cross posted to my company blog too). The original link concerned a proposal in the US state of Maine to introduce ranked voting rather than the first-past-the-post (FPTP) that is ubiquitous in the US and UK…..the proposal sounds attractive to people, but…..

“Ranking is a double edged sword – not that I condone the current first past the post (FPTP) system endemic in the US and UK (it’s the worst of all worlds) – but people should first look at what oddballs have ended up in the Federal Senate in Australia. Plus that awful Pauline Hanson may be about to make a comeback there.

Ranking has proven very very difficult to properly axiomatize – i.e. in practice, there are a whole load of assumptions that must hold for the typical “elimination from the bottom” (or any other vote aggregation method) to properly reflect the strength of preference in the population. For instance:
(1) Not everybody ranks in the same way (top-bottom / bottom-top / top, bottom, then middle, or any other of a huge number of methods);
(2) An individual can give you different rankings depending on how you ask him/her to provide the answers (again: ask for ranks 1, 2, 3, …; or 9, 8, 7, …; or 1, 9, 2, 8, …);
(3) People have different degrees of certainty at different ranking depths – they are typically far less sure about their middle rankings than their top and bottom choices.

Unfortunately, where academic marketing, psychology and economics studies have been done properly, these kinds of problems have proven to be endemic… furthermore they often matter to the final outcome, which is worrying. It’s why gods of the field of math psych (from Luce and Marley in the 1960s onwards) were very very cautious in condoning ranking as a method.

Statement of conflict of interest: Marley and I are co-authors on the definitive textbook on an alternative method called best-worst scaling….it asks people for their most and least preferred options only. The math is much easier and I’d be very very interested to see what would have happened in both the Rep/Dem primaries if it had been used – generally you subtract the number of “least preferred” votes from the number of “most preferred” – so people like Clinton and Trump with high negatives get into trouble….”

What I didn’t say (since the work is technical) is that Tony Marley has done a lot of work in voting and has published at least one paper extolling BWS as a method of voting.
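The aggregation rule described in the comment can be sketched as below (made-up ballot counts, purely illustrative – no real election data):

```python
# Best-worst voting sketch: each voter names a most- and a least-
# preferred candidate; a candidate's net score is most-votes minus
# least-votes, and candidates are ranked by net score.

def bws_tally(ballots):
    """ballots: list of (most_preferred, least_preferred) pairs."""
    net = {}
    for most, least in ballots:
        net[most] = net.get(most, 0) + 1
        net[least] = net.get(least, 0) - 1
    return sorted(net.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical three-way race: a polarising front-runner ("A") with
# high negatives loses to a broadly acceptable alternative ("C").
ballots = [("A", "B")] * 40 + [("B", "A")] * 35 + [("C", "A")] * 25
print(bws_tally(ballots))
```

Notice how candidate A tops the "most preferred" count (40) yet finishes last on net score, because so many voters named A as least preferred – exactly the high-negatives effect mentioned above.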

what can’t be legitimately separated from how

I and my co-authors have a chapter in the just-published book “Care at the end of life”, edited by Jeff Round. I haven’t had a chance to read most of it yet, but from what I’ve seen so far it’s great.

Chris Sampson has a good chapter on the objects we value when examining the end-of-life trajectory. It’s nicely written and parts of it tie in with my series on “where next for discrete choice valuation”, parts one, (which he cites), two, three, but particularly (and too late for the book), four.

The issue concerns a better separation of what we are valuing from how we value it. I came at it from a slightly different angle from Chris, though I sense we’re trying to get people to address the same question. It’s of increasing importance now the ICECAP instruments are becoming more mainstream. I’m often thought of as “the valuation guy” – yet how we valued Capabilities is intimately tied up with how the measures might (or might not) be used, as well as the concepts behind them. When I became aware that the method we used – Case 2 BWS – would not necessarily have given us the same estimates as discrete choice experiments, part of me worried… briefly. But in truth, I honestly think our method is more in tune with the spirit of Sen’s ideas. (Not to mention the fact that we seem to be getting similar estimates, though I have previously explained why this is probably so in this instance.)

I have said quite a bit already in the blogs, but it’s nice to see others also coming at this issue from other directions. Anybody working on developing the Capabilities Approach must remain in close contact with those who are working on valuation methods.