Tag Archives: population valuation

BWS neither friend nor foe

This post replies to some requests I have had asking me to respond to a paper concluding that DCEs are better than BWS for health state valuation. To be honest I am loath to respond, for reasons that will become apparent.

First of all, let me clarify one thing that people might not appreciate – I most definitely do not want to "evangelise" for BWS, and it is not the solution in quite a few circumstances. (See the papers coming out from the CHU-9D child health valuation study I was involved with for starters – BWS was effectively a waste of resources in the end: "best" choices were all we could use for the tariff.)

I only really pushed BWS strongly in my early days as a postdoc, when I wanted to make a name for myself. If you read my papers since 2007 (*all* of them) you'll see the numerous caveats appear with increasing frequency. And that's before we even get to the BWS book, where we devote an entire chapter to unresolved issues, including the REAL weaknesses of, and outstanding research areas for, BWS (as opposed to the straw men I have been seeing in recent literature).

OK, now that's out of the way, I will lay some other cards on the table, many of which are well known since I've not exactly been quiet about them. I had mental health issues associated with my exit from academia. I'm back on my feet now, doing private sector work for very appreciative clients, but that doesn't mean I want to go back and fight old battles… battles which I erroneously thought us three book authors had "won" by passing muster with the top mathematical psychologists, economists and others in the world during peer review. When you publish a paper in the Journal of Mathematical Psychology (the JHE of that field) back in 2008 illustrating a key feature/potential weakness of a DCE (and specifically of Case 2 BWS), you tend to expect that papers published in 2016 would not ignore it, would not do research showing zero awareness of the issue, and would not make fundamental errors as a result. After all, whilst we know clinical trials take a while to go from proposal to main publication, preference studies do NOT take 8+ years to go through this process. I co-ran a BWS study from conceptualisation to results presentation in 6 days when in Sydney. Go figure.

So that's an example of my biggest frustration – the standards of literature review have often been appalling. Two or three of my papers (ironically including the JHE one, which contains a whopping error that I myself have repeatedly flagged up and corrected in my 2008 BMC paper) seem to get inserted as "the obligatory BWS reference to satisfy referees/editors" and in many cases bear no relation to the point being made by the authors. Alarm bells immediately ring when I read an abstract via a citation alert and see that those were my references. But it keeps happening. Not good practice, folks.

In fact (and at a recent meeting someone with no connection to me said the same thing), in certain areas of patient outcomes research the industry reviews are considered far better than the academic ones – they have to be, or they'd get laughed out of court.

Anyway, I have been told that good practice eventually drives out bad. Sorry, but if that's true, the timescale was simply too long for me, which didn't help my career in academia and raised my blood pressure.

Returning to the issue at hand. I’m not going to go through the paper in question, nor the several others that have appeared in the last couple of years purporting to show limitations of BWS. I have a company to run, caring obligations and I’ve written more than enough for anyone to join the dots here if they do a proper literature review. My final attempt to help out was an SSRN paper. But that’s it – without some give and take from the wider community, my most imaginative BWS work will be for clients who put food on the table and who pay – sometimes quite handsomely – for a method that when properly applied shows amazing predictive ability together with insights into how humans make decisions.

Now, of course, health state valuation is another kettle of fish – no revealed preference data etc. However, Tony, Jordan and I discussed why "context" is key in 2008 (JMP); I expounded on this with reference to QALYs in my two 2010 single-authored papers, and published an (underpowered) comparison in the 2013 JoCM paper (which I first presented at the 2011 ICMC conference in Leeds, getting constructive criticism from the top choice modellers on Earth). So this issue is not particularly new.

It's rather poor that nobody has actually used the right design to compare Case 2 BWS with DCEs for health state valuation… I ended up deciding "if you want something done properly you have to do it yourself", and I am very grateful to the EuroQoL Foundation for funding such a study, which I am currently analysing with collaborators. I don't really "have a dog in this fight": if Case 2 proves useful then great, and if not then at least I will know exactly why not… and the reasons will have nothing to do with the "BWS is bad m'kayyyyy" papers published recently. (To be fair, I am sometimes limited in what I can access – no longer having an academic affiliation means full texts are sometimes unavailable – but when there's NO mention of attribute importance in the abstract, NOR of why efficient designs for Case 2 are problematic, my Bayesian estimate is a 99.99% probability that the paper is fundamentally flawed and couldn't possibly rule BWS in or out as a viable competitor to a DCE.)

If you’d like to know more:

  • Read the book
  • Read all the articles – my google scholar profile is up to date
  • Get up to speed on the issues in discrete choice design theory – fast. Efficient designs are in many, many instances extremely good (and I've used them), but you need to know exactly why, in a Case 2 context, they are inappropriate.

If you still don’t understand, get your institution to contract me to run an exec education course. When I’m not working, I’m not earning, full stop.

I'm now far more pragmatic about the pros and cons of academia and really didn't want to be the archetypal "I'm leaving social media now" whinger. And I'm not leaving. But I am re-prioritising things. Sorry if this sounds harsh/unhelpful – I didn't want to write this post and had hoped to quietly slip beneath the radar, popping up whenever something insightful based on one of BWS's REAL disadvantages, or on Sen's work etc., was mentioned. But people I respect have asked for guidance. So I am giving what I can, in the 10 minutes of free time I have.

Just trying to end on a positive note – I gave a great exec education course recently. It was a pleasure to engage with people who asked questions that were pertinent to the limitations of BWS and who just wanted to use the right tool for the right job. That’s what I try to do and what we should all aim for. I take my hat off to them all.

Where next for discrete choice health valuation – part one

My final academic obligations concerned two projects involving valuation of quality of life/health states. Interestingly, they involved people at opposite ends of the age spectrum – children and people at the end of life. (Incidentally I am glad that the projects happened to be ending at the time I exited academia, so I didn't leave the project teams in the lurch!)

These projects have thrown up lots of interesting issues, as did my “first generation” of valuation studies (the ICECAP-O and –A quality of life/well-being instruments). This blog entry will be the first of three to summarise these issues and lay out some ideas for how to go forward with future valuation studies and, in particular, construction of new descriptive systems for health or quality of life. In time they will be edited and combined to form a working paper to be hosted on SSRN. The issue to be addressed in this first blog concerns the descriptive system – its size and how it can/cannot be valued.

The size of the descriptive system became pertinent when we valued the CHU-9D instrument for child health. More specifically, an issue that arose concerned the ability of some children to do Best-Worst Scaling tasks for the CHU-9D. The project found that we could only use the "best" (first best, actually) data for reporting. This is not secret: I and other members of the project team are reporting it at various conferences over the coming year. I may well be first, at the International Academy of Health Preference Research conference in St Louis, USA, in a few weeks. We knew from a pilot study that children exhibited much larger rates of inconsistency in their "worst" choices than in their "best": the plot of best vs worst frequencies had a bloody big part of the inverse relationship curve missing! (This was the first time I saw this.)

Figure: best versus worst choice frequencies of the type seen in the child health study

When you plot the best choice frequency against the worst choice frequency of each attribute level you should see an approximately inverse relationship. After all, an attractive attribute level should be chosen frequently as best and infrequently as worst; an unattractive attribute level should be chosen frequently as worst and infrequently as best. Yet in the child health study, the unattractive attribute levels (the low levels of the 9 attributes), although showing the expected small "best" frequencies, did not show large "worst" frequencies: they were all clustered together around a low worst frequency. This showed that the kids seemed to choose pretty randomly when it came to the "worst" choices – particularly bad attribute levels were NOT chosen as worst more often than moderately bad attribute levels. This left part of the "inverse relationship" curve missing – the first time I'd seen that. It led us to make a big effort in the main study to get a lot of worst data (two rounds) and to make the task easy (by structuring it and greying out previously chosen options). Unfortunately, it didn't really work.
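For anyone wanting to run this sanity check on their own data, here is a minimal sketch of the best-versus-worst frequency plot described above. It is not the study's analysis code; the file name and column names ("attr_level", "chosen_best", "chosen_worst") are hypothetical placeholders for a long-format dataset with one row per attribute level shown per task.

```python
# Minimal sketch (hypothetical file/column names, not the study's code):
# plot best vs worst choice frequencies per attribute level from Case 2 BWS data.
import pandas as pd
import matplotlib.pyplot as plt

# Long-format data: one row per attribute level shown per task, with 0/1 flags
# recording whether that level was picked as best or as worst.
choices = pd.read_csv("bws_choices.csv")

freq = choices.groupby("attr_level")[["chosen_best", "chosen_worst"]].mean()
freq.columns = ["best_freq", "worst_freq"]

# With consistent respondents the points should trace an inverse relationship:
# levels often chosen as best should rarely be chosen as worst, and vice versa.
ax = freq.plot.scatter(x="best_freq", y="worst_freq")
for label, row in freq.iterrows():
    ax.annotate(str(label), (row["best_freq"], row["worst_freq"]))
ax.set_xlabel("Proportion of tasks chosen as best")
ax.set_ylabel("Proportion of tasks chosen as worst")
plt.show()
```

In the child health data the bad levels sat in a cluster at low worst frequencies instead of spreading up the vertical axis, which is what "part of the curve missing" means in practice.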

I stress that despite my deliberately controversial title for the IAHPR conference, we CANNOT know if it was (1) the valuation method (BWS), (2) the descriptive system (CHU-9D) or (3) just plain respondent lack of knowledge that caused kids to be unable to decide what was worst about aspects of poor health.

(1) could be true if kids IN GENERAL don’t think about the bad things in life; (2) could be true if the number of attributes and levels was too large – the CHU-9D has 9 attributes, each with 5 levels, which is the largest instrument I have ever valued in a single exercise (I was involved in the ASCOT exercise which split the instrument in two); (3) could be true if kids can do “worst” tasks, but in general they just can’t comprehend poor health states (since kids from the general population are mostly highly unlikely to have experienced or even thought about them).

In the main study I hoped that "structured BWS", eliciting four of the nine ranks in a Case 2 BWS study, would help the kids. More specifically:

(1) They answered best

(2) Their best option was then “greyed out” and they answered worst

(3) This was in turn greyed out and they answered next (second) best

(4) Which was in turn greyed out and they answered next (second) worst.

This in theory gave us four of the nine ranks (1, 2, 8 and 9). It was particularly useful because it enabled us to test the (often blindly made) assumption that the rank-ordered logit model gives you utility function estimates that are "the same" no matter what ranking depth (top/bottom/etc.) you use data from. Unfortunately our data failed this test quite spectacularly – only the first best data really gave sensible answers. So the pilot results were correct – for some reason, in this study, kids' worst choices were duff. (Even their second best data were not very good.)
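To make the mechanics concrete, here is a minimal sketch (under my own assumptions about the data layout, not the project's actual code) of how the four elicited choices can be "exploded" into sequential pseudo-choice sets. Fitting a conditional logit to each depth separately and comparing the estimates is the ranking-depth test described above.

```python
# Minimal sketch (assumed data layout, not the project's code): "explode" the
# four observed choices (ranks 1, 2, 8, 9 of the nine attribute levels shown
# in a Case 2 task) into sequential pseudo-choice sets.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StructuredTask:
    items: List[str]      # the nine attribute levels shown
    best: str             # rank 1 (chosen first)
    worst: str            # rank 9 (chosen second, after best is greyed out)
    second_best: str      # rank 2 (chosen third)
    second_worst: str     # rank 8 (chosen fourth)

def explode(task: StructuredTask) -> List[Tuple[int, Tuple[str, ...], str, int]]:
    """Return (depth, choice_set, chosen, sign) rows in elicitation order.

    sign = +1 for a best choice (utility maximised) and -1 for a worst choice
    (utility minimised), as in a sequential best-worst / maxdiff style model.
    Fitting a conditional logit to depth-1 rows only, then to deeper depths,
    and comparing the estimates tests whether ranking depth matters.
    """
    remaining = list(task.items)
    rows = []
    sequence = [(task.best, +1), (task.worst, -1),
                (task.second_best, +1), (task.second_worst, -1)]
    for depth, (chosen, sign) in enumerate(sequence, start=1):
        rows.append((depth, tuple(remaining), chosen, sign))
        remaining.remove(chosen)   # greyed-out options leave the choice set
    return rows
```

In our case the depth-1 (first best) estimates looked sensible while the deeper ranks did not, which is precisely the failure of the "same estimates at every depth" assumption.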

Of course, as I mentioned, we don’t know the reason why this was the case, so we must proceed with caution before making controversial statements about how well BWS works among kids (ahem, cough, cough…)

But given the mooted idea of devising an "ICECAP for kids", we should bear in mind the CHU-9D findings when constructing the descriptive system. I certainly don't want to criticise the very comprehensive and well-conducted qualitative work done by the Sheffield researchers to construct the CHU-9D. I merely pose some questions for future research to develop an "ICECAP for kids" instrument – questions which may cause a tension between the needs of the descriptive system and the needs of the valuation exercise.

Would an ICECAP for kids really need 5^9 = 1,953,125 profiles (quality of life states) to describe child quality of life (as the CHU-9D did for health)?

My personal view is that too much of the health economics establishment may be thinking in terms of psychometrics, which (taking the SF-36 as the exemplar) typically concentrates on the number of items (questions/dimensions/attributes). A random utility theory based approach concentrates on the number of PROFILES (health/quality of life states). This causes the researcher to focus more on the combination of attributes and levels. When the system is multiplicative (as in a DCE), the number of “outcomes” becomes large VERY quickly.

Thus, some people are missing the point when they express concern at the small number of questions (five) in the ICECAP-O and –A. In fact there are 4^5 = 1,024 possible outcomes (states) – and moreover, of those 1,024 possible ICECAP states, over 200 ICECAP-O ones are observed in a typical British city. That makes the instrument potentially EXTREMELY sensitive. So I would end with a plea to think about the number of profiles (states), not the number of attributes. Can attributes be combined? That, and the statistics/intuition behind it, will be the subject of the second blog in this series.
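As a throwaway check of the arithmetic behind the two figures above (nothing more than levels raised to the power of attributes):

```python
# Profile counts grow as levels ** attributes, which is why "only five questions"
# can still describe over a thousand distinct states.
chu9d_profiles = 5 ** 9    # CHU-9D: 9 attributes, 5 levels each -> 1,953,125 states
icecap_profiles = 4 ** 5   # ICECAP-O/-A: 5 attributes, 4 levels each -> 1,024 states
print(chu9d_profiles, icecap_profiles)
```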

 

Copyright Terry N Flynn 2015.

This, together with the accompanying blogs, will form a working paper to be submitted to SSRN. Please cite appropriately if referring to these issues in academic papers.

 

Population vs patient values

Just had a very interesting discussion on the topic of the differences between population and patient values in valuation exercises.

I am not going to get into another argument about whether we should continue using population values. Frankly, I don't care anymore what the policy-makers do, and accusing me of being someone who advocates restricting the vote is insulting enough to be close to an ad hominem attack – such a person wasn't part of the group I was chatting with, I hasten to add!

However, the observation that experience radically changes preferences (for the EQ-5D and ICECAP-O) is interesting for future work, and health economists really need to decide how to reconcile this with individualised patient care at some point – at least in a manner that the population regards as acceptable (and I don't think we have ever asked the population this).

Is it ethical that uninformed members of the public decide just how bad your disability is?

Cross-posted from The Ethics Blog

Last time I raised the possibility of changing child health policy because teenagers are more likely than adults to view mental health impairments as being the worst type of disability. However, today I consider adults only in order to address a more fundamental issue.

Imagine you had an uncommon, but not rare, incurable disease that caused you to suffer from both “moderate” pain and “moderate” depression and neither had responded to existing treatments. If policy makers decided there were only enough funds to try to help one of these symptoms, who decides which should get priority?

In most of Europe, perhaps surprisingly, it would not be you, the patient, nor even the wider patient group suffering from this condition. It is the general population. Why? The most often quoted reason will be familiar to those who know the history of the USA: "no taxation without representation". Tax-payers supposedly fund most health care, so their views should decide where this money is most needed. If they consider pain to be worse than depression, then health services should prioritise treatment for pain.

Thus, many European countries have conducted nationally representative surveys to quantify their general public’s views on various health states. Unfortunately Swedish population values were only published last year, almost two decades after the first European country published theirs. Although late, these Swedish population values raise a disturbing issue.

Suppose the general population is wrong?

Why might this be? Many people surveyed are, and always have been, basically healthy. How do they know whether depression is better or worse than pain? In fact, these people tend to say pain would be worse, whilst patients who have experienced both say the opposite.

The Swedish general population study was large and relatively well equipped to investigate how people in ill health value disability. And, indeed, they do value it differently than the average healthy Swedish person.

So is it ethical to disenfranchise patients in order that all citizens, informed or not, have a say?

Why not use the views of patients instead?

Well, actually, the stated policy in Sweden is that health values should ideally come from the individuals affected by the health intervention (patients). So Sweden now has the information required to follow its own health policy aims. Perhaps it's time politicians were asked whether it is ethical to prioritise pain over mental health just because various general populations thought this was so.

As a final thought, I return to the issue of "what funds healthcare?" You may be surprised to learn that the "general taxation" answer is wrong here too. But that strays beyond health care and ethics and into the dark heart of economics, which I will therefore discuss elsewhere next week!