
Where next for discrete choice health valuation – part one

My final academic obligations concerned two projects involving valuation of quality of life/health states. Interestingly, they involved people at opposite ends of the age spectrum – children and people at the end of life. (Incidentally, I am glad that the projects happened to be ending at the time I exited academia, so I didn’t leave the project teams in the lurch!)

These projects have thrown up lots of interesting issues, as did my “first generation” of valuation studies (the ICECAP-O and –A quality of life/well-being instruments). This blog entry will be the first of three to summarise these issues and lay out some ideas for how to go forward with future valuation studies and, in particular, construction of new descriptive systems for health or quality of life. In time they will be edited and combined to form a working paper to be hosted on SSRN. The issue to be addressed in this first blog concerns the descriptive system – its size and how it can/cannot be valued.

The size of the descriptive system became pertinent when we valued the CHU-9D instrument for child health. More specifically, an issue arose concerning the ability of some children to do Best-Worst Scaling tasks for the CHU-9D. The project found that we could only use the “best” (first best, actually) data for reporting. This is no secret: I and other members of the project team are reporting it at various conferences over the coming year. I may well be first, at the International Academy of Health Preference Research conference in St Louis, USA, in a few weeks. We knew from a pilot study that children exhibited much larger rates of inconsistency in their “worst” choices than in their “best”: the plot of best versus worst frequencies had a bloody big part of the inverse relationship curve missing! (This was the first time I had seen this.)

[Figure: best versus worst choice frequencies of the type seen in the child health study]

When you plot the best choice frequency against the worst choice frequency of each attribute level you should see an approximately inverse relationship. After all, an attractive attribute level should be chosen frequently as best and infrequently as worst; an unattractive attribute level should be chosen frequently as worst and infrequently as best. Yet in the child health study, the unattractive attribute levels (the low levels of the 9 attributes), although showing small “best” frequencies, did not show large “worst” frequencies: they were all clustered together around a low worst frequency. The kids seemed to choose pretty randomly when it came to the “worst” choices – particularly bad attribute levels were NOT chosen more often as worst than moderately bad attribute levels. Hence the missing part of the “inverse relationship” curve. It led us to make a big effort to get a lot of worst data (two rounds) and to make the task easy (by greying out previously chosen options). Unfortunately, it didn’t really work.
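The diagnostic described above can be sketched in a few lines. This is a minimal illustration, not the project’s actual analysis code: the attribute-level labels and responses below are hypothetical, and each response is simply the (best, worst) pair of levels a child picked within one profile.

```python
from collections import Counter

# Hypothetical Case 2 BWS responses: each is a (best_level, worst_level)
# pair of attribute-level labels chosen within one shown profile.
responses = [
    ("pain_none", "mood_very_low"),
    ("sleep_good", "pain_severe"),
    ("mood_fine", "pain_severe"),
    ("pain_none", "sleep_very_poor"),
]

best = Counter(b for b, _ in responses)
worst = Counter(w for _, w in responses)

# One (best, worst) frequency point per attribute level.  A healthy data
# set shows an approximately inverse relationship between the two counts;
# in the child study the bad levels all bunched at low worst frequencies.
levels = sorted(set(best) | set(worst))
points = {lvl: (best[lvl], worst[lvl]) for lvl in levels}
for lvl, (b, w) in points.items():
    print(f"{lvl}: best={b}, worst={w}")
```

Plotting `points` (best frequency on one axis, worst on the other) reproduces the scatter described in the text.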

I stress that despite my deliberately controversial title for the IAHPR conference, we CANNOT know if it was (1) the valuation method (BWS), (2) the descriptive system (CHU-9D) or (3) just plain respondent lack of knowledge that caused kids to be unable to decide what was worst about aspects of poor health.

(1) could be true if kids IN GENERAL don’t think about the bad things in life; (2) could be true if the number of attributes and levels was too large – the CHU-9D has 9 attributes, each with 5 levels, which is the largest instrument I have ever valued in a single exercise (I was involved in the ASCOT exercise which split the instrument in two); (3) could be true if kids can do “worst” tasks, but in general they just can’t comprehend poor health states (since kids from the general population are mostly highly unlikely to have experienced or even thought about them).

In the main study I hoped that “structured BWS”, eliciting four of the nine ranks in a Case 2 BWS study, would help the kids. More specifically:

(1) They answered best

(2) Their best option was then “greyed out” and they answered worst

(3) This was in turn greyed out and they answered next (second) best

(4) Which was in turn greyed out and they answered next (second) worst.
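The four steps above can be sketched as a simple elicitation loop. This is a toy illustration only: the option labels are placeholders, and the `choose_best`/`choose_worst` functions stand in for the child’s actual clicks in the survey.

```python
# Sketch of the "structured BWS" elicitation over nine attribute levels,
# with greying-out modelled by removing each chosen option.
def structured_bws(options, choose_best, choose_worst):
    """Return partial ranks {option: rank} for ranks 1, 2, n-1 and n."""
    remaining = list(options)
    ranks = {}
    n = len(options)
    for rank in (1, n, 2, n - 1):  # best, worst, 2nd best, 2nd worst
        pick = choose_best(remaining) if rank <= 2 else choose_worst(remaining)
        ranks[pick] = rank
        remaining.remove(pick)     # "grey out" the chosen option
    return ranks

# Toy respondent whose preferences follow alphabetical order.
opts = list("ABCDEFGHI")
ranks = structured_bws(opts, choose_best=min, choose_worst=max)
print(ranks)  # ranks 1, 2, 8 and 9 of the nine options
```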

This in theory gave us four of the nine ranks (1, 2, 8, 9). It was particularly useful because it enabled us to test the (often blindly made) assumption that the rank ordered logit model gives you utility function estimates that are “the same” no matter what ranking depth (top/bottom/etc.) you use data from. Unfortunately our data failed this test quite spectacularly – only the first best data really gave sensible answers. So the pilot results were correct – for some reason, in this study, kids’ worst choices were duff. (Even their second best data were not very good.)
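For readers unfamiliar with the rank ordered (“exploded”) logit, the depth test rests on turning each observed rank into a pseudo choice from the options not yet ranked; fitting a conditional logit to each depth separately and comparing the coefficients is the test described above. A minimal sketch of the best-side explosion, with hypothetical labels and ranking:

```python
# Explode the best-side ranks of a partial ranking into pseudo choices.
def explode_best_ranks(options, partial_ranks, depths=(1, 2)):
    """Return (chosen, available_set, depth) pseudo-observations."""
    obs = []
    remaining = set(options)
    for depth in depths:
        chosen = next(o for o, r in partial_ranks.items() if r == depth)
        obs.append((chosen, frozenset(remaining), depth))
        remaining.remove(chosen)
    return obs

opts = list("ABCDEFGHI")
ranking = {"A": 1, "B": 2, "H": 8, "I": 9}   # the four elicited ranks
pseudo = explode_best_ranks(opts, ranking)
# depth 1: "A" chosen from all 9; depth 2: "B" chosen from the remaining 8
```

The worst-side ranks (8 and 9) can be exploded analogously from the bottom, which is what lets the first-best, second-best and worst data be compared depth by depth.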

Of course, as I mentioned, we don’t know the reason why this was the case, so we must proceed with caution before making controversial statements about how well BWS works among kids (ahem, cough, cough…)

But given the mooted idea to devise an “ICECAP for kids”, we should bear in mind the CHU-9D findings when constructing the descriptive system. I certainly don’t want to criticise the very comprehensive and well-conducted qualitative work done by Sheffield researchers to construct the CHU-9D. I merely pose some questions for future research to develop an “ICECAP for kids” instrument, which may create a tension between the needs of the descriptive system and the needs of the valuation exercise.

Would an ICECAP for kids really need 5^9=1953125 profiles (quality of life states) to describe child quality of life (as the CHU-9D did for health)?

My personal view is that too much of the health economics establishment may be thinking in terms of psychometrics, which (taking the SF-36 as the exemplar) typically concentrates on the number of items (questions/dimensions/attributes). A random utility theory based approach concentrates on the number of PROFILES (health/quality of life states). This causes the researcher to focus more on the combination of attributes and levels. When the system is multiplicative (as in a DCE), the number of “outcomes” becomes large VERY quickly.

Thus, some people are missing the point when they express concern at the small number of questions (five) in the ICECAP-O and –A. In fact there are 4^5 = 1024 possible outcomes (states) – and of those 1024 possible ICECAP states, over 200 ICECAP-O ones are observed in a typical British city. That makes the instrument potentially EXTREMELY sensitive. So I would end with a plea to think about the number of profiles (states), not the number of attributes. Can attributes be combined? That, and the statistics/intuition behind it, will be the subject of the second blog in this series.
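The profile-count arithmetic behind both figures quoted above is just exponentiation, but it is worth making explicit because it is where the attribute-counting and profile-counting mindsets part ways:

```python
# Number of profiles (states) implied by a descriptive system:
# levels per attribute raised to the number of attributes.
def n_profiles(levels_per_attribute, n_attributes):
    return levels_per_attribute ** n_attributes

chu9d = n_profiles(5, 9)    # CHU-9D: 9 attributes, 5 levels each
icecap = n_profiles(4, 5)   # ICECAP-O/-A: 5 attributes, 4 levels each
print(chu9d, icecap)        # 1953125 1024
```

Adding one five-level attribute multiplies the state space by five, which is why the number of profiles blows up far faster than the number of questions suggests.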

 

Copyright Terry N Flynn 2015.

This, together with the accompanying blogs, will form a working paper to be submitted to SSRN. Please cite appropriately if referring to these issues in academic papers.

 

Moody teenagers? Giving them a greater say in health policy might solve this

Cross posted from The Ethics Blog

We have all heard of moody teenagers. Maybe we have them, or can remember being one. Recent research with my Australian colleagues suggests they may genuinely have more difficulty living with poor mental health than adults do.

Specifically, compared to the general public aged 18+, they are more likely to view mental health related impairments as being worse than physical disabilities.

This is not just an academic curiosity – if true, it means society is probably under-investing in child mental health. To explain why, we must first understand how most European countries decide on health funding priorities.

In general, disabilities with the greatest capacity to benefit from treatment are prioritised. To find out whether pain, depression, or some other physical impairment to health is worst – and therefore has the greatest potential benefit from treatment – nations conduct large population-based surveys. These require adults to make choices between many possible impaired health states in order to find out just how bad these are, relative to each other.

Of course, people often disagree on what is worst, and by how much, so decisions must be made as to whose values matter most. European nations generally agree that it is unethical to allow the rich to dictate which disabilities are most deserving of resources. Instead of “one € one vote”, it is “one person one vote”: taking a simple average of every adult’s values achieves this naturally.

Whilst this sounds fair and democratic in terms of process, it could be leading to uncomfortable outcomes for our moody teenager. Why? Well, if poor mental health is genuinely worse for teenagers than adults believe it to be then mental health interventions might not get funded: for example, if adults think pain is much worse, pain medications will be prioritised instead. This is because only adults are being asked for their health values, not teenagers.

So perhaps adults just don’t remember what it’s like to be young and we should use the teenagers’ values for health interventions that affect them?

Maybe not. There is a saying “age brings wisdom” and perhaps adults’ greater experience of illness means their values for mental health impairments are the correct ones. Maybe younger people have simply not experienced enough in life to know what aspects of illness are really worst. After all, immaturity is one reason why younger teenagers are not allowed to vote.

The ethical issues surrounding at what age teenagers can have sex, vote and make independent decisions in public life all become relevant here. However, “one person one vote” has one more disturbing implication that is relevant for people of all ages. By taking an average of everyone’s views, national health state value surveys include lots of healthy people who have no idea what it is like to live with severe illness. Does this matter? Well, it turns out that to the depressed patient in desperate need of a new anti-depressant it probably does.

Patients and the general public tend to disagree on which is worst – extreme pain or extreme depression. The general public gets the final say and my next blog entry will discuss how and why we might use the health values of patients themselves in priority setting instead.