Perfect the enemy of the good?

I recently got into a discussion on Twitter about the properties of the ICECAP instruments and what the zero on these means. One particular point I made was that our "saying" the state of "no capabilities" must be zero didn't necessarily make it so, at least in the eyes of a mathematical psychologist. They'd probably say it is not a ratio scale, and might not even have good interval scale properties at the aggregate level (if there is improperly-adjusted-for underlying heterogeneity).

I'm not too worried about these points, though I personally think better subgroup/heterogeneity analyses need to be done in future to address the latter one. But it did lead me to think about that old recommendation: "don't let the perfect become the enemy of the good". This potentially gets a lot of extra-welfarism into hot water where the maths psych people are concerned: instruments, valuation tweaks, and even in some cases the whole valuation method (the visual analogue scale, VAS, and arguably the time trade-off, TTO) have little in the way of theory (that that group would recognise) behind them. However, I remember one health economist summarising a discussion he had with clinicians and members of the public about how scarce health care resources should be allocated: they "naturally" came up with something that approximated a QALY with TTO scoring. This is fair enough, and I am happy with the newer theories/concepts put forward to justify what health economists do in that particular area. After all, extra-welfarism doesn't share the assumptions and theories of traditional welfarist economics, so why get bothered about what another discipline entirely thinks?

I guess I'm just naturally – having worked so long with a maths psych guru – very particular about getting scoring "right", in the sense that it satisfies one or more of the properties of proper scales (absolute/ratio/interval/difference). So yeah, I guess I may be guilty of being dissatisfied with just "good"…but in my defence, we are producing tariffs (sets of scores) here that are being increasingly used across the world – the Netherlands has already decided on a dual "QALY+" approach, so more than one evaluative space finally seems to be accepted as an aid to decision-making. We shouldn't stand still, particularly as we know a lot more about the properties of the Best-Worst Scaling valuation technique now than we did at the time of the original UK ICECAP-O valuation exercise. Whilst it is gratifying that public interest and funders have agreed with us that areas like end-of-life care and (potentially) children need ICECAP instruments, we should not rest on our laurels with existing instruments.

Where next for discrete choice health valuation – part one

My final academic obligations concerned two projects involving valuation of quality of life/health states. Interestingly, they involved people at opposite ends of the age spectrum – children and people at the end of life. (Incidentally, I am glad that the projects happened to be ending just as I exited academia, so I didn't leave the project teams in the lurch!)

These projects have thrown up lots of interesting issues, as did my “first generation” of valuation studies (the ICECAP-O and –A quality of life/well-being instruments). This blog entry will be the first of three to summarise these issues and lay out some ideas for how to go forward with future valuation studies and, in particular, construction of new descriptive systems for health or quality of life. In time they will be edited and combined to form a working paper to be hosted on SSRN. The issue to be addressed in this first blog concerns the descriptive system – its size and how it can/cannot be valued.

The size of the descriptive system became pertinent when we valued the CHU-9D instrument for child health. More specifically, an issue arose concerning the ability of some children to do Best-Worst Scaling tasks for the CHU-9D. The project found that we could only use the "best" (first best, actually) data for reporting. This is no secret: I and other members of the project team are reporting it at various conferences over the coming year. I may well be first, at the International Academy of Health Preference Research (IAHPR) conference in St Louis, USA, in a few weeks. We knew from a pilot study that children exhibited much larger rates of inconsistency in their "worst" choices than in their "best": the plot of best versus worst frequencies had a bloody big part of the inverse relationship curve missing! (This was the first time I had seen that.)

[Figure: best versus worst choice frequencies of the type seen in the child health study]

When you plot the best choice frequency against the worst choice frequency of each attribute level, you should see an approximately inverse relationship. After all, an attractive attribute level should be chosen frequently as best and infrequently as worst; an unattractive attribute level should be chosen frequently as worst and infrequently as best. Yet in the child health study, the unattractive attribute levels (the low levels of the 9 attributes), although showing the expected small "best" frequencies, did not show large "worst" frequencies: they were all clustered together around a low worst frequency. In other words, the kids seemed to choose pretty randomly when it came to the "worst" choices – particularly bad attribute levels were NOT chosen more often as worst than moderately bad ones. Hence the missing part of the "inverse relationship" curve. It led us to make a big effort in the main study to collect a lot of worst data (two rounds) and to make the task easy (by structuring it and greying out previously chosen options). Unfortunately, it didn't really work.
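To make the diagnostic concrete, here is a minimal sketch of that plot in Python. To be clear, the level names and counts below are entirely hypothetical (my own invention for illustration), not the study data:

```python
# A minimal sketch (not the actual study code or data) of the best-vs-worst
# frequency diagnostic described above. All level names and counts are
# hypothetical, invented purely for illustration.
import matplotlib.pyplot as plt

# Hypothetical choice counts for six attribute levels across all respondents.
levels = ["pain_none", "pain_severe", "worry_none", "worry_severe",
          "sleep_good", "sleep_poor"]
best_counts = [310, 25, 280, 40, 250, 30]   # times chosen as best
worst_counts = [20, 290, 35, 260, 30, 240]  # times chosen as worst

fig, ax = plt.subplots()
ax.scatter(best_counts, worst_counts)
for name, b, w in zip(levels, best_counts, worst_counts):
    ax.annotate(name, (b, w))
ax.set_xlabel("Times chosen as best")
ax.set_ylabel("Times chosen as worst")
ax.set_title("Best vs worst frequencies (inverse relationship expected)")
plt.show()
```

With well-behaved data the points trace out a downward-sloping curve; the child health pattern described above would instead show the bad levels bunched together at a low worst frequency, with that part of the curve missing.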

I stress that despite my deliberately controversial title for the IAHPR conference, we CANNOT know if it was (1) the valuation method (BWS), (2) the descriptive system (CHU-9D) or (3) just plain respondent lack of knowledge that caused kids to be unable to decide what was worst about aspects of poor health.

(1) could be true if kids IN GENERAL don't think about the bad things in life; (2) could be true if the number of attributes and levels was too large – the CHU-9D has 9 attributes, each with 5 levels, making it the largest instrument I have ever valued in a single exercise (I was involved in the ASCOT exercise, which split the instrument in two); (3) could be true if kids can do "worst" tasks but simply can't comprehend poor health states (since kids from the general population are highly unlikely to have experienced, or even thought about, them).

In the main study I hoped that "structured BWS", eliciting four of the nine ranks in a Case 2 task, would help the kids. More specifically:

(1) They answered best

(2) Their best option was then “greyed out” and they answered worst

(3) This was in turn greyed out and they answered next (second) best

(4) Which was in turn greyed out and they answered next (second) worst.

This in theory gave us four of the nine ranks (1, 2, 8 and 9). It was particularly useful because it enabled us to test the (often blindly made) assumption that the rank-ordered logit model gives you utility function estimates that are "the same" no matter what ranking depth (top/bottom/etc.) the data come from. Unfortunately our data failed this test quite spectacularly – only the first best data really gave sensible answers. So the pilot results were correct: for some reason, in this study, kids' worst choices were duff. (Even their second best data were not very good.)
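For readers unfamiliar with the mechanics, the sketch below shows how a structured Case 2 BWS response of this kind "explodes" into the pseudo-choice observations that underpin the rank-ordered logit. It is a toy illustration in Python; the function name and data are my own invention, and the estimation step (fitting a conditional logit to each stage) is only described in comments:

```python
# Toy sketch of "exploding" one structured Case 2 BWS response into the
# pseudo-choice observations used by a rank-ordered (exploded) logit.
# Data and names are illustrative, not from the actual study.

def explode_structured_bws(items, picks):
    """items: the attribute levels shown in one profile (9 for the CHU-9D).
    picks: the observed sequence [best, worst, second best, second worst].
    Returns one (chosen, choice_set) pair per elicited rank."""
    remaining = list(items)
    stages = []
    for chosen in picks:
        stages.append((chosen, list(remaining)))  # the choice set at this stage
        remaining.remove(chosen)                  # "grey out" the chosen option
    return stages

profile = [f"attribute{i}_level" for i in range(1, 10)]   # 9 shown levels
picks = [profile[0], profile[8], profile[1], profile[7]]  # ranks 1, 9, 2, 8

for stage, (chosen, choice_set) in enumerate(
        explode_structured_bws(profile, picks), start=1):
    print(f"stage {stage}: chose {chosen!r} from {len(choice_set)} options")

# The depth-consistency test then fits a separate conditional logit to each
# stage's pseudo-choices (with the sign of utility reversed for the "worst"
# stages) and checks that the four coefficient vectors agree up to scale.
# In the study described above, only the stage-1 ("first best") data passed.
```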

Of course, as I mentioned, we don’t know the reason why this was the case, so we must proceed with caution before making controversial statements about how well BWS works among kids (ahem, cough, cough…)

But given the mooted idea of devising an "ICECAP for kids", we should bear the CHU-9D findings in mind when constructing the descriptive system. I certainly don't want to criticise the very comprehensive and well-conducted qualitative work done by Sheffield researchers to construct the CHU-9D. I merely pose some questions for future research to develop an "ICECAP for kids" instrument, which may cause a tension between the needs of the descriptive system and the needs of the valuation exercise.

Would an ICECAP for kids really need 5^9 = 1,953,125 profiles (quality of life states) to describe child quality of life (as the CHU-9D does for child health)?

My personal view is that too much of the health economics establishment may be thinking in terms of psychometrics, which (taking the SF-36 as the exemplar) typically concentrates on the number of items (questions/dimensions/attributes). A random utility theory based approach concentrates on the number of PROFILES (health/quality of life states). This causes the researcher to focus more on the combination of attributes and levels. When the system is multiplicative (as in a DCE), the number of “outcomes” becomes large VERY quickly.

Thus, some people are missing the point when they express concern at the small number of questions (five) in the ICECAP-O and –A. In fact there are 4^5 = 1,024 possible outcomes (states) – and of those 1,024 possible ICECAP states, over 200 ICECAP-O ones are observed in a typical British city. That makes the instrument potentially EXTREMELY sensitive. So I would end with a plea to think about the number of profiles (states), not the number of attributes. Can attributes be combined? That, and the statistics/intuition behind it, will be the subject of the second blog in this series.
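As a back-of-the-envelope check of those figures (a trivial sketch; the function is mine, not from any instrument's documentation), the number of profiles a descriptive system defines is simply the number of levels raised to the power of the number of attributes:

```python
# The combinatorics behind "count profiles, not attributes": the number of
# states a descriptive system defines is levels ** attributes, so it grows
# multiplicatively with each attribute added.

def n_profiles(levels_per_attribute: int, n_attributes: int) -> int:
    return levels_per_attribute ** n_attributes

print(n_profiles(5, 9))  # CHU-9D: 9 attributes x 5 levels = 1,953,125 states
print(n_profiles(4, 5))  # ICECAP-O/-A: 5 attributes x 4 levels = 1,024 states
```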

Copyright Terry N Flynn 2015.

This, together with the accompanying blogs, will form a working paper to be submitted to SSRN. Please cite appropriately if referring to these issues in academic papers.