Tag Archives: icecap-o

protocol updates for all instruments?

On Monday I put forward an argument that it will soon be time to update protocols and conduct new valuation exercises for older instruments like ICECAP-O. (I’d include the ASCOT valuation exercise I was part of in this recommendation too, since it drew heavily on the ICECAP-O methods, and the finding that the BWS tariff more-or-less matches the DCE one could conceal important differences our sample was not set up to detect.) Yesterday I gave a purely personal view on the relative merits of ICECAP-O and ICECAP-A, arguing that continued use of a population average tariff might be an argument in favour of ICECAP-O, whilst more individual-level valuation might dictate whichever instrument is most appropriate for your age group.

Today’s blog entry will discuss a problem people may not be aware of: the use of the original British English ICECAP-A in contexts where it may give misleading results (though that remains to be checked once it is translated from British English into other forms of English – bear with me!)

For instance, we know already that ICECAP-A – the instrument for use among adults of any age, which uses British English – should be used with caution even in other predominantly English-speaking countries. Here’s why. After I was given the finalised version of ICECAP-A, my team in Sydney ran some piloting of the choice experiment (BWS). On at least one attribute the “third” (one level down from top) level capability score was actually estimated to be larger than the “fourth” (top) level score. Now, there are design reasons why this could have happened (which I won’t discuss here – anyone with sufficient knowledge of DCE design should be able to work out why this can happen). However, I was able to discount this as the main reason. It got me very worried. I asked around the office – most of my colleagues spoke American or Australian English – and I was also able to ask a few NZ and Canadian English speakers.

I discovered that millennials up to my generation (gen X), in Australia, Canada and New Zealand in particular, have largely imported the US English definition of the qualifier “quite”: they regard “quite a lot” of something as a greater magnitude than “a lot” of something, unlike Brits, who think the opposite – and the British reading is an assumption built into the wording of ICECAP-A (which used different types of qualifiers from ICECAP-O – see yesterday’s discussion). It turns out this is a well-known problem.

During final estimation I had to impose restrictions on the scoring of at least one attribute so that the “top” level did not receive a lower capability score than the “third” level, in order for ICECAP-A to work: clearly even some Brits in the (UK) valuation exercise had abandoned traditional British English (watching US films and TV?) – certainly enough to skew the scoring. So, sobering though it is, we also need to do some more work on ICECAP-A, in addition to ICECAP-O and ASCOT.
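To illustrate the kind of restriction I mean (a toy sketch, not the actual estimation code), one standard trick is to reparameterise the level scores as cumulative positive increments, so a higher level can never be estimated below the one beneath it:

```python
import math

# Toy sketch (NOT the actual ICECAP-A estimation code) of one common way
# to enforce monotone level scores: express each level as the previous
# level plus a strictly positive increment, exp(p), where p is the
# unconstrained parameter the optimiser actually searches over.

def monotone_levels(raw_params, base=0.0):
    """Map unconstrained parameters to strictly increasing level scores."""
    levels = [base]
    for p in raw_params:
        levels.append(levels[-1] + math.exp(p))  # exp(p) > 0 preserves order
    return levels

# Hypothetical unconstrained estimates for the three gaps between levels 1-4
raw = [-0.3, 0.8, -2.0]
scores = monotone_levels(raw)
print(scores)  # strictly increasing regardless of the raw values
```

However the optimiser moves the raw parameters, the implied level scores stay ordered, which is the property the final tariff needed.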

US/Canadian/NZ/Australian valuation exercises will have to “translate” the British English ICECAP-A version into their local English before valuation. I don’t think we need be defensive about this – the EuroQol Group have changed their protocols/been open to more than one (the original, the Paris etc) over the years and are currently funding a lot of work to make a bigger leap forward. (Full disclosure: I am part of a group funded by them to investigate whether BWS can be used to produce an EQ-5D-5L tariff.) A health economist’s job is never done!


Does age bring wisdom?

Today’s blog entry will discuss a philosophical issue that has implications for the choice of ICECAP instrument; these views are purely my own, not those of any other individual involved in the development of any of the ICECAP instruments. The issue concerns the difference in attributes (dimensions/domains) between ICECAP-O (technically designed only for older people) and ICECAP-A (designed for adults of any age).

Now, in many ways the issue is moot: there is considerable overlap in the underlying conceptual attributes and three of the five are pretty much the same. Whilst published work showed ICECAP-O working well amongst older people in a large survey conducted in Bristol, UK, unpublished follow-on work showed it working equally well among younger adults. Furthermore, a national online valuation exercise plus survey conducted in Australia (published in the BWS book) showed it working well among adults of all ages there too.

However, the other two attributes (doing things that make you feel valued and concerns about the future in ICECAP-O vs achievement & progress and feeling settled & secure in ICECAP-A) are arguably different. The difference might reflect the priorities of different generations: younger generations may feel a greater need to achieve and progress – the idea of “moving forward” may be driving this (particularly since we have many more people working in that group). ICECAP-O, on the other hand, stresses the act of doing things that make you feel valued in life, which (to me) does not necessarily imply “moving forward” (though my personal career changes may have coloured my views!).

Likewise, in ICECAP-A feeling settled and secure may reflect current younger generations’ feelings of instability in a world of zero-hours contracts etc. ICECAP-O asks instead about “concerns about the future”. Whilst this might be seen merely as the ICECAP-A question “flipped”, it is phrased with respect to the amount of concern overall, unlike ICECAP-A, which is phrased with respect to how many areas of life – a subtle difference. To illustrate, I will simply pose a question. If you otherwise have a very good quality of life, can you still have a lot of concern about the future? I’d argue yes. Now let’s think about ICECAP-A. If you otherwise have a very good quality of life, can you feel settled and secure in only a few areas of life? Playing devil’s advocate, it could be argued that “this respondent has already said they’re doing well on the other four attributes of quality of life – they have a lot of capability to achieve the levels they want – so how can they feel unsettled in key attributes too?”

Ultimately these are empirical issues, requiring researchers to look at correlation matrices of actual answers. In the Australian survey 5002 people were randomised to either ICECAP-O or ICECAP-A. The Spearman rank correlation coefficients of the respondents’ five tickbox answers were uniformly higher for any given pair of attributes in ICECAP-A than for their ICECAP-O equivalents. However, a big caveat here is that the ICECAP-O arm was a properly done valuation exercise in which quota sampling was conducted on the basis of respondents’ own ICECAP-O tickbox answers; there were no previous ICECAP-A data on which to base quotas. Thus this is not a like-for-like comparison and ICECAP-O therefore had an artificial advantage. Using the Bristol adult ICECAP-O data (to correct, somewhat, for this) caused four of the ten pairwise correlations to be smaller for ICECAP-A, but two of these were for attributes common to both instruments. Comparisons among groups from the same country, using the same sampling, are therefore required before firm conclusions can be drawn.
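As a sketch of the sort of check involved (the tickbox answers below are invented purely for illustration; a real analysis would use the survey data), the pairwise Spearman rank correlations across the five attributes can be computed like this:

```python
from itertools import combinations

# Sketch of the correlation-matrix check described above. The "answers"
# are made up; attribute names loosely follow the ICECAP-O concepts.

def ranks(xs):
    """1-based average ranks, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied 1-based positions
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

answers = {  # attribute -> tickbox level (1-4) per respondent (invented)
    "attachment": [4, 3, 4, 2, 3, 4, 1, 3],
    "security":   [3, 3, 4, 2, 2, 4, 2, 3],
    "role":       [4, 2, 3, 2, 3, 3, 1, 2],
    "enjoyment":  [4, 3, 4, 1, 3, 4, 2, 3],
    "control":    [3, 2, 4, 2, 2, 3, 1, 3],
}

for a, b in combinations(answers, 2):
    print(f"{a} vs {b}: rho = {spearman(answers[a], answers[b]):.2f}")
```

Comparing the resulting ten coefficients between the ICECAP-O and ICECAP-A arms is exactly the like-for-like exercise the caveat above says still needs doing properly.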

Finally, it is worth considering the philosophy here, and I’ll raise a final point. OK, it seems that adding younger adults to the valuation sample has changed at least one and arguably two attributes. That raises the normative question of whether we should use these attributes in valuing their quality of life when they haven’t, by definition, lived a long life: perhaps age brings wisdom and it is the older people who “know what’s best for you”. Most people experience regret at some point, and our “values” (defined both conceptually – the attributes themselves – and numerically – the tariff) can change with experience.

Of course, using a single ICECAP instrument – ICECAP-O, if one were persuaded of the above philosophical argument – would make things nicer and easier when it comes to “a single common denominator for valuation”. But if, like me, you are keen on greater investigation of (and possibly use of) individual valuation, could we justify using ICECAP-O scoring for a 30 year old, which may downweight “doing things that make you feel valued” because that person is actually more interested in achieving things and progressing (forward?) in life? On the other hand, knowing the key conceptual attributes of ICECAP-A, maybe stressing in the intro to ICECAP-O that “doing things that make you feel valued” can easily encompass “achieving and progressing in life” is a practical solution?

Another empirical issue!

And so we come full circle to whether practical solutions, or stricter ones fitting some theory, are the way forward. As usual in health economics, normative issues galore.

Perfect the enemy of the good?

I recently got into a discussion on twitter about the properties of the ICECAP instruments and what the zero on these means. One particular point I made was that our “saying” the state of “no capabilities” must be zero didn’t necessarily make it so, at least in the eyes of a mathematical psychologist. They’d probably say it is not a ratio scale and might not even have good interval scale properties at the aggregate level (if there is improperly-adjusted-for underlying heterogeneity).

I’m not too worried about these points, though I personally think better subgroup/heterogeneity analyses need to be done in future to address the latter one. But it did lead me to think about that old recommendation: “don’t let the perfect become the enemy of the good”. This potentially gets a lot of extra-welfarism into hot water where the maths psych people are concerned: instruments, valuation tweaks and in some cases even the whole valuation method (VAS, and arguably TTO) have little in the way of theory (that that group would recognise) behind them. However, I remember one health economist summarising a discussion he had with clinicians and members of the public about how scarce resources should be allocated in health care: they “naturally” came up with something that approximated a QALY with TTO scoring. That is fair enough, and I am happy with the newer theories/concepts put forward to justify what health economists do in that particular area. After all, extra-welfarism doesn’t have the same assumptions and theories as traditional welfarist economics, so why get bothered about what another discipline entirely thinks?

I guess I’m just naturally – having worked so long with a maths psych guru – very particular about getting scoring “right”, in the sense of it satisfying one or more of the properties inherent in proper scales (absolute/ratio/interval/difference). So yes, I may be guilty of being dissatisfied with just “good”… but in my defence, we are producing tariffs (sets of scores) that are increasingly being used across the world – the Netherlands has already decided on a dual “QALY+” approach: more than one evaluative space seems finally to be accepted, to aid decision-making. We shouldn’t stand still, particularly as we know a lot more about the properties of the Best-Worst Scaling valuation technique now than we did at the time of the original UK ICECAP-O valuation exercise. Whilst it is gratifying that public interest and funders have agreed with us that areas like end-of-life care and (potentially) children need ICECAP instruments, we should not rest on our laurels with existing instruments.

no capability not death

Just a quick note following a twitter exchange I had regarding whether capabilities as valued by the ICEPOP team (the ICECAP-O was referenced in the original paper) are “QALY-like”.

Key team members never intended the ICECAP-O scores to be multiplied by life expectancy (in the way, say, an EQ-5D score is). Whilst we have recognised that people would like to do this, technically it is a fudge, and it comes down to definitions and the maths:

Death necessarily implies no capabilities, but no capabilities (the bottom ICECAP-O state) does not imply death. More fundamentally, the estimated ICECAP scores are interval-scaled, NOT ratio-scaled (for reference, read the BWS book): we used a linear transformation that preserves the relative differences between states, but the anchoring at zero would not be accepted by a maths psych person: they would say that defining the bottom to be the zero doesn’t make it so.

Since different individuals technically had different zeros (BWS, like any discrete choice method, produces estimates on a scale with an arbitrary zero), multiplying a technically interval-scaled average score (our published tariff) by a ratio-scaled one (life expectancy) to compare across groups/interventions is wrong. If there is heterogeneity in where “death” sits on our latent capability scale (which we can’t/didn’t quantify – unlike the traditional QALY models estimated in the proper way), then comparisons across groups that don’t share the same “zero” give incorrect answers. We can compare “mean losses of capability from full capability”, which is why I personally (though I don’t speak for the wider team here) prefer the measure to be used as an alternative measure of deprivation, like the IMD in the UK or SEIFA in Australia.
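A toy numerical example of the point (all numbers invented): if two groups’ latent “death” anchors differ, capability-score-times-life-expectancy comparisons using the published interval-scaled tariff can rank the groups the wrong way round.

```python
# Toy illustration (all numbers invented) of why multiplying an
# interval-scaled capability tariff by life expectancy can mislead when
# groups sit at different latent "death" points (different true zeros).

def rescale(u, zero, top):
    """Linearly map a latent utility onto [0, 1] given an anchor for zero."""
    return (u - zero) / (top - zero)

TOP = 1.0             # latent utility of full capability
PUBLISHED_ZERO = 0.0  # the published tariff anchors the bottom state at 0

latent = {"A": 0.50, "B": 0.60}   # mean latent capability by group
life_years = {"A": 18, "B": 16}   # remaining life expectancy by group
death = {"A": -1.0, "B": -0.05}   # where "death" really sits (unobserved)

for g in ("A", "B"):
    pub = rescale(latent[g], PUBLISHED_ZERO, TOP) * life_years[g]
    true = rescale(latent[g], death[g], TOP) * life_years[g]
    print(f"group {g}: published = {pub:.2f}, death-anchored = {true:.2f}")
# With these numbers the published tariff ranks B above A (9.60 vs 9.00),
# but anchoring at each group's true zero reverses the ordering.
```

The flip only happens because the groups’ true zeros differ; that heterogeneity is exactly what the published tariff cannot see, which is why within-scale comparisons (losses from full capability) are the safer use.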

older aussie well-being

A tweet just alerted me to this report on the well-being of older Australians. I haven’t had time to read it in detail, but a quick skim seems to indicate it did all the “usual” things of checking correlations, doing PCA, etc, and then “The final index was then calculated by averaging the five domain index_log”.


I cannot help but feel a little frustrated. I gave a talk on this subject six years ago in Sydney when working at UTS. Many of my top publications from my time in academia concerned the development and Australian usage of the ICECAP-O instrument (see chapter 12) as a measure of the well-being of (primarily but not only) older people. Advantages it has over the research in the report I’ve just read include the following:

  1. ICECAP-O doesn’t use variables that must be collected from official sources or be part of (say) the SEIFA. The five variables came from extensive qualitative work that established what it is about (say) housing uncertainty that really contributes to/takes away from well-being. We wanted the underlying key conceptual attributes of well-being. So whilst health (for instance) is valued, it is what it gives you in terms of independence, security, enjoyment of activities that really matters.
  2. ICECAP-O is an individual-level, one-A4-page questionnaire. Four response categories per question mean 4^5=1024 distinct “states” you could be in, each with its own percentage score. So you can slice and dice the data in far more flexible, disaggregated ways than what’s out there so far.
  3. The five domains are NOT simply averaged, nor are the response categories across domains equally valued – e.g. the 2nd-to-top level of “love and friendship” is more highly valued, on average, than the top level of ANY of the other four domains. They don’t all matter equally to older people. There is even a fair degree of heterogeneity AMONG older people as to the relative importance of these, heavily driven by factors such as marital status and gender. We used choice models to find this out and the findings are based on robust, well-tested theoretical models.
  4. You can compare with (say) the SEIFA – we did this with the UK Index of Multiple Deprivation in one of my papers looking at a British city – and get far better insights. So, for instance, measures like the IMD/SEIFA can be misleading when they fail to capture measures of social capital or connectedness.
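The profile-scoring idea in point 2 can be sketched as follows; the contribution values are invented purely for illustration (the real tariff comes from the published valuation work):

```python
from itertools import product

# Sketch of profile scoring per point 2 above. The contribution values
# are HYPOTHETICAL placeholders, not the published ICECAP-O tariff.

TARIFF = {  # attribute -> score contribution for levels 1..4
    "attachment": [0.00, 0.11, 0.20, 0.28],
    "security":   [0.00, 0.07, 0.13, 0.18],
    "role":       [0.00, 0.06, 0.12, 0.17],
    "enjoyment":  [0.00, 0.07, 0.13, 0.18],
    "control":    [0.00, 0.07, 0.13, 0.19],
}

def score(state):
    """Score a 5-answer profile (attribute -> level 1..4)."""
    return sum(TARIFF[attr][lvl - 1] for attr, lvl in state.items())

# Every combination of five 4-level answers is a distinct state:
n_states = len(list(product(range(1, 5), repeat=5)))
print(n_states)  # 1024

# Full capability (all top levels) scores 1.00 under these placeholders:
print(f"{score({a: 4 for a in TARIFF}):.2f}")
```

The key point is that each of the 1024 profiles gets its own score from the valuation work, rather than the five domains being averaged with equal weights.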

It’s a shame when disciplines don’t talk to one another. Things could move forward a lot more quickly. And as long as we use SEIFA/IMD type measures in policy, we’re going to be directing resources to the wrong people.

How impairments in well-being affect preferences for well-being

Am almost finished proofing my chapters of the BWS book.

The chapter on quality of life in Australia reminded me we did actually present results showing the associations between impairments in quality of life and preferences for quality of life. For instance, do lonely people really desire independence (to compensate) or do they really prefer to regain their social life? Can loss of independence (physical health) be compensated for by ensuring good social contact and feeling valued in whatever you do in life?

I had forgotten that we had results on these!

But you have to buy the book to learn the results – CUP will publish it in September (or, if any co-authors are slow on proofing, October).


Happiness isn’t quality of life if you’re old

The subject of happiness, particularly among older people, has come up (again) in the media. I reckon they trot out the latest survey results whenever there’s a slow news day. I think it’s no coincidence the newest stories have appeared in the slow month of August.

Anyway I shall keep this short as I’ll rant otherwise. Once again, neither happiness nor life satisfaction is the same as quality of life and we can argue til the cows come home as to which of the three (if any) is truly well-being.

First of all, if I can find the time to write up a follow-up to the paper I published on the mid 2000s survey of Bristolians I will show this:

Five year age bands showing mean levels (after rescaling) of self-rated happiness versus scored quality of life in Bristol

The two track reasonably closely until retirement age. Then, whilst happiness continues to rise, quality of life certainly does not. The wealth of other evidence on health, money, friends, etc from the survey suggests our QoL instrument, ICECAP-O, is the better measure of overall well-being.

We are not the only ones to find this. A large US study pretty much concluded they didn’t know WTF older people were doing when they answered life satisfaction/happiness questions but they sure don’t answer them the same way that younger adults do. Older people use a different part of the numerical scale (typically a higher portion, all other things being equal). That’s rating scale bias and there is a huge and growing literature on it.

Stop asking these dumb questions. There are good alternatives.



CADR conference and psychometrics

My presentation yesterday at the Centre for Applied Disability Research conference seemed to go down well. There was one comment I get on a frequent basis, so I thought I’d give a more complete answer here. The question is always a variant of the following:

“Why should we use ICECAP-O or one of those instruments you’re touting when there’s the WHOQoL instrument or instrument x/y/z that has been validated already?”

A tag-on is often to the effect of: “how can your 5-item instrument beat our 10/15/50-item instrument, which is bound to be better for individuals?”

Well, the answer is simple – it comes down to a difference in paradigm, in particular the difference between psychometrics and random utility theory. I could be rude at conferences (but am not) in countering the (occasionally a tad aggressive) attacks I get on ICECAP-O for “not being individual-specific”. My “slightly aggressive” response would be: “actually it’s YOUR instrument that isn’t individual-specific – psychometrics isn’t about the individual; it uses differences BETWEEN individuals to validate the response categories. Random Utility Theory (RUT) is EXPLICITLY a theory of how the individual makes choices and, as such, any instrument based on it is by definition an individual-level QoL instrument! For ICECAP-O (or any of the other instruments in the ICECAP family, or the CES) I could, in theory, give any respondent THEIR OWN set of scores (a “tariff”, to use health economics parlance) if they do the choice experiment. You CANNOT do that with existing instruments, with the exception of some health-based ones that use the time trade-off/standard gamble, IF they’d asked the right set of questions to concentrate on individual-level scores.”

This individual respondent tariff reflects the trade-offs THAT INDIVIDUAL would make between the items and how bad the various impairments are to that person. You can’t get that from any of the instruments I hear touted as “superior”, since they were validated on the basis of between-person differences – by definition they cannot be tested/validated at the individual level (not least because there are no scores at that level, certainly not preference-based ones that reflect how bad the impairments all are on a common scale).

So ICECAP-O and the other instruments beat them all when it comes to the issue of “the individual level”. We can feed back individual level scores – and indeed we did, for the end-of-life care survey, which you too can do if you click on surveys and go back a page or two. So not only are there 4 to the power 5 (1024) distinct utility values available – it is the PROFILE defined by the set of 5 answers that matters, not the number of questions – but these 1024 scores could be individual specific if we wanted. Indeed Chapter 12 of the forthcoming best-worst scaling (BWS) book (Louviere, Flynn & Marley – Best-Worst Scaling: Theory & Applications, CUP) will present subgroup Australian tariffs for ICECAP-O.

“Testing” ICECAP-O using psychometric-based techniques may be invalid – I’m not sure – but one thing I am sure of: stop throwing mud at us for having an instrument that is “obviously worse” than these existing large-item questionnaires, because I KNOW for a fact you’ve not tested the individual-level properties of their scoring. At best we can all agree a truce and say there are two differing paradigms in use here and that, at present, there has been no properly designed study using a common denominator on which I could compare them.

health plus social care equals what

I’ve commented on a piece in the Guardian about the integration of health and social care in the UK and the options for individuals to have personal budgets.

I mentioned both the OSCA and ICECAP instruments – I wonder whether all the millions of pounds of public money that went into developing these will actually bear fruit?

I hope those stupid happiness scores are not used – they’ve been debunked several times in several countries now, but certain arms of the British establishment seem dead set on them.

Aussies take note – there is a conference I am speaking at on this very subject in Sydney in late May.

Best-Worst Scaling Book almost finished

Tony Marley will arrive in Sydney this weekend. He, Jordan and I will finalise the BWS book for CUP during the two weeks Tony is here – sweet!

There will be empirical chapter(s) on using the best-minus-worst scores (the “scores”) in analysis, together with a chapter publishing the complete set of Australian socio-demographic tariffs for ICECAP-O and (if collaborators permit it) the Canadian population tariff for the same instrument.