Tag Archives: ICECAP

protocol updates for all instruments?

On Monday I put forward an argument that it will soon be time to update protocols and conduct new valuation exercises for older instruments like ICECAP-O (though I’d include the valuation exercise I was part of for ASCOT too in this recommendation, since it drew heavily on the ICECAP-O methods and the finding that the BWS tariff more-or-less matches the DCE one could conceal important differences our sample was not set up to detect). Yesterday I gave a purely personal view on the relative merits of ICECAP-O and ICECAP-A, arguing that continued use of a population average tariff might be an argument in favour of ICECAP-O, whilst more individual-level valuation might dictate whatever instrument is most appropriate for your age group.

Today’s blog entry discusses a problem people may not be aware of: the original British English ICECAP-A may give misleading results when used in other English-speaking contexts (though that remains to be checked, once it has been “translated” from British English into other forms of English – bear with me!)

For instance, we know already that ICECAP-A – the instrument for use among adults of any age and which uses British English – should only be used with caution even in other predominantly English-speaking countries. Here’s why. After I was given the finalised version of ICECAP-A, my team in Sydney ran some piloting of the choice experiment (BWS). On at least one attribute the “third” (one level down from top) level capability score was actually estimated to be larger than the “fourth” (top) level score. Now, there are design reasons why this could have happened (which I won’t discuss here – anyone with sufficient knowledge of DCE design should be able to work out why this can happen). However, I was able to discount this as the main reason. It got me very worried. I asked around the office – most of my colleagues spoke American or Australian English. I was also able to ask a few NZ and Canadian English speakers.

I discovered that people from millennials up to my generation (gen X) in particular, in Australia, Canada and New Zealand, have largely imported the US English meaning of the qualifier “quite”: they regard “quite a lot” of something as a greater magnitude than “a lot” of something. Brits think the other way round, and that British usage is assumed in the wording of ICECAP-A (which used different types of qualifiers than ICECAP-O – see yesterday’s discussion). It turns out this is a well-known problem.

During final estimation I had to put in restrictions on the scoring in at least one attribute so the “top” level did not have a lower capability score than the “third” level in order for ICECAP-A to work: clearly even some Brits in the (UK) valuation exercise had abandoned traditional British English (watching US films and TV?), certainly enough to skew the scoring. So, sobering though it is, we also need to do some more work on ICECAP-A, in addition to ICECAP-O and ASCOT.
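For anyone wanting to check their own estimates for this kind of disordering, here is a minimal sketch. It assumes the level scores for one attribute are listed from the lowest level to the top level; the function name and the numbers are made up for illustration and are not the ICECAP-A estimates, and this is only a check, not the restriction I used in estimation.

```python
def disordered_levels(level_scores):
    """Return (lower_level, higher_level) pairs, numbered from 1, where the
    higher level has a smaller score than the level directly below it."""
    return [
        (i, i + 1)
        for i in range(1, len(level_scores))
        if level_scores[i] < level_scores[i - 1]
    ]


# Hypothetical scores for a four-level attribute, lowest level first.
pilot_scores = [0.00, 0.10, 0.26, 0.23]
violations = disordered_levels(pilot_scores)  # level 4 scores below level 3
```

An empty list means the levels are ordered as the wording of the instrument assumes.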

US/Canadian/NZ/Australian valuation exercises will have to “translate” the British English ICECAP-A version into their local English before valuation. I don’t think we need be defensive about this – the EuroQol Group have changed their protocols/been open to more than one (the original, the Paris etc) over the years and are currently funding a lot of work to make a bigger leap forward. (Full disclosure: I am part of a group funded by them to investigate whether BWS can be used to produce an EQ-5D-5L tariff.) A health economist’s job is never done!


Happiness isn’t quality of life if you’re old

The subject of happiness, particularly among older people, has come up (again) in the media. I reckon they trot out the latest survey results whenever there’s a slow news day. I think it’s no coincidence the newest stories have appeared in the slow month of August.

Anyway I shall keep this short as I’ll rant otherwise. Once again, neither happiness nor life satisfaction is the same as quality of life and we can argue til the cows come home as to which of the three (if any) is truly well-being.

First of all, if I can find the time to write up a follow-up to the paper I published on the mid 2000s survey of Bristolians I will show this:

Five year age bands showing mean levels (after rescaling) of self-rated happiness versus scored quality of life in Bristol


The two track reasonably closely until retirement age. Then, whilst happiness continues to rise, quality of life certainly does not. The wealth of other evidence on health, money, friends, etc from the survey suggests our QoL measure, the ICECAP-O instrument, is the better measure of overall well-being.

We are not the only ones to find this. A large US study pretty much concluded they didn’t know WTF older people were doing when they answered life satisfaction/happiness questions but they sure don’t answer them the same way that younger adults do. Older people use a different part of the numerical scale (typically a higher portion, all other things being equal). That’s rating scale bias and there is a huge and growing literature on it.

Stop asking these dumb questions. There are good alternatives.



my top 5 dce papers update

Just to let readers know – I have decided not to go ahead with the “my top 5 DCE papers that health economists should be reading” paper after one round of review from the journal*.

A (currently not peer reviewed) paper by Richard Norman and a colleague analysing Google Scholar profiles for self-citation suggested the phenomenon is more common in Australasia. Apparently there is a perception out there that (current? former?) I4C members may be driving this – I know:

(1) My self-citations are primarily, like those of all members of the now defunct ICEPOP group, necessary ones to show the construction and development of the ICECAP outcome instruments. It is only now that there are beginning to be sufficient other groups using and developing the instruments for citations to be more spread out.

(2) I don’t believe I’ve done so in my DCE papers, but again, for Best-Worst Scaling, as for ICECAP, since I was a primary developer, of course I had to do some self-citing!

(3) I don’t want, even as a former I4C member, even to be perceived to be engaging in this phenomenon unnecessarily – even when I know I am not. I have checked with the main journals’ preferred tools which can (unlike Google Scholar) strip out self-citations and my h-index only dropped by 2. I will have quite a wait before I get results from the similar online program constructed by Norman’s co-author to do so with Google Scholar.
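For readers unfamiliar with the check in (3), here is a rough sketch of what stripping self-citations from an h-index involves. It is not the journals’ tool nor the program by Norman’s co-author; the function names and the citation counts are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_without_self_citations(papers):
    """`papers` is a list of (total_citations, self_citations) pairs."""
    return h_index(total - own for total, own in papers)


# Made-up author record: (total citations, of which self-citations).
papers = [(10, 2), (8, 1), (5, 2), (4, 1), (3, 0)]
with_self = h_index(total for total, _ in papers)
without_self = h_index_without_self_citations(papers)
```

The gap between the two figures is the quantity of interest: a small drop suggests self-citation is not doing much to inflate the index.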

So, I know I am not driving any unnecessary self-citing in Australasia, but my “5 papers” paper inevitably mentioned I4C member papers and might have contributed to bad perceptions of me/them – therefore I have decided not to take it further.

*EDIT I had a revise and resubmit decision with referee comments that I didn’t anticipate being difficult to address.


Citation of Flynn Articles

As mentioned already, some of my papers are incorrectly cited, so here is a (handy?) list of the principal discrete choice related ones and the reasons for citing each 🙂

Flynn principal papers with primary relevance to choice models and reasons for citing

Coast J, Flynn TN, Sutton E, Al-Janabi H, Vosper J, Lavender S, Louviere JJ, Peters TJ. Investigating Choice Experiments for Preferences of Older People (ICEPOP): evaluative spaces in health economics. Journal of Health Services Research and Policy 2008;13(suppl 3):31-37

  • Paper which introduces the wider context of the BWS work (quality of life and the ICEPOP programme)


Flynn TN, Louviere JJ, Peters TJ, Coast J. Best-Worst Scaling: What it can do for health care research and how to do it. Journal of Health Economics 2007;26(1):171-89.

  • Original Case 2 (Profile case) user guide
  • Uses old term – attribute case – since dropped
  • But NOT the first Case 2 published study in health – see Szeinbach et al 1999: Using conjoint analysis to evaluate health state preferences
  • Contrasts maxdiff and marginal models (though unfortunately not called these in the paper)


Lancsar E, Louviere JJ, Flynn TN. Several methods to estimate relative attribute impact in stated preference experiments. Social Science and Medicine, 2007;64:1738-1753.

  • First paper to discuss attribute IMPACT versus IMPORTANCE and how to get the former from BWS (amongst other techniques applied to DCE data)


Flynn TN, Louviere JJ, Peters TJ, Coast J. Estimating preferences for a dermatology consultation using Best-Worst Scaling: Comparison of various methods of analysis. BMC Medical Research Methodology 2008; 8:76.

  • Corrected the coding for the marginal sequential model from JHE article in order that model summary statistics be correct
  • Showed no difference in results from different models
  • Illustration of heterogeneity using effects coding, thereby showing how splitting out attribute impact can be useful


Marley AAJ, Flynn TN, Louviere JJ. Probabilistic Models of Set-Dependent and Attribute-Level Best-Worst Choice. Journal of Mathematical Psychology 2008; 52:281-296.

  • Paper with mathematical proof of Case 2 estimator properties
  • The proof that you can’t, in fact, get attribute importance from BWS (contrary to McIntosh & Louviere’s original claim), though you do go a long way toward it, getting attribute IMPACT (discussed in more detail in JOCM 2013)
  • Review of the literature on discrete choice tasks showing that, despite 40 years of research, estimation of attribute IMPORTANCE (in a discrete choice task framework) remains elusive
  • Presented hypothetical example of how manipulating context across two DCEs would finally allow estimation of attribute importance – this principle is used by Flynn et al (JOCM 2013)


Flynn TN, Marley AAJ, Louviere JJ, Peters TJ, Coast J. Rescaling quality of life tariffs from discrete choice experiments for use as QALYs: a cautionary tale. Population Health Metrics 2008; 6:6

  • Key paper showing why putting the death state into any RUT-based model is wrong.
  • Also first paper to discuss why the choice of health states used in any RUT-based model must take into account how scale might vary as a result – e.g. two states far apart on the latent health scale will have a large variance scale factor, much larger than two states close together on the latent health scale


Louviere JJ, Flynn TN. Using Best-Worst Scaling Choice Experiments To Measure Public Perceptions and Preferences for Healthcare Reform in Australia. The Patient: Patient-Centered Outcomes Research 2010;3(4):275-283

  • First Case 1 paper in health


Flynn TN, Peters TJ, Coast J. Quantifying response shift or adaptation effects in quality of life by synthesising best-worst scaling and discrete choice data. Journal of Choice Modelling 2013;6:34-43

  • Hopefully seminal paper as the first empirical application of the proposed model put forward by Marley et al (2008) and discussed by Flynn (2010) in which context is varied in order to get attribute importance.
  • This is used in an attempt to quantify response shift (adaptation) in a quality of life context.


Flynn TN, Louviere JJ, Peters TJ, Coast J. Using discrete choice experiments to investigate heterogeneity in preferences for quality of life. Variance scale heterogeneity matters. Social Science and Medicine 2010; 70:1957-1965

  • First paper in health to demonstrate the scale adjusted latent class (SALC) model to attempt to properly adjust for scale (the G-MNL being the other model)
  • Hypothesises how individual-level valuations of health/QoL states are now possible


Louviere JJ, Flynn TN, Carson R. Discrete choice experiments are not conjoint analysis. The Journal of Choice Modelling 2010;3(3):57-72

  • Paper discusses the theoretical underpinnings of DCEs
  • Why these are not part of the conjoint measurement paradigm in academia
  • Why we need to be aware that we are concentrating on a tiny part of the whole decision making process that humans use


Flynn TN. Valuing citizen and patient preferences in health: recent developments in three types of best-worst scaling. Expert Review of Pharmacoeconomics & Outcomes Research 2010; 10(3):259-267.

  • First paper to set out the definitive naming of all three types (“Cases”) of BWS
  • The object case – Case 1, where the choice options are non-attribute based simple options,
  • The profile case – Case 2, where the choice options are attribute levels within a profile-based framework
  • The multi-profile case – Case 3, where the choice options are multi-attribute profiles
  • These names were chosen by the inventor of BWS (Jordan Louviere) and agreed by the three authors writing the definitive textbook (CUP 2014 forthcoming).
  • First paper in health to use the best-minus-worst scores (the “scores”), from Marley & Louviere (2005) in summarising outcomes
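The best-minus-worst scores mentioned in the last bullet are simple counts, which a minimal sketch can show. It assumes each completed choice set is recorded as a (best, worst) pair; the function name and the item labels are hypothetical, not data from any of the papers listed here.

```python
from collections import Counter


def best_minus_worst_scores(choices):
    """Return {item: times chosen as best minus times chosen as worst}.

    `choices` is a list of (best_item, worst_item) pairs, one per completed
    choice set. Items never chosen as best or worst do not appear.
    """
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    items = set(best) | set(worst)
    return {item: best[item] - worst[item] for item in items}


# Hypothetical responses from four choice sets.
choices = [
    ("attachment", "security"),
    ("attachment", "role"),
    ("enjoyment", "security"),
    ("role", "security"),
]
scores = best_minus_worst_scores(choices)
```

Because every choice set contributes one best and one worst pick, the scores sum to zero across items. In a real study you would also want to account for how often each item was available to be chosen.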


Flynn TN. Using conjoint analysis and choice experiments to estimate quality adjusted life year values: issues to consider. Pharmacoeconomics 2010;28(9):711-722

  • Paper notes likely problems getting interactions from Case 2
  • First paper to propose various options in valuing health states in discrete choice framework, ranging from using TTO to rescale estimates from a DCE without a duration attribute (cheap but not ideal) to including duration as an attribute (expensive, but theoretically more appealing)
  • First paper to discuss how Case 2 might change the context of the problem under investigation from a traditional DCE – and how the researcher should consider this when tempted to make inferences about the validity of one or both types of task (within- versus between-profile tasks)
  • Further discussion of how variance scale is important to consider in discrete choice based valuation studies in health
  • Further discussion of why including the “dead state” is wrong
  • Warns health economists about the possibility that respondents might interact with the design (“demand artefacts”) and the need for common designs in order that there is no confounding of design with utilities


Hawkins GE, Marley AAJ, Heathcote A, Flynn TN, Louviere JJ, Brown SD. Integrating cognitive process and descriptive models of attitudes and preferences. Cognitive Science (in press, accepted 23 May 2013)

  • Collection of response times alongside DCE data
  • How, for the first time, inferences from a stated preference RUT-based model have been proved from a proper process model (the LBA).
  • Uses BWS as a vehicle for collecting data and providing additional proof of process model


Louviere JJ, Lings I, Islam T, Gudergan S, Flynn TN. An Introduction to the Application of (case 1) Best-Worst Scaling in Marketing Research. International Journal of Research in Marketing 2013;30(3):292-303.

  • First user guide paper for Case 1 but in marketing.
  • Many of the techniques were already in use in early health papers, including a recent paper in SSM by Swiss researchers.


Potoglou D, Burge P, Flynn TN, Netten A, Malley J, Forder J, Wall B, Brazier J. Best-worst Scaling vs. Discrete Choice Experiments: An Empirical Comparison using Social Care Data. Social Science and Medicine 2011;72(10):1717-1727

  • First comparison of DCE and Case 2 data, though not in a fully scale-adjusted model.


Coast J, Al-Janabi H, Sutton E, Horrocks S, Vosper J, Swancutt D, Flynn TN. Using qualitative methods for attribute development for discrete choice experiments: issues and recommendations. Health Economics 2012;21(6):730-741

  • Paper dealing with issues raised by Louviere, Hensher & Swait (2001 book) in terms of providing guidance on development of attributes
  • Compares various qualitative methods in the light of experiences from the ICEPOP programme

last day plus icecap

Last day at UTS! Please note that from Monday I will be an employee of the University of South Australia (UNISA). My email is up and running (see posting elsewhere) and I have an office etc – working in North Sydney will be fantastic and I nosed around the office yesterday. It’s gorgeous – the operations team and some of the directors are already in place.

Also, if you use twitter you might like to follow the official ICECAP measure profile or @ICECAPm – it already has news of the next user group meeting at which I will be presenting.

For newbies, the ICECAP instruments use Sen’s Capabilities Approach as a framework for measuring and valuing well-being in a way that isn’t limited to “just” health. Thus, they are good alternatives to existing well-being instruments. The valuation is/has been done using the methods of another winner of the Nobel prize in Economics – Dan McFadden – namely discrete choice modelling, and in particular the best-worst scaling variety, in which I have led development in health and am a recognised global expert. BWS is now taking the world by storm and the book – to be published by CUP – will be finished in the next few weeks.

Best-Worst Scaling Book almost finished

Tony Marley will arrive in Sydney this weekend. He, Jordan and I will finalise the BWS book for CUP during the two weeks Tony is here – sweet!

There will be empirical chapter(s) on using the best-minus-worst scores (the ‘scores’) in analysis together with a chapter publishing the complete set of Australian socio-demographic related tariffs for ICECAP-O and (if collaborators permit it) the Canadian population tariff for the same instrument.

Happiness economics decimated

Great paper by one of the top Capabilities Approach researchers, Martha Nussbaum – it decimates the fad for happiness economics that infests public policy, particularly in the UK.

Of course, the ICECAP instruments are based on the Capabilities Approach and I have shown empirically (when I get round to publishing it) that ICECAP-O differs from a happiness/life satisfaction score measure in exactly the way Nussbaum predicts.

Back from QLD

Had a very enjoyable and restful 5 days in Queensland on the Gold Coast with friends. Started a big spending spree on clothes that continued into the Boxing Day sale period here in Sydney! I’ve probably spent the savings I’m going to make following decisions to (1) Cancel Foxtel, (2) Move from Telstra for ISP, (3) Move bank (or at least open a new savings account). Oh well.

Today I am tidying and doing spring cleaning, then work restarts for me (even if not intensely and at my desk at home)… I am going to finish the chapters I am writing for the book on best-worst scaling. We’re submitting the final draft to CUP in the middle of January.

I have a trip planned to the UK in late February to attend the ICECAP user group meeting and the advisory group meeting for ICECAP-SCM. Also at some point early in 2014 we are all (as a centre Institute) going to Adelaide to meet our new university!