
semi-retiring from blogging

Unfortunately I shall be semi-retiring from blogging.

When I say “semi”, I mean that general discussions on my personal website and comments on my personal Twitter account will become few and far between. I shall continue to make comments/blogs on my work account.

There are several, in some cases related, reasons:

(1) Standards of practice in DCEs are not improving in health. It’s profoundly depressing when you read a blog entry/article/op-ed that has you nodding fiercely – as just happened – and then you get to the central defence of the paper. And it involves a discrete choice experiment that has not followed proper practice and stands a non-trivial chance of being totally wrong.

(2) Standards of literature review are appalling and getting worse by the year. When I did my PhD you wouldn’t dream of submitting a paper that didn’t show awareness of the literature – particularly if key aspects of your design have been heavily criticised by others.

(3) I get the distinct impression “political arguments” are trumping “data”. This partly follows on from (2): it’s well-known and established why quota sampling is important in DCEs yet “population representative sampling” continues to be used as an “advantage” (ha!) of DCEs done in the field of QALY Decision-making.

If this makes no sense to you then can I respectfully suggest you need to go do some reading?

If you don’t know the finding (from the mid 1980s) that heteroscedasticity on the latent scale is a significant problem in terms of bias, and how it matters in QALY studies, then it makes me think you have a rather large hole in your statistical knowledge and worries me immensely.
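
A minimal sketch of why this matters, assuming a simple binary logit. The utilities, scale values and function names below are purely illustrative and are not taken from any study. Two groups share identical preferences but differ in error variance (scale); the log-odds an analyst recovers are utility multiplied by scale, so the pooled answer moves with the mix of respondents in the sample.

```python
import math


def logit_probs(utilities, scale=1.0):
    """Multinomial logit choice probabilities with an explicit scale factor."""
    exps = [math.exp(scale * u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical: both groups have the SAME underlying utilities.
utilities = [1.0, 0.0]                          # option A preferred to option B
consistent = logit_probs(utilities, scale=2.0)  # low error variance
noisy = logit_probs(utilities, scale=0.5)       # high error variance

# The recoverable quantity is scale * utility difference, not utility itself.
apparent_consistent = math.log(consistent[0] / consistent[1])
apparent_noisy = math.log(noisy[0] / noisy[1])


def pooled_log_odds(share_consistent):
    """Log-odds of choosing A when the two groups are pooled in a given mix."""
    p_a = share_consistent * consistent[0] + (1 - share_consistent) * noisy[0]
    return math.log(p_a / (1 - p_a))
```

Here `pooled_log_odds(0.5)` and `pooled_log_odds(0.8)` differ even though nobody's preferences changed, which is the sense in which sample composition can bias a tariff.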

I won’t name names, in the interests of discretion, but I’m tired of making this point year in year out, with no result (with the honourable exception of the EuroQoL Foundation who funded a group I am part of to look at this)….and I showed it empirically in the BWS book. Please read the health chapters to understand this. I’m open to questions by email if you don’t understand the logic.

(4) I spent a lot of my own money showing how attitudes are related to preferences in terms of politics…..which got me zilch…..the media are lemmings….they’d rather all jump off the cliff together than report something different (and based on stronger assumptions) and risk being “the one who was wrong”. Again, lack of statistical training, noted already by people like Ben Goldacre.

So I’m afraid I’m a little tired of all this. I have a business to run. Parents to do a lot of stuff for.

I’m still here on email – ask me if you’re puzzled. I’m not trying to be obstructive here. But I need to concentrate on putting food on the table.

All the best,


BWS neither friend nor foe

This post replies to some requests I have had asking me to respond to a paper concluding that DCEs are better than BWS for health state valuation. To be honest I am loath to respond, for reasons that will become apparent.

First of all, let me clarify one thing that people might not appreciate – I most definitely do not want to “evangelise” for BWS and it is not the solution in quite a few circumstances. (See the papers coming out from the CHU-9D child health valuation study I was involved with for starters – BWS was effectively a waste of resources in the end….“best” choices were all we could use for the tariff.)

I only really pushed BWS strongly in my early days as a postdoc when I wanted to make a name for myself. If you read my papers since 2007 (*all* of them) you’ll see the numerous caveats appear with increasing frequency. And that’s before we even get to the BWS book, where we devote an entire chapter to discussing unresolved issues including the REAL weaknesses and research areas for BWS (as opposed to straw men I have been seeing in recent literature).

OK now that’s out of the way, I will lay some other cards on the table, many of which are well-known since I’ve not exactly been quiet about them. I had mental health issues associated with my exit from academia. I’m back on my feet now doing private sector work for very appreciative clients, but that doesn’t mean I want to go back and fight old battles….battles which I erroneously thought us three book authors had “won” by passing muster with the top mathematical psychologists, economists and others in the world during peer review. When you publish a paper in the Journal of Mathematical Psychology (the JHE of that field) illustrating a key feature/potential weakness of a DCE (or specifically Case 2 BWS) back in 2008 you tend to expect that papers published in 2016 would not ignore this and would not do research that showed zero awareness of this issue and as a result made fundamental errors – after all, whilst we know clinical trials take a while to go from proposal to main publication, preference studies do NOT take 8+ years to go through this process. I co-ran a BWS study from conceptualisation to results presentation in 6 days when in Sydney. Go figure.

So that’s an example of my biggest frustration – the standards of literature review have often been appalling. Two or three of my papers (ironically including the JHE one, which includes a whopping error which I myself have repeatedly flagged up and which I corrected in my 2008 BMC paper) seem to get inserted as “the obligatory BWS reference to satisfy referees/editors” and in many cases bear no relation to the point being made by authors. Alarm bells immediately flash when I read an abstract via a citation alert and see those were my references. But it keeps happening. Not good practice, folks.

In fact (and at a recent meeting someone with no connection to me said the same thing) in certain areas of patient outcomes research the industry reviews are considered far better than academic ones – they have to be or get laughed out of court.

Anyway, I have been told that good practice eventually drives out bad. Sorry, if that’s true, the timescale was simply too long for me, which didn’t help my career in academia and raised my blood pressure.

Returning to the issue at hand. I’m not going to go through the paper in question, nor the several others that have appeared in the last couple of years purporting to show limitations of BWS. I have a company to run, caring obligations and I’ve written more than enough for anyone to join the dots here if they do a proper literature review. My final attempt to help out was an SSRN paper. But that’s it – without some give and take from the wider community, my most imaginative BWS work will be for clients who put food on the table and who pay – sometimes quite handsomely – for a method that when properly applied shows amazing predictive ability together with insights into how humans make decisions.

Now, of course, health state valuation is another kettle of fish – no revealed preference data etc. However, Tony, Jordan and I discussed why “context” is key in 2008 (JMP); I expounded on this with reference to QALYs in my two 2010 single authored papers, and published a (underpowered) comparison in the 2013 JoCM paper (which I first presented at the 2011 ICMC conference in Leeds, getting constructive criticism from the top choice modellers on Earth). So this issue is not particularly new.

It’s rather poor that nobody has actually used the right design to compare Case 2 BWS with DCEs for health state valuation…I ended up deciding “if you want something done properly you have to do it yourself” and I am very grateful to the EuroQoL Foundation for funding such a study, which I am currently analysing with collaborators. I don’t really “have a dog in this fight” and if Case 2 proves useful then great, and if not then at least I will know exactly why not…and the reasons will have nothing to do with the “BWS is bad m’kayyyyy” papers published recently. (To be fair, I am sometimes limited in what I can access, with no longer having an academic affiliation so full texts are sometimes unavailable, but when there’s NO mention of attribute importance in the abstract, NOR why efficient designs for Case 2 are problematic my Bayesian estimate is 99.99% probability the paper is fundamentally flawed and couldn’t possibly rule BWS in or out as a viable competitor to a DCE.)

If you’d like to know more:

  • Read the book
  • Read all the articles – my google scholar profile is up to date
  • Get up to speed on the issues in discrete choice design theory – fast. Efficient designs are in many many instances extremely good (and I’ve used them) but you need to know exactly why in a Case 2 context they are inappropriate.

If you still don’t understand, get your institution to contract me to run an exec education course. When I’m not working, I’m not earning, full stop.

I’m now far more pragmatic about the pros and cons of academia and really didn’t want to be the archetypal “I’m leaving social media now” whinger. And I’m not leaving. But I am re-prioritising things. Sorry if this sounds harsh/unhelpful – I didn’t want to write this post and hoped to quietly slip beneath the radar, popping up when I thought something insightful based on one of BWS’s REAL disadvantages or Sen’s work etc was mentioned. But people I respect have asked for guidance. So I am giving what I can, given the 10 minutes of free time I have.

Just trying to end on a positive note – I gave a great exec education course recently. It was a pleasure to engage with people who asked questions that were pertinent to the limitations of BWS and who just wanted to use the right tool for the right job. That’s what I try to do and what we should all aim for. I take my hat off to them all.

BWS correct referencing redux

This is not exactly a moan (since in some cases I’m requesting fewer references to one or two of my own papers, which is all very nice!). It’s just a reminder that BWS has been an evolving technique over many years and I continue to note too many people just seem to add the JHE 2007 paper as “the BWS reference” when it really isn’t supporting what they are doing or saying.

I have not been afraid to admit when I’ve done something incorrect/misleading, or when the field has moved on and an earlier paper is becoming outdated. (So when I call others on bad referencing, rest assured that I do the same for myself.)

Some points to note:

  • The JHE article was the first comprehensive explanatory Profile Case (Case 2) BWS paper. However, the “marginal models” there involved coding that, although it gives correct point estimates, gives misleading summary statistics like log-likelihoods, because it does not take account of the sequential nature of the data: a choice from 5 means only 4 options are available for the second choice.
  • This was corrected ASAP – the 2008 BMC paper on a dermatology study corrected this, so marginal sequential models should really reference this paper.
  • References to “dual/multi stage choice tasks” (primarily to get QALYs) should start with my 2010 Pharmacoeconomics paper, since that was the first to propose these (including the DCE+TTO rescaling method). Too many researchers reference later papers.
  • I was also first in explaining why the “death state” can’t be valued in a DCE without duration and a higher resolution design – in 2008 I wrote about this in Pop Health Metrics, with the God of math psych, Tony Marley, amongst others. I also pointed out why variance scale factors can be highly problematic in DCEs/other choice models. I certainly wasn’t first on the latter point – you should be looking to papers in the 1990s by Swait & Louviere, and Hensher and Louviere for that.
  • First reference to a Case 1 BWS study is in The Patient: Patient-Centered Outcomes Research (2010) by Louviere and Flynn (to my knowledge – I am happy to be corrected if wrong).
  • If you’re comparing Case 2 BWS with DCEs you really should be understanding and discussing how they differ, which was introduced in detail in the 2013 JoCM paper by Flynn et al, with subsequent discussion in the book (2015). DO NOT conclude that either method is “wrong”/“right” purely on basis of comparison of results from each task. Our work explains why they might differ.
  • For Case 3 BWS I’m not the key person; Emily Lancsar was/is big in introducing and applying this in health. Please also note the correct name for this is the “multi-profile case” as agreed by Louviere, Marley and me in preparation for the book. Like the profile case, renaming was done so as to better describe what made Cases 2 and 3 distinct from other Cases.
  • First reference to a peer-reviewed published Case 2 study was from the 1990s by Szeinbach et al; first UK study was 2006 by our team in BJD.
  • Finally, the emerging problems with highly efficient designs: Rose and Bliemer hypothesised this back in 2009; I and team published the first within-subject confirmation in Pharmacoecon 2016.
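
The sequential point in the first bullet can be sketched as follows. This is a minimal illustration assuming one common sequential best-then-worst multinomial logit, not necessarily the exact specification used in the papers mentioned; the function names and example utilities are hypothetical.

```python
import math


def log_choice_prob(values, chosen):
    """Log-probability of picking index `chosen` under a multinomial logit."""
    max_v = max(values)
    log_denominator = max_v + math.log(sum(math.exp(v - max_v) for v in values))
    return values[chosen] - log_denominator


def sequential_bw_loglik(utilities, best, worst):
    """Best chosen from all items, worst chosen from the items that remain."""
    ll_best = log_choice_prob(utilities, best)
    remaining = [i for i in range(len(utilities)) if i != best]
    negated = [-utilities[i] for i in remaining]  # worst = lowest utility
    ll_worst = log_choice_prob(negated, remaining.index(worst))
    return ll_best + ll_worst


def naive_bw_loglik(utilities, best, worst):
    """Treats the worst choice as if all items were still available."""
    ll_best = log_choice_prob(utilities, best)
    ll_worst = log_choice_prob([-u for u in utilities], worst)
    return ll_best + ll_worst


# Hypothetical choice set of five items with equal utilities.
flat = [0.0] * 5
sequential = sequential_bw_loglik(flat, best=0, worst=4)  # log(1/5) + log(1/4)
naive = naive_bw_loglik(flat, best=0, worst=4)            # log(1/5) + log(1/5)
```

The naive version always reports a lower log-likelihood for the same data, because it leaves the already-chosen "best" item in the denominator of the second choice.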

Thus, it’s just a guide to help practitioners get the correct reference for BWS and associated conceptual issues. Hope it helps. I may add to this if I think of other issues that are incorrectly attributed.


efficient design problems redux

Two papers in the past week have given me cause to remind people that efficient designs in DCEs may not be the bee’s knees. I posted a while back when my paper showing this was accepted and gave more detail, which ultimately became part of my SSRN paper, after publication.

I’ll quote from my latter post:

Street and Burgess had begun to provide CenSoC with designs whose efficiency was 100% (or close to it), rather than 30-70%. We loved them and used them practically all the time. In parallel with this, John Rose at the Institute of Transport and Logistics Studies at Sydney University had begun utilising highly efficient designs – though of a different sort. However, what efficient designs have in common – and really what contributes heavily to their efficiency – is a lack of level overlap. This means that if the respondent is presented with two pairs of options, each with five attributes, few, and in many cases none, of those attributes will have the same level in both options. Thus, the respondent has to keep in mind the differences in ALL FIVE ATTRIBUTES at once when making a choice. Now, this might be cognitively difficult. Indeed John Rose, to his immense credit, made abundantly clear in the early years in a published paper that his designs, although STATISTICALLY EFFICIENT, might not be “COGNITIVELY EFFICIENT”, in that people might find them difficult (pushing up their error variance) or, even worse, use a simplifying heuristic (such as “choose the cheapest option”) in order to get through the DCE. (Shame on us CenSoCers for not reading that paper more closely.) Clearly in the latter case you are getting biased estimates – not only are your parameter estimates biased (in an unknown direction) but the functional form of the utility function for such respondents is wrong. Now John merely hypothesised this problem – he had no empirical data to test his hypothesis, and recommended that people go collect data. For many years they didn’t.
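
Level overlap, as described in the passage above, is easy to count. The sketch below is illustrative only: the function name and the integer-coded choice sets are made up for the example.

```python
def level_overlap(option_a, option_b):
    """Count the attributes on which two profiles share the same level."""
    if len(option_a) != len(option_b):
        raise ValueError("Both options must describe the same attributes")
    return sum(1 for a, b in zip(option_a, option_b) if a == b)


# Hypothetical five-attribute pairs, with levels coded as integers.
no_overlap_pair = ((0, 1, 2, 0, 1), (1, 2, 0, 1, 0))
some_overlap_pair = ((0, 1, 2, 0, 1), (0, 1, 0, 1, 0))

zero_shared = level_overlap(*no_overlap_pair)    # all five attributes differ
two_shared = level_overlap(*some_overlap_pair)   # only three attributes differ
```

In the first pair the respondent has to weigh up all five attributes at once; in the second, two attributes are held constant, which lightens the task at some cost in statistical efficiency.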

My study was the first within-subject study to be published (though I know of at least one other within-subject study that was doing the conference rounds at about the same time and may well have been published since). It certainly has influenced my thinking and the paper in the current AHE Blog has found – although I believe using between-subject study only – that yes, efficient designs for “complete EQ-5D-5L described lives” seemed to cause problematic beta estimates. They advocate two-step designs – something a group I’m working with are already doing….hopefully we will have some interesting stuff to present next year at a EuroQoL Group meeting.

I shall simply end with a warning I put in the SSRN paper concerning efficient designs (which certainly have their place, don’t get me wrong, but you can’t use them unthinkingly):

Of course if it turns out that greater precision has been gained at the expense of bias, then efficient designs replace what is merely an annoyance with a crippling flaw.



effects or dummies redux

That old bugbear comes back….are effects codes really superior to dummy variables?


This note revisits the issue of the specification of categorical variables in choice models, in the context of ongoing discussions that one particular normalisation, namely effects coding, is superior to another, namely dummy coding. For an overview of the issue, the reader is referred to Hensher et al. (2015, see pp. 60–69) or Bech and Gyrd-Hansen (2005). We highlight the theoretical equivalence between the dummy and effects coding and show how parameter values from a model based on one normalisation can be transformed (after estimation) to those from a model with a different normalisation. We also highlight issues with the interpretation of effects coding, and put forward a more well-defined version of effects coding.
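
The equivalence mentioned in that abstract can be sketched for a single categorical attribute. This is the standard textbook transformation, shown under the assumption that the dummy-coded base level is fixed at zero; it is not claimed to follow the notation of the note itself, and the coefficient values are invented.

```python
def dummy_to_effects(dummy_coefs):
    """Convert level coefficients (base level = 0.0 included) to effects coding.

    Effects-coded coefficients sum to zero across the levels of the attribute.
    """
    mean = sum(dummy_coefs) / len(dummy_coefs)
    return [c - mean for c in dummy_coefs]


def effects_to_dummy(effects_coefs, base_index=0):
    """Convert effects-coded level coefficients back to dummy coding."""
    base = effects_coefs[base_index]
    return [c - base for c in effects_coefs]


# Hypothetical three-level attribute, base level first.
dummy = [0.0, 0.6, 1.5]
effects = dummy_to_effects(dummy)   # approximately [-0.7, -0.1, 0.8]
round_trip = effects_to_dummy(effects)
```

Differences between levels are identical under both normalisations, so the model's choice probabilities are unchanged; only the reference point moves.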

That’s one of the joys and frustrations of DCEs; why you can never rest on your laurels and should really be acknowledging that it is a field in its own right; why you should have a DCE expert on your team for all important projects. Just when you thought something was right, its merits are questioned. Fun fun fun.

first reference to discrete choice in health

Just a short update today.

Via Twitter I learned that Professor Philip Clarke (University of Melbourne) gave a great seminar at the Office of Health Economics. His topic was history of economic evaluation in health generally but there was a particular gem in there of interest to me.

It appears we are all wrong. The first time Thurstone’s method of paired comparisons was proposed as a possible way of valuing health states was in 1970! On page 1041 of: Fanshel S & Bush JW. A health-status index and its application to health-services outcomes. Operations Research 1970; 18(6): 1021-1066.

We stand corrected, thank you.

PS Thurstone did pairs only because the multinomial model wasn’t available then, only probit (normal based) distributions, which don’t have closed form for 3+ options. So if you want the general (non-health) first reference to the multinomial (conditional) logit, it’s McFadden’s article or, if you’d like the earlier non-economics one, go read and reference Luce and Marley’s books from the 1950s and 1960s. Plus if you want to reference DCEs and why they are better than looking at all pairs – i.e. the addition of experimental design to choice models – it’s Louviere and Hensher’s work in the early 1980s.

EDIT at 11:40 BST to correct OHE’s name.

euroqol group funding

Sander Arons, Karin Oudshoorn and I have a EuroQoL funded project: how a *correctly* designed Case 2 best-worst scaling (BWS) study + a DCE can give us a tariff for EQ-5D-5L.

The study will be interesting as we work from first principles: what is the tariff for an INDIVIDUAL person. Then do all the things I’ve been harping on about for 5+ years to get a proper population tariff.

I don’t claim this will turn the world upside down. But I do think it will give some food for thought for whether the Secretariat wants to consider DCEs for the 5L tariff. I fully recognise that the EuroQoL group has all sorts of constraints to work under (linking with the -3L version, consistency of methods etc) which I totally understand. I’m pragmatic these days. But I think, if our testing results hold true in the main study, that we will raise some eyebrows with the results and give the EuroQoL Group the chance to get back in front in terms of methodology.

Interesting times ahead!

research council funding

EDIT 27 July 2016: Just to clarify, this post is not due to any current issues/proposals/projects. It is a piece I have had in mind for a long while now but couldn’t realistically write whilst I was still under funding/contractual obligations. Recent encounters with other funders have (so far, fingers crossed, touch wood) been more positive!


Funding by UK health and EU funding councils has, in my experience, been less than satisfactory. Cuts that seem totally arbitrary have been made. In fact I cannot recall a single discrete choice experiment (BWS) study since the original ICECAP-O study that has been fully funded.

Universities I worked at pretty much absorbed the costs. Now, this was partly because all the valuation exercises I have been involved in have raised new issues. But more recent examples have been particularly annoying for my collaborators – they have ended up putting in a lot of extra time. It made me feel guilty, thinking “perhaps I should have costed more time”. But the truth is, whatever I asked for in the past, I got cut BIGTIME.

This comes back to issues that I have had close to my heart for a while now, the issues of (1) what exactly “discrete choice modelling” (DCM) is as a discipline and (2) where it sits within health services research (at least for its application in health). I personally consider choice modelling to be an entire discipline and when in health to be a subdiscipline of HSR akin to “qualitative research” or “biostatistics”. Now, funded trials would not dream of having a jobbing RA to do a fully-fledged qualitative project within a programme grant, nor have an RA do all the biostatistics. So why does DCM get treated as the poor relation? There should ALWAYS be an expert (with 15+ years of experience) in charge of the DCM, with a junior to learn. Because discrete choice experiments (DCEs) are NOT just a branch of health economics or biostatistics. Choice modelling is a discipline in its own right, with a totally different set of statistical skills, economic assumptions and especially psychological knowledge required to do it successfully. It most definitely IS NOT just another preference elicitation method.

Qualitative research quite rightly fought hard to gain acceptance as an equal partner in HSR studies. DCM should too. It’s simply not good enough to have juniors doing DCEs in major clinical trials. Again, would you get an RA without specific training to do all the biostats? No? Then why is it considered acceptable in DCM?

Those of us who spent over 15 years learning our trade find it, frankly, a little insulting, that someone from an MSc or PhD should be considered able to run a DCE in a trial. DCE analysis is not purely a science. It’s very difficult to teach. It’s part art. Do it for 10+ years – ACROSS MULTIPLE DISCIPLINES – then you are just about able to do it properly.

May we have some acknowledgement that what we do is important please?

This isn’t just for the benefit of us “insiders”: the industry loses since the standards of reporting and conduct have not, in my opinion, improved much at all in health. Sooner or later (and sooner if one study I have in mind in a major journal is compared with real preference data) the results of a poor DCE will be comprehensively discredited. Then we all lose. Actually I don’t. Because I do studies predominantly for the private sector who are very sensitive to incorrect results. They commission me because I get them good results. But it’d be a shame if DCM is lost to the public sector. All because nobody wanted to pay for senior people to do things correctly.

I’m told I can be too negative. Fair enough. Yes – standards are a LOT better than 10 years ago. But please remember, whilst the health field has moved on, those fields way ahead of you (marketing/environmental econ/transport econ etc) have also moved on. A lot. You’ve caught up a little. But not by enough. Experience has become doubly important – reading the literature won’t tell you how to do a DCE perfectly. Because of that dirty little secret…..there are an infinite number of solutions to a DCE. If you’ve not analysed 20+ DCEs are you really confident you know what solution to quote to policymakers? Particularly in health where the “tricks” available to solve the problem in other disciplines are not available? I’ll end with a quote:

Harry Callahan: “Uh uh. I know what you’re thinking. ‘Did he fire six shots or only five?’ Well to tell you the truth in all this excitement I kinda lost track myself. But being this is a .44 Magnum, the most powerful handgun in the world and would blow your head clean off, you’ve gotta ask yourself one question: ‘Do I feel lucky?’ Well, do ya, punk?”

You are playing with a loaded gun if you don’t know enough about what solutions make sense and what will blow your reputation up. Do you feel lucky?

design: a bit behind the ball

I just had a citation alert to this article on design efficiency in DCEs in health. Nowadays I skim citation alerts (at best) as they come so thick and fast (*polishes halo*). However, this one caught my eye being in a BMJ Open Access journal and being on design, a subject currently close to my heart.

The article wasn’t bad. I just wouldn’t say it was good either. Whilst the quantity of references was sufficient, there were a number of them that frankly were irrelevant and should have been replaced by (much) more important ones. When none of the major textbooks that include chapters on design guidance (ones by Rose/Hensher/Louviere etc) are mentioned, nor a key paper by Rose and Bliemer, you sigh.

Plus, I know I might be mis-remembering this (no longer having institutional access to check, and with the paper copies of key references being packed away and inaccessible at the moment), but investigations of factors affecting design efficiency have been done already, surely?

But it was the cognitive vs statistical efficiency issue that really got me to sign up in order to make the following comment (which seems, at present, to be in moderation purgatory, though Monday may change things).

Nice investigation but I’m afraid some key non-health references are missing which would have addressed/begun to address some issues you raised. Regarding design guidance, the two seminal textbooks are not referenced, together with Rose and Bliemer’s 2009 paper.

You also appear to have understated the seriousness of the problem if the quest for efficiency leads respondents to use heuristics: your results become BIASED (useless). You say “Using a statistically efficient design may result in a complex DCE, increasing the cognitive burden for respondents and reducing the validity of results. Simplifying designs can improve the consistency of participants’ choices which will help yield lower error variance, lower choice variability, lower choice uncertainty and lower variance heterogeneity” but these are the least of your worries if the functional form of the utility function depends on the design. To their credit, Rose and Bliemer pointed out this possibility back in 2009; it’s already been observed in between-subject comparisons and I and co-authors published the first within-subject study in health and found the problem was extremely severe:

Flynn TN, Bilger M, Malhotra C, Finkelstein EA. Are Efficient Designs Used In Discrete Choice Experiments Too Difficult For Some Respondents? A Case Study Eliciting Preferences for End-Of-Life care. Pharmacoeconomics 2016;34(3):273-284

The paper was submitted right around the time ours came out in the print version, but I know our e-version was around before then, not to mention the possibility of adding it at the review stage. Which actually leads me to worry about the refereeing process just as much as aspects of the original paper.

Best-Worst Scaling in Voting

My comment to one of the links posted to today’s “Water Cooler” posting at Naked Capitalism. (Cross posted to my company blog too). The original link concerned a proposal in the US state of Maine to introduce ranked voting rather than the first-past-the-post (FPTP) that is ubiquitous in the US and UK…..the proposal sounds attractive to people, but…..

“Ranking is a double edged sword – not that I condone the current first past the post (FPTP) system endemic in the US and UK (it’s the worst of all worlds) – but people should first look at what oddballs have ended up in the Federal Senate in Australia. Plus that awful Pauline Hanson may be about to make a comeback there.

Ranking has proven very very difficult to properly axiomatize – i.e. in practice, there are a whole load of assumptions that must hold for the typical “elimination from the bottom” (or any other vote aggregation method) to properly reflect the strength of preference in the population. For instance:
(1) Not everybody ranks in the same way (top-bottom / bottom-top / top, bottom, then middle, or any other of a huge number of methods);
(2) An individual can give you different rankings depending on how you ask him/her to provide you with answers (again, ask ranks 1, 2, 3, etc,…. 9, 8, 7, etc, 1, 9, 2, 8 etc ….)
(3) People have different degrees of certainty at different ranking depths – they are typically far less sure about their middle rankings than their top and bottom choices.

Unfortunately, where academic marketing, psychology and economics studies have been done properly, these kinds of problems have proven to be endemic….furthermore they often matter to the final outcome, which is worrying. It’s why gods of the field of math psych (from Luce and Marley in the 1960s onwards) were very very cautious in condoning ranking as a method.

Statement of conflict of interest: Marley and I are co-authors on the definitive textbook on an alternative method called best-worst scaling….it asks people for their most and least preferred options only. The math is much easier and I’d be very very interested to see what would have happened in both the Rep/Dem primaries if it had been used – generally you subtract the number of “least preferred” votes from the number of “most preferred” – so people like Clinton and Trump with high negatives get into trouble….”

What I didn’t say (since the work is technical) is that Tony Marley has done a lot of work in voting and has published at least one paper extolling BWS as a method of voting.
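
The “most preferred minus least preferred” count described in the comment above can be sketched in a few lines. The candidates and ballots here are entirely hypothetical, and this simple tally is only the basic scoring rule, not any of the formal voting models in the published work.

```python
from collections import Counter


def best_minus_worst_scores(ballots):
    """Score each candidate as (# most-preferred votes) - (# least-preferred votes).

    `ballots` is an iterable of (most_preferred, least_preferred) pairs.
    """
    scores = Counter()
    for most, least in ballots:
        if most == least:
            raise ValueError("A ballot cannot name the same candidate twice")
        scores[most] += 1
        scores[least] -= 1
    return dict(scores)


# Hypothetical electorate of seven voters and three candidates.
ballots = [
    ("A", "B"), ("A", "B"), ("A", "C"),
    ("B", "A"), ("B", "A"),
    ("C", "A"), ("C", "A"),
]
scores = best_minus_worst_scores(ballots)
winner = max(scores, key=scores.get)
```

Candidate A has the most first preferences and would win under FPTP, but also collects the most “least preferred” votes, so the best-minus-worst tally puts C ahead.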