Tag Archives: design

design: a bit behind the ball

I just had a citation alert to this article on design efficiency in DCEs in health. Nowadays I skim citation alerts (at best) as they come so thick and fast (*polishes halo*). However, this one caught my eye being in a BMJ Open Access journal and being on design, a subject currently close to my heart.

The article wasn’t bad. I just wouldn’t say it was good either. Whilst the quantity of references was sufficient, a number of them were frankly irrelevant and should have been replaced by (much) more important ones. When none of the major textbooks that include chapters on design guidance (ones by Rose/Hensher/Louviere etc) are mentioned, nor a key paper by Rose and Bliemer, you sigh.

Plus, I know I might be mis-remembering this (no longer having institutional access to check, and with the paper copies of key references being packed away, inaccessible, at the moment), but investigations of factors affecting design efficiency have been done already, surely?

But it was the cognitive vs statistical efficiency issue that really got me to sign up in order to make the following comment (which seems, at present, to be in moderation purgatory, though Monday may change things).

Nice investigation but I’m afraid some key non-health references are missing which would have addressed/begun to address some issues you raised. Regarding design guidance, the two seminal textbooks are not referenced, together with Rose and Bliemer’s 2009 paper.

You also appear to have understated the seriousness of the problem if the quest for efficiency leads respondents to use heuristics: your results become BIASED (useless). You say “Using a statistically efficient design may result in a complex DCE, increasing the cognitive burden for respondents and reducing the validity of results. Simplifying designs can improve the consistency of participants’ choices which will help yield lower error variance, lower choice variability, lower choice uncertainty and lower variance heterogeneity” but these are the least of your worries if the functional form of the utility function depends on the design. To their credit, Rose and Bliemer pointed out this possibility back in 2009; it’s already been observed in between-subject comparisons and I and co-authors published the first within-subject study in health and found the problem was extremely severe:

Flynn TN, Bilger M, Malhotra C, Finkelstein EA. Are Efficient Designs Used In Discrete Choice Experiments Too Difficult For Some Respondents? A Case Study Eliciting Preferences for End-Of-Life Care. Pharmacoeconomics 2016;34(3):273-284.

The paper was submitted right around the time ours came out in the print version, but I know our e-version was around before then, not to mention the possibility of adding it at the review stage. Which actually leads me to worry about the refereeing process just as much as aspects of the original paper.

“highly efficient DCEs can be bad” paper published

Sorry, I should have posted this over a week ago – have had a cold and been working on a project.

But the paper on the perils of highly efficient designs in DCEs has been published! Wahay.

Personally I consider this paper to be the second best of my career (after the first BWS one in JHE)…ironic that I have left academia!

In terms of the content, some economists still don’t “get it” but that’s their problem really 😉

wiki and stuff

I have spent almost a day doing work on wikipedia articles.

I did some tidying up and editing of the BWS article, and I made substantive edits to the choice modelling one. In terms of the latter, I have tried not to fall foul of the NPOV crime – “neutral point of view”. I know that there are a bunch of diehards out there in favour of the term “conjoint analysis”. The guy who is perhaps the top environmental economist, one of the top three choice modellers/marketers and I wrote an article explaining why it really is wrong to call what we do “conjoint analysis” – that is a particular academic technique/model, much as the maxdiff one is.

However, I do recognise that this is a battle we won’t win: too much industry is using those terms. Thus I acknowledged why “choice-based conjoint analysis” is used and attempted to give a full and frank justification for this. Of course I also gave our counter-argument, which relies on the academic case for DCEs and BWS!

Anyway I hope the two articles help newbies. I might edit the conjoint analysis one – I wouldn’t attempt to downplay the enormous contributions by people like Green etc but I would make clear that the move towards choice-based techniques should lead the reader to another page. I would hope that would not be poking a wasps’ nest!

In other news, the efficient design paper is close to online publication – I have amended the proofs – yay! We have also submitted the children’s health paper to a journal – we’ll see how that goes… it will be one of my last academic contributions to the field.


Finally I saw a piece by a former colleague in Australia which summarised a study to elicit Australians’ preferences for spending a given federal budget. It’s a shame this study was done – I fought tooth and nail with a former Director not to do it as it would embarrass the centre. Asking people what policy goal they would (1) most and (2) least like to receive spending (a Case 1 BWS study) was flawed for the following reasons:

(1) The Federal budget surplus/deficit is an ENDOGENOUS variable, NOT an exogenous one. The automatic stabilisers (unemployment and other safety net benefits) kick in when the economy goes down and the deficit increases naturally…or maybe the extra demand by people who would otherwise starve brings the economy UP. That’s the point – there is no “amount” the federal govt has to spend.

(2) The whole exercise is framed incorrectly. There is not a “pot of money” collected in taxes that the government has to spend. This confuses the household with the sovereign government. It is a nonsense to think that a sovereign government can “save” in a currency it creates with the press of a button (F5 F5 F5 F5). Plus think back to the beginning in a world without currency. Did the government tax to spend? No of course it didn’t. There was no money in circulation. It SPENT so money could enable trade and then the government could TAX in order to achieve its aims, ensure a demand for the currency etc.


A government spends by crediting accounts of the relevant beneficiaries – there is no money backing this. If the govt wants to issue bonds to “cover” the deficit it can do so, but there is NO LINK with the deficit – indeed Australia provided the most recent example of this. Under the Howard government, Australia ran a surplus, ergo it should have stopped issuing bonds (IOUs for overspending). It did this. What happened? The financial sector went apeshit and demanded it keep issuing them since they were running out of risk-free assets and assets upon which to price risk on other assets. It really was a Wizard of Oz moment where the man behind the curtain was revealed.

So telling people “there is a budget of X million/billion dollars, what are your priorities?” is a misguided question. People will automatically form their own ideas about what is affordable with that budget of X, and indeed think in terms that there IS a finite budget. There is not. Now OF COURSE the govt can’t spend indefinitely; it’ll cause inflation when all unused factors of production become used. But we are nowhere near that point, as OECD/IMF figures show.

You should tell people “imagine there is no limit to spending; tell me your priorities” with a list of probable spending for each. You may find that suddenly people choose things they assumed were unaffordable before.

So sorry I4C, you did a dud study. I did try to head this off before, but you have fallen for the fallacy of composition – this goes wayyyyy back to Keynes. You can’t use microeconomics to solve macroeconomic problems. It’s a different discipline and NOT one us micro people should dabble in.


EDIT I have been informed that the budget was not divvied up in the DCE, which is good, and potentially makes the study correct. I just hope that, in the preamble, it told respondents that the government budget is not constrained in any way, except when the entire economy is fully employed and there is no “slack in the system” – a situation we haven’t been in since the 1970s!

Where next for discrete choice health valuation – part two


Part one of this series on the valuation of health (or quality of life) using discrete choice experiments (DCEs) and their variants concentrated on the tension between the size of the descriptive system and the needs of valuation. In particular, it summarised some disappointing findings in a study using best-worst scaling (BWS) to value the CHU-9D instrument (although I hasten to add we did successfully elicit a child population tariff!) Now I wish to re-emphasise that no definitive conclusion can be drawn from this, specifically whether the method, the instrument, or the kids themselves caused the problems. But it does raise issues that should be borne in mind if any ICECAP instrument for child quality of life is produced.

A very timely event this week has allowed me to discuss (in more detail than I would have otherwise done) a second issue that future valuation exercises using discrete choices (DCEs/BWS/ranking) should consider. The issue is the design of the valuation exercise.

The timely event was the acceptance (by Pharmacoeconomics) of a paper I and colleagues wrote on how varying levels of efficiency in a DCE might cause respondents to act differently. The paper is called “Are Efficient Designs Used In Discrete Choice Experiments Too Difficult For Some Respondents? A Case Study Eliciting Preferences for End-Of-Life Care” by T.N. Flynn, Marcel Bilger, Chetna Malhotra and Eric Finkelstein. The paper (two DCE experts thought) is revolutionary because it was a within-subject, not between-subject survey: all respondents answered TWO DCEs, differing in their level of statistical efficiency. Now, why did we do this, and what was the issue in the first place?

The background to this study is as follows. Several years ago, when I worked at CenSoC, UTS, Jordan Louviere called a bunch of us into his office. He had been looking at results from a variety of DCEs for some reason and was puzzled by some marked differences in the types of decision rule (utility function) elicited, depending on the study. Traditionally he was used to up to (approximately) 10% of respondents answering on the basis of a single attribute (lexicographically) – most typically “choose the alternative (profile) with the lowest cost”. Suddenly we were seeing rates of 30% or more. Why were such a substantial minority of respondents suddenly deciding they didn’t want to trade across attributes at all, but wanted to use only one? He realised that this increase in rates began around the time CenSoC had begun to use Street & Burgess designs. For those who don’t know, Street and Burgess were two highly respected statisticians/mathematicians working at UTS who had collaborated with Louviere from around the turn of the millennium in order to increase the efficiency of DCE designs. Higher efficiency means a lower required sample size – precision around utility parameter estimates is improved. It also offered Louviere the tantalising possibility of estimating individual-level utility functions, rather than the sample- or subgroup-level ones that DCEs could only manage previously. (Individual level “utility” functions had been around in the “conjoint” literature for a while but these relied on atheoretical methods like rating scales.)

Street and Burgess had begun to provide CenSoC with designs whose efficiency was 100% (or close to it), rather than 30-70%. We loved them and used them practically all the time. In parallel with this, John Rose at the Institute of Transport and Logistics Studies at Sydney University had begun utilising highly efficient designs – though of a different sort. However, what efficient designs have in common – and really what contributes heavily to their efficiency – is a lack of level overlap. This means that if the respondent is presented with two pairs of options, each with five attributes, few, and in many cases none, of those attributes will have the same level in both options. Thus, the respondent has to keep in mind the differences in ALL FIVE ATTRIBUTES at once when making a choice. Now, this might be cognitively difficult. Indeed John Rose, to his immense credit, made abundantly clear in the early years in a published paper that his designs, although STATISTICALLY EFFICIENT, might not be “COGNITIVELY EFFICIENT”, in that people might find them difficult (pushing up their error variance) or, even worse, use a simplifying heuristic (such as “choose the cheapest option”) in order to get through the DCE. (Shame on us CenSoCers for not reading that paper more closely.) Clearly in the latter case you are getting biased estimates – not only are your parameter estimates biased (in an unknown direction) but the functional form of the utility function for such respondents is wrong. Now John merely hypothesised this problem – he had no empirical data to test his hypothesis, and recommended that people go collect data. For many years they didn’t.
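To make the “level overlap” point concrete, here is a toy Python sketch (the attribute levels are entirely made up, not from any real design) that counts how many attributes share the same level across the two options in a pair. Zero overlap means the respondent must weigh every attribute at once:

```python
# Toy illustration of level overlap in a pairwise DCE.
# Each option is a list of attribute levels (coded as integers).

def overlap(option_a, option_b):
    """Number of attributes taking the same level in both options."""
    return sum(a == b for a, b in zip(option_a, option_b))

# Two made-up choice pairs over five attributes:
efficient_pair = ([0, 1, 2, 0, 1], [1, 0, 1, 1, 0])   # no shared levels
overlapped_pair = ([0, 1, 2, 0, 1], [0, 1, 1, 0, 0])  # three shared levels

print(overlap(*efficient_pair))   # 0 -> all five attributes differ at once
print(overlap(*overlapped_pair))  # 3 -> only two attributes differ
```

The first pair is the kind a highly efficient design tends to generate; the second trades some statistical efficiency for a cognitively simpler comparison.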

Hence we went on our merry way, utilising S&B designs, until Louviere spotted the problem and the potential reason for it. The problem was all the surveys he looked at utilised ONE DCE – so there is ONE level of efficiency – so he had only between-survey data and couldn’t be certain it was the efficiency that was driving the changes in respondent decision rule: perhaps the surveys with these high rates of uni-attribute decision-making were done in areas where people GENUINELY chose on the basis of a single attribute?

I chatted to him and he realised I was designing a survey in which I had an opportunity to do a within-subject choice experiment. Specifically, if my Singapore-based collaborators agreed, I could administer TWO DCEs to all respondents. Now I am not going to tell all about how we did this exactly but, cutting to the chase, 60% (!) of respondents answered on the basis of a single attribute in a S&B design but one third of these (20% overall) then traded across attributes in a much less efficient design that exhibited some level overlap (making it – arguably – cognitively simpler). Finally, we had within-subject evidence that PEOPLE INTERACT WITH THE DESIGN. Which, of course, has serious implications for generalisability, if found to be a common problem.
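For the curious, here is a heavily simplified Python sketch of the kind of check involved (this is illustrative only, not the actual analysis from the paper, and the choice data are invented): flag a respondent as potentially lexicographic on an attribute if every one of their choices is consistent with always picking the option with the lower level on that attribute.

```python
# Sketch of lexicographic-rule detection in a pairwise DCE.
# Each choice is (option_a, option_b, chose_a), options being lists of
# attribute levels where lower = better (e.g. cost, pain).

def consistent_with_attribute(choices, attribute_index):
    """True if every choice picked the option with the lower level on
    the given attribute (sets where the levels tie are uninformative)."""
    for option_a, option_b, chose_a in choices:
        a, b = option_a[attribute_index], option_b[attribute_index]
        if a == b:
            continue  # attribute doesn't discriminate in this set
        if chose_a != (a < b):
            return False
    return True

def lexicographic_attributes(choices, n_attributes):
    """Indices of attributes this respondent could have used lexicographically."""
    return [k for k in range(n_attributes)
            if consistent_with_attribute(choices, k)]

# Made-up respondent who always picks the option with the lower level on
# attribute 0 (think: "always choose the cheaper option"):
choices = [
    ([1, 2, 0], [2, 0, 1], True),
    ([3, 1, 1], [0, 2, 2], False),
]
print(lexicographic_attributes(choices, 3))  # [0]
```

In a real analysis you would want many choice sets per respondent, since with few sets a genuinely trading respondent can look lexicographic by chance.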

Why is this an issue for future valuation exercises? Well, I have seen presentations from researchers who used highly efficient designs in DCEs to get a tariff for health instruments. Essentially the choices on offer are time trade-off types where both quality of life and length of life differ. Now although the TTO is (probably) easier than the Standard Gamble, it is still a hard thing to get your head around if you have any cognitive impairment or are in a vulnerable group. So we probably don’t want to make things even more difficult than they already are.

This, of course, creates headaches for researchers: if we reduce efficiency to make the task easier then the required sample sizes will go up. We may have more limited ability to identify heterogeneity or estimate individual level models. But, as usual, I believe we are in a world of second best, so compromises may have to be made. One would be to cut out the length of life attribute from the DCE altogether and use a TTO to rescale the health values estimated from an easier task like BWS Case 2 – as I advocated a number of years ago as a “second best” valuation option. Not ideal, but it would do the job. Other ideas include helping respondents get familiar with the options on offer through the use of other choice tasks (again, Case 2), which we have done in a study recently. In any case, if the design issue proves common – and my gut feeling, given the complexity of decision-making in health, is that it will – we will need to be imaginative with our designs.

A final issue that broadly comes under the design “umbrella” concerns sampling. One of the chapters in the BWS book utilised sampling criteria that deliberately avoided making the sample representative of the wider population. Why would we do that? Well, when you have a limited dependent variable model with discrete outcomes (rather than continuous outcomes in TTO/SG/VAS), characterising the heterogeneity correctly becomes absolutely crucial: if there is heteroscedasticity, the estimates won’t simply be inefficient, but BIASED. BIG problem.

If, say, depressed people have different health preferences and choice consistency to non-depressed people but you don’t have enough depressed people in your sample to spot this, and mix them in with the others in estimation, you have the WRONG POPULATION TARIFF. So (and I have said this before in published papers), EVEN if you want a population tariff, to work within the traditional extra-welfarist paradigm, you still have to get the heterogeneity right – you must probably OVERSAMPLE those groups you suspect of having different preferences. Then, when you have estimated the tariffs for the different groups you must RE-WEIGHT to reflect the population distribution. THAT is the way to get the true population tariff.

Of course if people in various different health states do not differ from the “average member of the population” in their health preferences the problem goes away. The problem, as the chapter in the book shows, is that (at least for well-being) people with impairments DO have different preferences: those impaired on attribute x desire to improve on attribute x, whilst those impaired on attribute y switch to wanting attribute z to compensate. The picture is extremely complex. So you should be using what we call “quota sampling” – making sure you have enough people in various key impaired states to estimate models for those subgroups. So survey design is a lot more complicated when you ditch TTO/SG/VAS.
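The oversample-then-re-weight step can be illustrated with a trivial Python sketch (the tariff values and population shares below are made up purely for illustration):

```python
# Sketch of re-weighting subgroup tariffs to recover a population tariff.
# Subgroups were deliberately oversampled so each tariff is estimated
# precisely; the population tariff then uses TRUE population shares,
# not the (distorted) survey shares.

# Hypothetical per-subgroup tariff estimates for one health state:
subgroup_tariffs = {"depressed": 0.45, "not_depressed": 0.70}

# True population shares (NOT the oversampled survey proportions):
population_shares = {"depressed": 0.10, "not_depressed": 0.90}

population_tariff = sum(subgroup_tariffs[g] * population_shares[g]
                        for g in subgroup_tariffs)
print(round(population_tariff, 3))  # 0.675
```

Pooling the raw (oversampled) data instead would implicitly weight the depressed group far above its 10% population share and give the wrong tariff.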

I don’t mean to sound glass half empty regarding design. Leonie Burgess, when presented with the implications of her designs, was fascinated and saw it as an opportunity (to change and improve the models) rather than a problem. I see it this way too. Things will get interesting (again) in DCEs in the coming years as we find out what we need to do in the design field to ensure we get unbiased tariffs for use in decision-making.

Although the third and final blog in this series (to appear next week) may seem superficially similar to the first one – I will discuss the size of the descriptive system – I will write in more detail about the process of constructing the instrument, both qualitative and quantitative*, and offer recommendations that may help alleviate the tension mentioned in the first blog.

*Discussing in more detail some constructive comments I had from Professor David Parkin.

Copyright Terry N Flynn 2015.

This, together with the accompanying blogs, will form a working paper to be submitted to SSRN. Please cite appropriately if referring to these issues in academic papers.


perils of efficient design paper accepted

Well, this is turning out to be a good week. First I hear the BWS book was published on time.

Now a paper that two of the global experts on discrete choice experiments found ground-breaking has been accepted for publication! The paper had previously been rejected by another good journal – a decision I (and, crucially, others) found unfair since the two referee reports did not appear to be recommending rejection (though one was fairly critical and the other didn’t really “get” a lot of what we did, it being work that challenged some tenets of the current economic orthodoxy).

Anyway the paper is called “Are Efficient Designs Used In Discrete Choice Experiments Too Difficult For Some Respondents? A Case Study Eliciting Preferences for End-Of-Life Care” by T.N. Flynn, Marcel Bilger, Chetna Malhotra and Eric Finkelstein and has been accepted by Pharmacoeconomics.

The background to the paper and its implications will form the second blog in the series I am writing about the future of health state valuation using DCEs. Suffice to say, it is revolutionary because we got respondents to answer TWO DCEs, which differed radically in their statistical efficiency – one was 100% efficient, the other 40-50% efficient. The theoretical advantages of the former were, however, heavily attenuated by the fact many respondents resorted to heuristics that didn’t represent their true decision rule – causing biased estimates. Twas a cool paper, even if I say so myself. Anyway I shall follow up on this later in the week, that’s enough for today.

design human interactions

There’s increasing interest in how humans might interact with statistical designs.

Now, we typically construct designs with particular statistical properties (e.g. orthogonal, efficient, etc). However, this ignores the fact humans are sly. They spot patterns. They adopt heuristics. Their estimated preferences might not then represent their real utility function. I’m soon to submit a paper that gets people to answer two completely different designs: one is maximally efficient (and therefore difficult) whilst the other is relatively inefficient (and therefore easy). It’s amazing how many people use a simplistic heuristic to get through the efficient design (e.g. “choose profile with lowest pain level”) when in a more inefficient, but easier set of trade-offs they are willing to trade pain against other attributes.

This is deeply worrying. It means efficient designs may be crap. To be fair, John Rose (best efficient design guru in the world) noted that possibility *years* ago so it’s not as if I’m finding a new weakness of them. I’m merely doing the proper within-person tests that are required to test the hypothesis – CenSoC/I4C cross-sectional studies have seen the same phenomenon but couldn’t be sure it was an intra-subject problem. Hensher’s group has also recently spotted it.

So all those of you thinking D-efficient designs are the dog’s b*llocks, you might want to think again. We need a trade-off between statistical and human efficiency.

DCE design relating to question order

A point that came to my attention last week concerned question order in the context of Case 1 (Object Case) Best-Worst Scaling studies. Now,

  • Youden designs ensure every choice item (in this case object) appears in every position (first, second, …, last) within the choice sets the same number of times.
  • It is good practice to randomise the choice set order – indeed to do several randomisations and choose the ‘least bad’ in terms of ensuring no item always appears towards the start (or towards the end) of the survey.
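The “several randomisations, choose the least bad” step could be sketched like this (illustrative Python with a made-up Case 1 BWS design; the imbalance score used here is just one plausible criterion, not a standard from the literature):

```python
import random
from collections import defaultdict

# Made-up Case 1 BWS design: each choice set lists object indices.
design = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5], [0, 2, 5], [1, 3, 4]]

def position_imbalance(order):
    """Max deviation of any object's mean survey position from the midpoint.
    An object that always appears near the start (or end) scores badly."""
    positions = defaultdict(list)
    for pos, set_index in enumerate(order):
        for obj in design[set_index]:
            positions[obj].append(pos)
    midpoint = (len(order) - 1) / 2
    return max(abs(sum(p) / len(p) - midpoint) for p in positions.values())

# Generate several random orderings of the choice sets, keep the least bad:
random.seed(1)
candidates = [random.sample(range(len(design)), len(design)) for _ in range(200)]
best_order = min(candidates, key=position_imbalance)
print([design[i] for i in best_order])  # the chosen ordering of choice sets
```

The point is simply to mechanise “do several randomisations and pick the one where no item clusters at the start or end of the survey”; any sensible balance score would do.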

However, the importance of the second point was brought home when a colleague was analysing response time data for an end-of-life care study. Response times vary systematically with position in the survey, so teasing apart whether an item’s unusual response time reflects where it tended to appear, or whether there is something fundamentally different about the item itself, becomes difficult (a confounding issue). Thankfully we’d made no such mistake.

He also found some amazing issues regarding the separation of preferences from attitudes, which we’ll write up in due course. Interesting times!