
Best-Worst voting the answer?

With the truly appalling outcomes for Labour and Lib Dems – compared to where they need to be to be competitive in the General Election in a few weeks – maybe it is time to start thinking about electoral reform again.

Let’s start with that old trope from the LibDems – “fair votes”. Kenneth Arrow got a Nobel prize for proving there’s no such thing, so stop using the term. You decide which key welfare criteria you want from your system, then you choose a voting system that delivers those (and probably not the “unimportant” criteria).

Now, we know there is a strong desire in the UK to preserve the link between “an MP” and “a constituency”. Fair enough. But the Alternative Vote – defeated in the referendum a few years ago – is not the only, nor perhaps even the best, replacement for first-past-the-post (FPTP).

Tony Marley – co-author on the BWS book with me – has written a lot about the maths behind voting systems. People don’t realise Best-Worst Scaling works as a voting system. Plus I reckon it’d be attractive in the UK.

Here’s an example of how it might work, and deliver a different outcome to that observed in the results just published in the Local Election for the TEES VALLEY.

FIRST ROUND RESULTS:

  • CON – 40,278
  • LAB – 39,797
  • LD – 12,550
  • UKIP – 9,475

SECOND ROUND RESULTS (TOP TWO GET 2nd PREFS):

  • CON – 48,578
  • LAB – 46,400

So what happened? It’s pretty obvious most UKIP 2nd prefs went Conservative – their boost of roughly 8,300 is suspiciously close to the UKIP first-round vote. Of course we know UKIP has also poached from Labour in LEAVE-dominated northern seats, but I doubt many “kippers” put LAB as their 2nd pref.

Where are the rest of the 2nd prefs?

About 7,000 are missing in action: of the roughly 22,000 LD and UKIP first-round votes, only around 15,000 turn up as transfers to the top two. Maybe people just refused to put a 2nd preference or gave them to fringe parties.

But I bet they knew what party they hated most.

Here’s how it might have played out under BWS:

  • LAB and LD voters encouraged to put Conservatives as “least”
  • UKIP put Labour (primarily) as “least” – some will put LD
  • CON put LAB as “least”

Result:

  • CON “lose” around 52,000 (LAB/LD) votes
  • LAB “lose” around 50,000 (CON/UKIP) votes

The LibDems gain – or, if UKIP and some CON voters hate the LDs (for their pro-Europe stance) sufficiently more than they hate Labour, the smaller “least” vote against Labour leaves Labour’s net total beating the LibDems. Either way the Conservatives don’t win – the UKIP/Conservative vote simply isn’t enough to offset both Labour and the LDs.
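For anyone who wants to play with the arithmetic, here is a minimal sketch (in Python) of a best-minus-worst tally using the first-round figures above. The allocation of “least” votes, including the ukip_to_lab split, is an assumption for illustration that mirrors the bullets above; it is not data from the actual count.

```python
# Best-minus-worst tally for the Tees Valley figures, under the purely
# illustrative "least" allocations described above. Split fractions are
# assumptions, not data from the actual count.

first_round = {"CON": 40_278, "LAB": 39_797, "LD": 12_550, "UKIP": 9_475}

ukip_to_lab = 0.8  # assumed share of UKIP voters marking LAB as "least"

worst = {
    "CON": first_round["LAB"] + first_round["LD"],                  # LAB + LD voters
    "LAB": first_round["CON"] + ukip_to_lab * first_round["UKIP"],  # CON + most of UKIP
    "LD": (1 - ukip_to_lab) * first_round["UKIP"],                  # the rest of UKIP
    "UKIP": 0,  # nobody marks UKIP as "least" in this illustrative scenario
}

net = {party: first_round[party] - worst[party] for party in first_round}

for party, score in sorted(net.items(), key=lambda kv: -kv[1]):
    print(f"{party:4s}  best {first_round[party]:>7,}  "
          f"worst {worst[party]:>9,.0f}  net {score:>9,.0f}")
```

With these assumed splits the LibDems end up with the best net score and the Conservatives the worst of the three main parties; how Labour fares relative to the LibDems depends entirely on where the UKIP and Conservative “least” votes actually go.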

Of course with turnout around 21% a LOT more potential votes are up for grabs if people are energised to believe their vote(s) matter.

Worth thinking about.


BREXIT survey stuff on work account

Just a reminder that the results of my Best-Worst Scaling survey – which showed what would happen if we could know the (LEAVE/REMAIN) view of every eligible voter in the UK – are on my work account.

Most follow-up – regional variation, recommendations as to which type of BREXIT is preferred by whom, how 8% of that 28% who never turned out to vote could have held the key to everything – will be on that account too.

Some interesting observations from the raw data – and remember, we can look at an individual’s responses here because BWS gave us 10 data points with which to estimate 5 parameters (a minimal counting sketch at the end of this post shows how):

  • The East Midlands, although heavily LEAVE, skews quite heavily toward a different type of BREXIT to other LEAVE regions.
  • The strong preference for free trade is simply not there… it has shifted – VERY heavily – toward the free movement of people throughout Europe. This “strong positive liking of immigration” is visible nowhere else. The non-English countries/principalities (Wales, Northern Ireland and Scotland) have a broadly neutral view on immigration; the rest of England (outside the East Midlands) strongly dislikes it.
  • East Midlanders also have a strong antipathy toward several key aspects of the EU – in fact the pattern of their dislikes looks remarkably consistent with a “Swiss form of BREXIT” – one of the so-called “soft” BREXIT options.
  • They also are the region which loathes the EU budget contribution the most.
  • Their results form a remarkably realistic view compared to some other segments of British society: they (we – I am a Nottinghamian) seem quite happy to sacrifice elements of the single market and the customs union, plus we’ll adopt a constructive view on immigration with our European neighbours if it means we “get some money back”. We’ll also compromise on free trade quite happily.

So what gives? Has everyone round here had some secret training in Ricardo’s work, thus recognising when free trade is not welfare-enhancing?
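As promised above, here is a minimal counting sketch of the “10 data points to estimate 5 parameters” point: each of the five best-worst questions contributes one “most agreed with” and one “least agreed with” choice. The principle labels are taken from the ones mentioned in these posts (SEM/FTA/CU, free movement, budget contribution); the respondent’s answers are invented purely for illustration.

```python
# Why five best-worst questions give 10 data points per respondent:
# each question contributes one "best" and one "worst" choice.
# The respondent's answers below are illustrative only.

principles = ["SEM", "FTA", "CU", "free movement", "budget contribution"]

# One hypothetical respondent: (most agreed, least agreed) per question.
answers = [
    ("FTA", "budget contribution"),
    ("SEM", "free movement"),
    ("FTA", "CU"),
    ("SEM", "budget contribution"),
    ("FTA", "free movement"),
]  # 5 questions x 2 choices = 10 observations for 5 principle scores

scores = {p: 0 for p in principles}
for best, worst in answers:
    scores[best] += 1   # +1 each time a principle is picked as "most"
    scores[worst] -= 1  # -1 each time it is picked as "least"

print(scores)  # this respondent's best-minus-worst scores
```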

BREXIT-REMAIN redux

[Figure: EU support graph]

Well, I’ve finally got round to programming a model that:

  • Asks you just five best-worst scaling questions – you choose your “most agreed with principle” and “least agreed with principle” – people take 2-3 mins to answer this tops.
  • Runs a best-worst scaling (BWS) exercise on just YOUR five answers.
  • Spits out three things:
    • A pie chart showing how likely it is that each of the six main options (continued EU membership / Norway option / Switzerland option / Canadian option / Turkish option / World Trade Organisation option) would best satisfy YOUR principles
    • A pie chart showing the predicted chances of you personally supporting each of the five principles
    • A pie chart showing the predicted chances of you personally rejecting each of the five principles


Thus, the first chart tells you – based on which of the five principles we could “get” under each of the six models (one REMAIN, five BREXIT) – the chances of getting “as much as we want” from each model of a new British-European relationship.

This, like all CORRECT best-worst scaling, is an individual model, giving you PERSONALISED results, not “you averaged with others”.

We can, of course, average across people, slice and dice the results across sex/gender/political affiliation etc, to find out what model is most popular in certain groups. But the point is, my model doesn’t NEED to do that. All because just five BWS questions tell me everything I need to know about what you value.
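For intuition only, here is one way an individual’s best-minus-worst scores over the five principles could be turned into the personalised “support” and “reject” shares in the second and third pie charts: a conditional-logit-style transform. The scores, scale factor and labels are illustrative assumptions, and this is not necessarily the model actually used.

```python
import numpy as np

# Sketch: turning one respondent's best-minus-worst scores into personalised
# "chance of supporting" / "chance of rejecting" shares for five principles.
# A conditional-logit-style transform is assumed; the actual model may differ.

principles = ["SEM", "FTA", "CU", "free movement", "budget contribution"]
scores = np.array([2, 3, -1, -2, -2], dtype=float)  # e.g. from the counting sketch
scale = 1.0  # assumed logit scale factor

p_support = np.exp(scale * scores) / np.exp(scale * scores).sum()
p_reject = np.exp(-scale * scores) / np.exp(-scale * scores).sum()

for name, ps, pr in zip(principles, p_support, p_reject):
    print(f"{name:20s} support {ps:5.2f}   reject {pr:5.2f}")

# The first pie chart (REMAIN vs the five BREXIT models) would additionally
# need a principle-by-model mapping stating which principles each model
# delivers; that mapping belongs to the author's model and is not guessed here.
```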

Gold dust for all the campaigns – and the government, as it struggles to negotiate what type of new relationship would command majority support in the country.

I have deliberately answered the survey as a “hypothetical REMAINer” to show what they should have done – namely made the single European market something people understood and fought for, above other factors.

There are lots of scenarios – including what probably actually happened, in that people were in reality “sure” they disliked free movement of people and/or EU budget contributions but unsure about their SEM/FTA/CU support – which lead to a BREXIT outcome as the most likely to achieve their preferences. Your relative preferences for these determine which BREXIT model (hard/soft) is most likely to suit you.

Campaign managers/constituency parties/national party executives as well as Jo(e) Public would be very interested in this.


Best-worst capabilities endorsed

Wow. In this article Will Hutton interviews Amartya Sen. A crucial quote:

“…you have to take in, somehow, the unattractiveness of the last as well as the attractiveness of the first candidate.”


Wow, quantifying the worst as well as the best?

Which group has been at the forefront world-wide of doing this?

Yep, we’ve been way ahead of our time.

spring cleaning

New year = digital spring cleaning time! Ugh. No matter how future-proof you try to be in how you structure files, how you aim to work seamlessly across PCs, etc., it never takes long for reality to change and for you to realise you need to go through the rigmarole again.

When admin is done I’m back to the project looking at comparing Case 2 BWS estimates with DCE ones. I shall look with “fresh eyes” since I haven’t worked on it since before xmas. (Plus we need to get this rounded off so we can submit and get paid, hehe.)

Then it’s the (long-delayed) big marketing push for TF Choices LTD. I’ve had a good number of proposals and funded projects come my way so far but can’t rest on my laurels…time to make sure a load of marketers and others know what I can do for them, in addition to the academic community I was part of!

I can’t think of anything methodological I want to shout about today (phew, they think)… I’ll continue to post anything big or of key relevance, but as there are only so many hours in the day and company stuff must come front and centre in 2017, it’s likely that my comments and posts will relate to things I’m doing at the time (like Case 2 vs DCEs) rather than being detailed posts triggered by twitter or citation alerts I get.


BWS neither friend nor foe

This post replies to some requests I have had asking me to respond to a paper concluding that DCEs are better than BWS for health state valuation. To be honest I am loath to respond, for reasons that will become apparent.

First of all, let me clarify one thing that people might not appreciate – I most definitely do not want to “evangelise” for BWS and it is not the solution in quite a few circumstances. (See the papers coming out from the CHU-9D child health valuation study I was involved with for starters – BWS was effectively a waste of resources in the end….”best” choices were all we could use for the tariff.)

I only really pushed BWS strongly in my early days as a postdoc, when I wanted to make a name for myself. If you read my papers since 2007 (*all* of them) you’ll see the numerous caveats appear with increasing frequency. And that’s before we even get to the BWS book, where we devote an entire chapter to unresolved issues, including the REAL weaknesses of and research areas for BWS (as opposed to the straw men I have been seeing in recent literature).

OK, now that’s out of the way, I will lay some other cards on the table, many of which are well known since I’ve not exactly been quiet about them. I had mental health issues associated with my exit from academia. I’m back on my feet now, doing private sector work for very appreciative clients, but that doesn’t mean I want to go back and fight old battles… battles which I erroneously thought us three book authors had “won” by passing muster with the top mathematical psychologists, economists and others in the world during peer review. When you publish a paper in the Journal of Mathematical Psychology (the JHE of that field) back in 2008 illustrating a key feature/potential weakness of a DCE (or specifically Case 2 BWS), you tend to expect that papers published in 2016 would not ignore this, and would not do research that shows zero awareness of the issue and as a result makes fundamental errors. After all, whilst we know clinical trials take a while to go from proposal to main publication, preference studies do NOT take 8+ years to go through this process – I co-ran a BWS study from conceptualisation to results presentation in 6 days when in Sydney. Go figure.

So that’s an example of my biggest frustration – the standards of literature review have often been appalling. Two or three of my papers (ironically including the JHE one, which contains a whopping error that I myself have repeatedly flagged up and which I corrected in my 2008 BMC paper) seem to get inserted as “the obligatory BWS reference to satisfy referees/editors”, and in many cases they bear no relation to the point being made by the authors. Alarm bells immediately ring when I read an abstract via a citation alert and see those were my references. But it keeps happening. Not good practice, folks.

In fact – and at a recent meeting someone with no connection to me said the same thing – in certain areas of patient outcomes research the industry reviews are considered far better than academic ones; they have to be, or they’d get laughed out of court.

Anyway, I have been told that good practice eventually drives out bad. Sorry, if that’s true, the timescale was simply too long for me, which didn’t help my career in academia and raised my blood pressure.

Returning to the issue at hand. I’m not going to go through the paper in question, nor the several others that have appeared in the last couple of years purporting to show limitations of BWS. I have a company to run, caring obligations and I’ve written more than enough for anyone to join the dots here if they do a proper literature review. My final attempt to help out was an SSRN paper. But that’s it – without some give and take from the wider community, my most imaginative BWS work will be for clients who put food on the table and who pay – sometimes quite handsomely – for a method that when properly applied shows amazing predictive ability together with insights into how humans make decisions.

Now, of course, health state valuation is another kettle of fish – no revealed preference data, etc. However, Tony, Jordan and I discussed why “context” is key in 2008 (JMP); I expounded on this with reference to QALYs in my two 2010 single-authored papers, and published an (underpowered) comparison in the 2013 JoCM paper (which I first presented at the 2011 ICMC conference in Leeds, getting constructive criticism from the top choice modellers on Earth). So this issue is not particularly new.

It’s rather poor that nobody has actually used the right design to compare Case 2 BWS with DCEs for health state valuation… I ended up deciding “if you want something done properly you have to do it yourself”, and I am very grateful to the EuroQoL Foundation for funding such a study, which I am currently analysing with collaborators. I don’t really “have a dog in this fight”: if Case 2 proves useful then great, and if not then at least I will know exactly why not… and the reasons will have nothing to do with the “BWS is bad m’kayyyy” papers published recently. (To be fair, I am sometimes limited in what I can access – no longer having an academic affiliation means full texts are sometimes unavailable – but when there’s NO mention of attribute importance in the abstract, nor of why efficient designs are problematic for Case 2, my Bayesian estimate is a 99.99% probability that the paper is fundamentally flawed and couldn’t possibly rule BWS in or out as a viable competitor to a DCE.)

If you’d like to know more:

  • Read the book
  • Read all the articles – my google scholar profile is up to date
  • Get up to speed on the issues in discrete choice design theory – fast. Efficient designs are in many, many instances extremely good (and I’ve used them), but you need to know exactly why they are inappropriate in a Case 2 context.

If you still don’t understand, get your institution to contract me to run an exec education course. When I’m not working, I’m not earning, full stop.

I’m now far more pragmatic about the pros and cons of academia and really didn’t want to be the archetypal “I’m leaving social media now” whinger. And I’m not leaving. But I am re-prioritising things. Sorry if this sounds harsh/unhelpful – I didn’t want to write this post and had hoped to quietly slip beneath the radar, popping up when something insightful based on one of BWS’s REAL disadvantages, or on Sen’s work etc., was mentioned. But people I respect have asked for guidance. So I am giving what I can in the 10 minutes of free time I have.

Just trying to end on a positive note – I gave a great exec education course recently. It was a pleasure to engage with people who asked questions that were pertinent to the limitations of BWS and who just wanted to use the right tool for the right job. That’s what I try to do and what we should all aim for. I take my hat off to them all.

Encounter with a GPSI

I recently had a mole removed by a GP with a special interest (GPSI) in dermatology. It was an interesting experience, given that the first ever discrete choice experiment I conducted elicited patient preferences for exactly this type of doctor and specialty.

The study was piggy-backed onto an early (the first?) trial of GPSI care. That trial established equivalence of care with the traditional consultant-led secondary care model (for the large proportion of cases that are routine enough for GPSI care to be appropriate). The DCE, however, showed resistance to GPSI-type care among patients, on average. Now, this was unsurprising: we knew no better and quoted average preferences, which usually mean nothing in DCEs (since you are averaging apples and oranges). Subgroup analyses I did established which patient subgroups were open to GPSI-type care (and when), and those results were all very predictable.

It is the wording we were strongly encouraged to use for the attributes (such as the doctor description) that is the subject of this post, particularly in the light of my personal experience of such care “at the sharp end”. We did not use the actual job titles of the doctors: had we done so, we would have given the respondents the choice between “seeing a member of a consultant-led team, which may or may not be the consultant him/herself” and “seeing a GP who has had (considerable?) special additional training in dermatology”, making it clear that (1) many people don’t see the consultant, contrary to what they believe, and (2) a GPSI is perfectly qualified to deal with their condition and, if anything non-routine is found, they are instantly moved to the consultant-led team’s care.

Now, I know why the triallists didn’t like this: patients see “GP” and instantly form (often incorrect) opinions. That was brought home to me when I saw a doctor at the local hospital in Nottingham (actually a private treatment centre subcontracted by the NHS): he never revealed he was a GPSI until we started “talking shop”, whereupon his ID badge was suddenly held up in front of me with the exclamation “I was one of the first GPSIs in dermatology appointed!” My referral letter said I would see (consultant) Dr X or a member of his team. Hmmmm. Thankfully I had no preconceptions, and received top notch care – I would certainly see him again if I needed to. (Of course I looked up this GPSI subsequently, and it turns out he specialised in surgery first before moving to General Practice to improve conditions for family life, so he was particularly well qualified.) But it did illustrate, albeit anecdotally, that what was really required was a DCE with “labels” (the actual doctor types) to capture the true patient preferences: that would focus minds on the need for a public education campaign to reduce the stigma associated with GPSIs. What we did, although not misleading in terms of describing the doctors, brushed the underlying problem under the carpet. (So we should have run a labelled DCE – we knew no better then, but I am using my own experience to illustrate a serious problem that continues unabated in health. That’s for another day, however.)

The other attribute I would change, with the benefit of having been an actual patient, was location of care. The DCE heavily implied that non-hospital care would be at a local general practice. Of course, if your general practice doesn’t have the facilities to do minor surgery then this may be grossly misleading. Indeed, I had to travel further than the local hospital to get to the GPSI’s surgery for my mole removal. As it happens it didn’t matter: distance as the crow flies was not the important factor in my ability to get there. However, it immediately made me slightly annoyed at the guidance I, as the DCE lead, received when I did the study. The wording we used was, again, “technically correct” in that the choice was between a place of care that was convenient and local versus not, but I’m fairly sure a non-trivial number of our respondents could have made incorrect assumptions about these attribute levels. I know I did, and I ran the DCE!

It made me a bit (more) cynical about the motives of certain parts of academia: I’d already seen via twitter a much-heralded result of a trial I know about that, shall we say, could have been improved upon immensely. Furthermore, I had pause for thought recently when I learnt that some members of industry consider academia-led literature reviews and so-called systematic reviews in certain areas of health to be not worth the paper they’re written on. (I can concur regarding recent reviews in my own field.) In a time that has seen a huge amount of industry-bashing for selective release of information/publication, it really does act as a reminder that some areas of academia need to take a good hard look at their own conduct. Plus, to be fair, I do shout out about the amazing groups I have worked with or continue to work with. I just feel Ben Goldacre and Danny Dorling were bang on the money in their beliefs (informed by different evidence, which was particularly damning) that bad practice by academia and its associated institutions contributes to the general lack of public confidence in the “elites”, and that “having your own facts”, whilst of course ludicrous, is a perfectly understandable public reaction to elites that no longer seem to uniformly put the public good first.

As usual I shall make the caveat that there are great groups I work with and this isn’t just “academia bashing”. I just offer constructive criticism based on my own experiences (and mistakes) and give examples of the kind of lack of transparency that cleverer people like Ben and Danny have highlighted as barriers to getting academia more support among the general populace.

BWS correct referencing redux

This is not exactly a moan (since in some cases I’m requesting fewer references to one or two of my own papers, which is all very nice!). It’s just a reminder that BWS has been an evolving technique over many years, and I continue to note that too many people just seem to add the JHE 2007 paper as “the BWS reference” when it really isn’t supporting what they are doing or saying.

I’ve not been afraid to admit when I’ve done something incorrect/misleading, or when the field has moved on and an earlier paper is becoming outdated. (So when I call others out on bad referencing, rest assured that I do the same for myself.)

Some points to note:

  • The JHE article was the first comprehensive explanatory Profile Case (Case 2) BWS paper. However, the “marginal models” there involved coding that, although it gives correct point estimates, gives misleading summary statistics such as log-likelihoods, because it does not take account of the sequential nature of the data: after a choice from 5 items, only 4 options are available for the second choice (see the sketch after this list).
  • This was corrected ASAP in the 2008 BMC paper on the dermatology study, so marginal sequential models should really reference that paper.
  • References to “dual/multi-stage choice tasks” (primarily to get QALYs) should start with my 2010 Pharmacoeconomics paper, since that was the first to propose these methods (including the DCE+TTO rescaling). Too many researchers reference later papers.
  • I was also first in explaining why the “death state” can’t be valued in a DCE without duration and a higher resolution design – in 2008 I wrote about this in Pop Health Metrics, with the God of math psych, Tony Marley, amongst others. I also pointed out why variance scale factors can be highly problematic in DCEs/other choice models. I certainly wasn’t first on the latter point – you should be looking to papers in the 1990s by Swait & Louviere, and Hensher and Louviere for that.
  • First reference to a Case 1 BWS study is in The Patient: Patient-Centered Outcomes Research (2010) by Louviere and Flynn (to my knowledge – I am happy to be corrected if wrong).
  • If you’re comparing Case 2 BWS with DCEs you really should understand and discuss how they differ, which was introduced in detail in the 2013 JoCM paper by Flynn et al., with subsequent discussion in the book (2015). DO NOT conclude that either method is “wrong”/”right” purely on the basis of a comparison of results from each task. Our work explains why they might differ.
  • For Case 3 BWS I’m not the key person – Emily Lancsar was/is big in introducing and applying this in health. Please also note that the correct name for this is the “multi-profile case”, as agreed by Louviere, Marley and me in preparation for the book. As with the profile case, the renaming was done to better describe what makes Cases 2 and 3 distinct from the other Cases.
  • The first peer-reviewed published Case 2 study was from the 1990s, by Szeinbach et al.; the first UK study was in 2006, by our team in the BJD.
  • Finally, the emerging problems with highly efficient designs: Rose and Bliemer hypothesised this back in 2009; my team and I published the first within-subject confirmation in Pharmacoeconomics in 2016.
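To make the first bullet concrete, here is a minimal sketch of a sequential (maxdiff-style) best-worst likelihood for a single Case 2-style task: “best” is chosen from all five items, then “worst” from the four that remain. The utilities are invented for illustration; the point is simply that the second-stage choice set shrinks, which a “marginal” coding that treats both choices as coming from all five items misrepresents in its log-likelihood.

```python
import numpy as np

# Sequential best-worst likelihood for one task with 5 items:
# "best" is a conditional-logit choice over all 5 items; "worst" is a
# choice over the remaining 4 items with the utilities negated.
# The utilities below are illustrative, not estimates from any study.

beta = np.array([0.9, 0.4, 0.0, -0.3, -1.0])  # assumed item utilities

def sequential_bw_loglik(beta, best_idx, worst_idx):
    items = np.arange(len(beta))
    # Stage 1: best chosen from all items.
    p_best = np.exp(beta[best_idx]) / np.exp(beta).sum()
    # Stage 2: worst chosen from the items left after removing the best;
    # the sign flip makes low-utility items more likely to be "worst".
    remaining = items[items != best_idx]
    p_worst = np.exp(-beta[worst_idx]) / np.exp(-beta[remaining]).sum()
    return np.log(p_best) + np.log(p_worst)

# Example observation: item 0 chosen as best, item 4 as worst.
print(sequential_bw_loglik(beta, best_idx=0, worst_idx=4))
```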

So this is just a guide to help practitioners get the correct reference for BWS and associated conceptual issues. Hope it helps. I may add to this if I think of other issues that are incorrectly attributed.


Likert and Friday Fun combined

Well well well. I have finally found something that qualifies both as a Friday Fun post and as something relevant to choice modelling – more potential Chinese problems with Likert scales!

The number 4 is known to be considered unlucky in some more traditional parts of Chinese society. However, I never knew that the numbers 7 and particularly 9 can mean something else entirely in Cantonese…..as one of the commenters said, “flaccid” is particularly implied by one of these numbers, so you can already guess the nature of the warning and discussion.

LULZ

So let me warn you yet again – Likert scales are bad, m’kayyyy! In particular, struggling to make cross-cultural comparisons and to net out cultural response effects is totally unnecessary in most instances now that we have Case 1 Best-Worst Scaling.
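A toy illustration of the point (all numbers invented): two respondents with the same underlying ordering of four items but different Likert response styles produce very different means, yet forced best/worst choices recover the same information, so there is nothing cultural to net out.

```python
# Toy example: same underlying ordering, different Likert response styles.
# Ratings are invented purely to illustrate the cross-cultural point.

ratings_by_respondent = {
    "respondent 1 (uses whole scale)": {"A": 7, "B": 5, "C": 3, "D": 1},
    "respondent 2 (clusters near top)": {"A": 7, "B": 6, "C": 6, "D": 5},
}

for name, ratings in ratings_by_respondent.items():
    mean = sum(ratings.values()) / len(ratings)
    best = max(ratings, key=ratings.get)
    worst = min(ratings, key=ratings.get)
    print(f"{name}: Likert mean = {mean:.2f}, best = {best}, worst = {worst}")

# Means (and hence naive cross-cultural comparisons) shift with response
# style; the best/worst picks are the same, so no "netting out" is needed.
```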

Perfect the enemy of the good?

I recently got into a discussion on twitter about the properties of the ICECAP instruments and what the zero on these means. One particular point I made was that our “saying” the state of “no capabilities” must be zero didn’t necessarily make it so, at least in the eyes of a mathematical psychologist. They’d probably say it is not a ratio scale and might not even have good interval scale properties at the aggregate level (if there’s improperly-adjusted-for underlying heterogeneity).

I’m not too worried about these points, though I personally think better subgroup/heterogeneity analyses need to be done in future to address the latter point. But it did lead me to think about that old recommendation “don’t let the perfect become the enemy of the good”. This potentially gets a lot of extra-welfarism into hot water, where the maths psych people are concerned: instruments and valuation tweaks or even in some cases the whole valuation method (VAS and arguably TTO) have little in the way of theory (that that group would recognise) behind them. However, I remember one health economist summarising a discussion he had with clinicians and members of the public as to how scarce resources should be allocated in health care and they “naturally” came up with something that approximated a QALY with TTO scoring. This is fair enough and I am happy with the newer theories/concepts put forward to justify what health economists do in that particular area. After all, extra-welfarism doesn’t have the same assumptions and theories as traditional welfarist economics so why get bothered about what another discipline entirely thinks?

I guess I’m just naturally – having worked so long with a maths psych guru – very particular about getting scoring “right”, as in it satisfying one or more of the properties inherent in proper scales (absolute/ratio/interval/difference). So yeah I guess I may be guilty of being dissatisfied with just “good”…but in my defence, we are producing tariffs (sets of scores) here that are being increasingly used across the world – the Netherlands has already decided on a dual “QALY+” approach: more than one evaluative space seems to finally be accepted, to aid decision-making. We shouldn’t stand still, particularly as we know a lot more about the properties of the Best-Worst Scaling valuation technique now than we did back in the original UK ICECAP-O valuation exercise. Whilst it is gratifying that public interest and funders have agreed with us that areas like end-of-life care and (potentially) children need ICECAP instruments, we should not rest on our laurels with existing instruments.