We, by which I mean members of the MRS Market Research Standards Board, spent much of early 2014 finalising the revised Code of Conduct, published by the MRS this autumn. In particular, we spent a lot of time debating the issue of anonymity, which features far more in the latest version than before.
The Information Commissioner's Office (ICO) has produced a comprehensive guide to best practice in anonymisation, which I've referred to in the past. These are just two examples of good advice to encourage best practice in processes and policy.
However, as Dan Nunan (Henley Business School) reminded delegates at the recent Social Research Association annual conference (December 8th), preserving the promise of anonymisation in research is becoming increasingly difficult. Sometimes it's simply because insufficient thought is given to the consequences of making seemingly anonymous data available for anyone to access. Nunan's interesting example came from the famous yellow taxis, familiar to anyone who has visited New York. Surprisingly, I thought, it appears that a full database of journey patterns is available. It is a bit like Oyster card data: seemingly a stream of A-to-B journeys, with of course no data identifying the passenger, the driver or their cab.
However, unlike Oyster data, taxi usage data is more personal – a bit like a full postcode, it may not identify individual passengers, but there are likely to be no more than 3 or 4 for any one journey, and a high probability of only one. But someone then discovered that it was possible to work out the identity of each cab in the database from the data. So, what happens if you then match these supposedly anonymous journeys to photos in the press showing celebs entering or leaving a yellow cab, at a specific location, where the taxi's identity can also be seen in the photo? Bingo, especially as, in addition to identifying journey patterns, the database includes details of the $ fare and the value (or not) of any tip proffered by the passenger! How mean, or generous, are celebs in tipping drivers? What might be the origin of their journey on each occasion?
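For anyone who prefers to see the mechanics, the matching step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the records, field names (`cab_id`, `pickup`, `place`, `fare`, `tip`), the `link` function and the ten-minute matching window are all invented for the example, and none of it reflects the layout of the actual New York database.

```python
from datetime import datetime, timedelta

# Hypothetical "anonymised" trip records: no passenger details, but each row
# carries a cab identifier, pickup time, pickup place, fare and tip.
trips = [
    {"cab_id": "CAB-A", "pickup": datetime(2014, 7, 1, 21, 5),
     "place": "Hotel X", "fare": 14.5, "tip": 0.0},
    {"cab_id": "CAB-B", "pickup": datetime(2014, 7, 1, 21, 7),
     "place": "Hotel X", "fare": 9.0, "tip": 2.0},
    {"cab_id": "CAB-A", "pickup": datetime(2014, 7, 2, 9, 30),
     "place": "Station Y", "fare": 22.0, "tip": 4.0},
]

# Hypothetical public information: a press photo showing a named person
# getting into an identifiable cab at a known place and approximate time.
sightings = [
    {"person": "Celebrity 1", "cab_id": "CAB-A",
     "seen_at": datetime(2014, 7, 1, 21, 3), "place": "Hotel X"},
]

def link(trips, sightings, window=timedelta(minutes=10)):
    """Attach a name to every trip that matches a sighting on cab, place and time."""
    matches = []
    for s in sightings:
        for t in trips:
            same_cab = t["cab_id"] == s["cab_id"]
            same_place = t["place"] == s["place"]
            close_in_time = abs(t["pickup"] - s["seen_at"]) <= window
            if same_cab and same_place and close_in_time:
                matches.append(
                    {"person": s["person"], "fare": t["fare"], "tip": t["tip"]}
                )
    return matches

linked = link(trips, sightings)
print(linked)
```

Neither data set names a passenger alongside a fare on its own; the join is what produces the personal record.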
The point is, new data not in the public domain was created from data that was. The whole database is huge, cataloguing 173m rides in 50 gigabytes of data. Since the story broke, issues have emerged about data quality and about how tip information is recorded – underlining the need to know the limitations, or quality issues, within data sets that may lead to incorrect assumptions being made. But when have all the facts got in the way of a good story? For the full story click here.
Nunan argued that the root of the problem is that humans are notoriously bad at managing risk. A storm being described as the biggest since records began doesn't mean there won't be a similar one next week – just ask the people living on the Somerset Levels about flood risk management in the UK, although it would appear that the Philippine authorities applied the lessons from the previous typhoon when faced with a recent similar experience. Could you expect the New York taxi authority to have thought through all the potential scenarios that might occur when creative analysts got their hands on the data?
But, as researchers in the era of 'big data' (do I hear a groan?), it is becoming increasingly difficult to know just where survey data will end up, and what it might be matched with that then enables the promise of anonymity to be breached. Only the day after the conference, I was in conversation with someone with a query about the advisability of supplying survey data plus a full postcode to clients. Not really a good idea.
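One rough way to see why the full postcode is a problem is to count how many respondents share each value before the file goes out. The sketch below is mine, not anything from the conference or the Code: the respondents, the made-up postcodes and the helper names `smallest_group` and `outward_only` are invented for illustration, and what counts as an acceptable minimum group size is a judgement for the researcher.

```python
from collections import Counter

# Hypothetical survey file: the postcodes are invented for the example.
respondents = [
    {"postcode": "AB1 2CD", "age_band": "35-44"},
    {"postcode": "AB1 2CD", "age_band": "65+"},
    {"postcode": "AB1 2EF", "age_band": "35-44"},
    {"postcode": "AB1 3GH", "age_band": "35-44"},
]

def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same values for `keys`."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())

def outward_only(record):
    """Copy of the record keeping only the first part of the postcode."""
    return {**record, "postcode": record["postcode"].split()[0]}

full = smallest_group(respondents, ["postcode"])
coarse = smallest_group([outward_only(r) for r in respondents], ["postcode"])
print(full, coarse)
```

In this toy file the full postcode leaves at least one respondent in a group of one, while the truncated postcode puts all four in the same group. Adding further fields to `keys`, such as the age band, shrinks the groups again, which is the matching risk in miniature.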
Nunan argued that anonymisation is viewed as the escape clause that takes data beyond the reach of data privacy legislation. On the face of it this is true, but researchers need to think carefully about whether anonymity can be retained. Where is the data going? How will it be used? Is there a likely risk that anonymity might be breached? Have clients been advised about best practice, and about the impact of any promise of confidentiality made to participants when their data was collected?
Nunan contended that the increasing levels of refusals/non-response might be linked to fears about just how good any promise really is in the real world. His presentation came hot on the heels of the latest GRBN report, which indicated that 31% of participants in this global survey trusted the market research industry 'very little' to protect their data.
As Nunan stated, if anonymisation fails, trust is lost, but the perception seems to be that we are not trusted anyway by many citizens, whatever our track record might be.
So, come on, be honest, how much did you tip the driver on your last visit to the Big Apple? We have ways of finding out…
This post was first published on the International Journal of Market Research website.