Reports in the Guardian of HMRC’s plans come at a bad time for the tax authority, given that the work of opening up NHS data to researchers and companies through Care.data has stalled so badly. The timetable was rushed and NHS England severely underestimated public anger about being automatically opted in, with the result that the extraction of records to the Care.data database has been postponed until the autumn.
Not only privacy campaigners but also legal experts and MPs on all sides have criticised both initiatives for their potential to compromise people’s safety and confidentiality.
There’s also potential for using the data to do social good, however. With medical data, that could include assessing the effectiveness of local healthcare provision or investigating the influence of demographics and geography on the prevalence of disease. With tax records, we could achieve a better analysis of employment, under-employment and self-employment rates than is currently possible, and take steps to improve equality and opportunity when it comes to income distribution and effective tax rates.
These benefits and drawbacks must – and can – be balanced, but almost certainly not by creating massive databases of people’s sensitive data and then selling them to third parties, whether in anonymised, pseudonymised or identifiable form. There is too much scope for the data to be lost, and for data protection laws to be bent and broken. It’s no use denying now that bad practice is institutionalised in some areas of the corporate world rather than restricted to rogue operators; there have been too many examples of it to ignore.
Any database that contains complete individual records should be impenetrable to anyone who isn’t a professional in a medical setting or tax office dealing directly with that individual’s case. That doesn’t make it commercially or statistically useless; it just means that any analysis done outside that scope ought to be strictly controlled, with Government oversight, so that the only data that actually leaves NHS or HMRC premises is aggregated trend data.
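To make “aggregated trend data” concrete, here is a minimal sketch of what such a release might look like. It is an illustration only, not a description of anything the NHS or HMRC actually does: the field names (`region`, `income`), the function name and the minimum group size of five are all assumptions chosen for the example.

```python
from collections import defaultdict


def aggregate_by_region(records, min_count=5):
    """Return per-region counts and mean income, suppressing small groups.

    `min_count` is a hypothetical threshold: any region with fewer
    records than this is left out, so that a figure cannot be traced
    back to one or two individuals.
    """
    totals = defaultdict(lambda: [0, 0])  # region -> [count, income sum]
    for record in records:
        entry = totals[record["region"]]
        entry[0] += 1
        entry[1] += record["income"]

    released = {}
    for region, (count, income_sum) in totals.items():
        if count >= min_count:
            released[region] = {
                "count": count,
                "mean_income": income_sum / count,
            }
    return released


# Entirely made-up records for illustration.
sample = [{"region": "North", "income": i}
          for i in (20000, 22000, 24000, 26000, 28000)]
sample += [{"region": "South", "income": i} for i in (30000, 90000)]

summary = aggregate_by_region(sample)
```

In this sketch only the summary would leave the premises; the individual rows never do, and the two-person “South” group is withheld altogether because its average would reveal too much about its members.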
Anonymising data (i.e. removing all identifying information such as names, National Insurance or NHS numbers and addresses) or pseudonymising it (by replacing these data fields with artificial substitutes) will not be good enough for this kind of database. As former Microsoft chief privacy adviser Caspar Bowden has pointed out on Twitter today, the Government’s consultation on releasing public data late last year seemed not even to recognise the legal difference between the two. At any rate, if such data is detailed enough, it can be re-identified by anyone determined enough to do it, as shown by the New York Times’ work on the AOL search engine data released in 2006.
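The difference between the two approaches, and the reason neither is sufficient, can be sketched in a few lines. Everything here is hypothetical: the records are invented, the field names are assumptions, and the keyed hash is just one common way of generating a pseudonym, not a method either body is known to use.

```python
import hashlib
import hmac
from collections import Counter

# Invented records; every name and field here is illustrative.
RECORDS = [
    {"name": "A. Example", "ni_number": "QQ123456A",
     "postcode_district": "LS6", "birth_year": 1971, "sex": "F"},
    {"name": "B. Example", "ni_number": "QQ123456B",
     "postcode_district": "LS6", "birth_year": 1971, "sex": "F"},
    {"name": "C. Example", "ni_number": "QQ123456C",
     "postcode_district": "LS6", "birth_year": 1948, "sex": "M"},
    {"name": "D. Example", "ni_number": "QQ123456D",
     "postcode_district": "HG4", "birth_year": 1990, "sex": "F"},
]

DIRECT_IDENTIFIERS = ("name", "ni_number")
QUASI_IDENTIFIERS = ("postcode_district", "birth_year", "sex")


def anonymise(record):
    """Drop the direct identifiers and keep everything else."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def pseudonymise(record, secret_key):
    """Drop direct identifiers but add a stable artificial substitute."""
    token = hmac.new(secret_key, record["ni_number"].encode("utf-8"),
                     hashlib.sha256).hexdigest()[:16]
    result = anonymise(record)
    result["pseudonym"] = token
    return result


def unique_on_quasi_identifiers(records):
    """Return records whose remaining details match nobody else's."""
    def combo(r):
        return tuple(r[q] for q in QUASI_IDENTIFIERS)
    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]


anonymised = [anonymise(r) for r in RECORDS]
pseudonymised = [pseudonymise(r, b"hypothetical-key") for r in RECORDS]
exposed = unique_on_quasi_identifiers(anonymised)
```

Even with names and National Insurance numbers stripped out, the last two people in this toy dataset are the only ones with their combination of postcode district, birth year and sex, so anyone who already knows those details about them can pick out their record. Pseudonymised records carry the same risk, with the added feature that whoever holds the key can link them back to the individual.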
There is plenty of public benefit to be had by analysing national and regional trends in partnership with companies and universities, and making the findings publicly or commercially available, but renting out citizens’ most confidential information comes with too many dangers attached.