Identification and genomic data

By Alison Hall

13 February 2018


Effective data sharing underpins much of health care, and new ways of integrating and mining data support existing practice and inform novel approaches. The regulatory framework that governs the processing of data therefore impacts upon all aspects of health care.

A key principle in data regulation is that the most identifying data warrants the most stringent protection, on the basis that personal identifiable data has the most intrinsic value but is also the most likely to be used to harm, stigmatise and discriminate against individuals. The main scope of the current Data Protection Act (and the future General Data Protection Regulation) is therefore personal identifiable data. The common law also recognises certain categories of information (such as that shared between patient and doctor) as confidential and meriting special protection. Conversely, data which is neither identifiable nor confidential, and which therefore falls outside this regulation, can be used more freely.

Since so much rests on this distinction between identifiable and de-identified data, there is heated debate about where the boundary lies and the circumstances in which data may move from one category to the other. Nowhere is this more so than in the de-identification of genetic and genomic data, where the debate has been dominated by how new technologies might promote or undermine existing processes and approaches.

Publication of a new PHG Foundation report

This PHG Foundation report, Identification and genomic data, attempts to demystify a debate which erroneously regards all genetic and genomic data as identifying, and which all too often reduces data protection to claim and counterclaim about competing technological advances. In the report we make a series of recommendations, grouped under three complementary objectives:

  1. Increasing transparency about the challenges and limitations of anonymisation: anonymisation is often not absolute, and anonymisation techniques alone may be neither foolproof nor able to provide lasting protection
  2. Recognition of the dynamic nature of data processing, protection and governance: data uses are dynamic and proportionate governance should not be incompatible with optimising new opportunities from data integration and data science
  3. Multidisciplinary approaches: the tendency towards siloed thinking perpetuates misconceptions around the utility, security and governance of data. Empowering multidisciplinary forums such as the Council of Data Ethics is the best route to balanced and robust policy development.

New sanctions for wilful or reckless re-identification

These three objectives can best be realised through a proportionate regime that also strengthens the sanctions for wilful actions likely to result in the most harm. The Data Protection Bill, recently introduced into the House of Commons, aims to achieve this by creating a number of offences, including under section 171, which makes it an offence:

‘for a person knowingly or recklessly to re-identify information that is de-identified personal data without the consent of the controller responsible for de-identifying the personal data’ [Section 171, UK Data Protection Bill]

The inclusion of this provision in the Data Protection Bill signifies the importance being placed on preventing unauthorised re-identification and, together with other sanctions in the General Data Protection Regulation, forms a more comprehensive package of measures that will better protect data subjects from having their data exploited.

However, it is important that these measures do not prevent justified re-identification from taking place, such as where incidental findings of potential clinical relevance are generated through clinical care and medical research. Since one of the defences to this offence is that the re-identification is justified as being in the public interest, what constitutes ‘public interest’ must be construed broadly enough to allow these activities to continue, including where the data controller may not have direct contact with the researchers who generate a potentially relevant incidental finding.

Context is key but public trust is easily squandered

The key message to policy makers from this report is that it is fundamentally misguided to regard all genetic/genomic data as being inherently identifying, since the identifiability of genetic or genomic data is heavily dependent on context. It is similarly misguided to imagine that anonymisation techniques offer a robust, long-lasting technical fix that withstands all potential threats from new approaches to data integration and constantly evolving technologies.

Instead, what is needed is a new mindset among all stakeholders involved in generating, using and sharing data: a recognition that, although data is ubiquitous, its misuse potentially squanders the public trust on which responsible data use relies.

Read the report Identification and genomic data.
