Can genomic data really be anonymized?

The restrictions placed on data use by legislation such as the European Union’s General Data Protection Regulation (GDPR) and South Africa’s Protection of Personal Information Act (POPIA) do not apply to anonymized data. This has several implications for the use of data for research purposes, including future use of data for research for which it was not originally collected. However, can genomic data really be anonymized? In other words, can genomic data be irreversibly de-identified?

A close reading of Recital 26’s definition of an identifiable natural person, together with Article 29 Working Party Opinion 05/2014, suggests a zero-risk standard. There is some controversy regarding that standard.
Khaled El Emam suggests that the standard is impractical (https://doi.org/10.1093/idpl/ipu033), and the ICO has stated a different “reasonable degree of confidence” standard. A looser interpretation would broaden the circumstances in which genomic data would not be personal data under the GDPR.

Given sufficient information about a person, whether genotypic or phenotypic, there is a risk of reidentification, if not now then later. To achieve anonymisation, broad utility would most likely have to be obliterated. As a practical matter, an institution working with genomic information that takes a “close reading” stance and wants broad utility from the data would, in my opinion, do best to assume that genomic data cannot be anonymised. There are scenarios where very few attributes or segments are shared from a large sample (large n) drawn from a broad population, where the risk is likely zero and the data still has utility for narrow use cases.
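
To make the point about attribute count concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the cohort, the field names (`snp1`, `snp2`, `snp3`, `eye`) and the helper `smallest_group_size` are all hypothetical, and the measure used (the size of the smallest group of records that look identical on the shared fields, often called k in the k-anonymity literature) is one common proxy for reidentification risk, not a legal test under GDPR or POPIA.

```python
from collections import Counter


def smallest_group_size(records, shared_fields):
    """Return k: the size of the smallest group of records that are
    indistinguishable from one another on the shared fields."""
    keys = [tuple(r[f] for f in shared_fields) for r in records]
    return min(Counter(keys).values())


# Hypothetical toy cohort: three genotype markers plus one phenotype.
cohort = [
    {"snp1": "AA", "snp2": "CT", "snp3": "GG", "eye": "brown"},
    {"snp1": "AA", "snp2": "CC", "snp3": "GG", "eye": "brown"},
    {"snp1": "AG", "snp2": "CT", "snp3": "GT", "eye": "blue"},
    {"snp1": "AA", "snp2": "CT", "snp3": "GT", "eye": "brown"},
    {"snp1": "AG", "snp2": "CC", "snp3": "GG", "eye": "brown"},
    {"snp1": "AA", "snp2": "CC", "snp3": "GT", "eye": "blue"},
]

# Sharing a single marker: every record hides among at least one other.
k_one = smallest_group_size(cohort, ["snp1"])

# Sharing all four attributes: at least one record becomes unique.
k_all = smallest_group_size(cohort, ["snp1", "snp2", "snp3", "eye"])

print(k_one, k_all)
```

In this toy cohort, sharing one marker leaves every record in a group of at least two, while sharing all four attributes makes every record unique. A k greater than 1 inside the released data says nothing about what an attacker could link from other sources, which is why this is a rough intuition aid rather than evidence of anonymisation.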
