Pseudonymization, Anonymization & GDPR

The General Data Protection Regulation (GDPR) is a European Union (EU) regulation designed to protect the personal data of individuals in the EU. Specifically, it gives individuals more control over their personal data and how it is used. It provides a single data protection framework across all EU member states, rather than leaving each member state to write its own data protection law. Companies now have a legal incentive to protect, and keep private, the data they collect from their users.

Recital 26 of the GDPR states: 

“The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”

Pseudonymization and anonymization are both ways to reduce the risk that personal data poses to individuals. The GDPR explicitly encourages pseudonymization: Recital 28 notes that it "can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations." Properly anonymized data, as Recital 26 makes clear, falls outside the regulation altogether. Both techniques protect individuals by masking data (such as a name, date of birth, or address) that would otherwise allow someone to link the data to them. To get a high-level understanding of how the methods affect personal data, think of data as sitting on a spectrum: completely unprotected, fully identifiable personal data at one end, anonymized data with no identifiable information at the other, and pseudonymized data somewhere in between. The Future of Privacy Forum put together a nice visual guide that illustrates the concepts:

https://fpf.org/2016/04/25/a-visual-guide-to-practical-data-de-identification/

What is anonymization?

Anonymization is the permanent removal of any information that could serve as an identifier. Once a data set has been properly anonymized, individuals can no longer be identified from it. Anonymized data can be used for marketing and research while protecting individuals from data exposure. However, because true anonymization is difficult to achieve, many businesses choose pseudonymization techniques instead.

In 2014, the Article 29 Working Party issued Opinion 05/2014 on Anonymisation Techniques, in which it analyzed the effectiveness of various techniques, including:

  1. Noise Addition – adding a level of imprecision to the original data. For example, a patient’s recorded weight might be shifted by a random amount of up to +/- 10 lbs., so the stored value is no longer precise.
  2. Substitution/Permutation – replacing values with other values, or shuffling values between records so that they no longer belong to the original individual. For example, a patient’s height of 5’11” might be stored as “blue.”
  3. Differential Privacy – adding carefully calibrated random noise to results computed from a data set, so that the output reveals very little about whether any one individual’s data was included.
  4. Aggregation/K-Anonymity – a “hiding in the crowd” concept: identifying attributes are generalized until each record is indistinguishable from at least k-1 others, so any record in a group could correspond to any of its members. For example, a data set might record people as living in the Pacific Northwest instead of in a specific town like Seattle, WA.
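The four techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Working Party’s method: the records are invented, the function names are hypothetical, and differential privacy is shown with the Laplace mechanism for a counting query, which is one common way to achieve it among several.

```python
import random
from collections import Counter

# Toy records for illustration only; all values are invented.
RECORDS = [
    {"town": "Seattle, WA", "region": "Pacific Northwest", "age": 34, "weight_lbs": 172},
    {"town": "Portland, OR", "region": "Pacific Northwest", "age": 37, "weight_lbs": 150},
    {"town": "Tacoma, WA", "region": "Pacific Northwest", "age": 52, "weight_lbs": 198},
    {"town": "Spokane, WA", "region": "Pacific Northwest", "age": 58, "weight_lbs": 165},
]


def add_noise(value, max_noise, rng):
    """Noise addition: shift a value by a uniform random amount in [-max_noise, max_noise]."""
    return value + rng.uniform(-max_noise, max_noise)


def permute_column(rows, column, rng):
    """Permutation: shuffle one column's values across rows, breaking the link to each person."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def laplace_count(true_count, epsilon, rng):
    """Differential privacy via the Laplace mechanism for a counting query.

    Adding or removing one person changes a count by at most 1, so noise from
    Laplace(0, 1/epsilon) is used. The difference of two exponential draws
    with rate epsilon has exactly that distribution.
    """
    return true_count + rng.expovariate(epsilon) - rng.expovariate(epsilon)


def generalize(row):
    """Aggregation: replace the town with a region and the exact age with a 10-year band."""
    low = row["age"] // 10 * 10
    return {"region": row["region"], "age_band": f"{low}-{low + 9}"}


def is_k_anonymous(rows, quasi_identifiers, k):
    """K-anonymity check: every combination of quasi-identifiers must occur at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    print(add_noise(RECORDS[0]["weight_lbs"], 10, rng))
    print(permute_column(RECORDS, "weight_lbs", rng))
    print(laplace_count(len(RECORDS), epsilon=0.5, rng=rng))
    print(is_k_anonymous(RECORDS, ["town", "age"], k=2))  # False: every row is unique
    print(is_k_anonymous([generalize(r) for r in RECORDS], ["region", "age_band"], k=2))  # True
```

The raw records fail the 2-anonymity check because every town/age combination is unique, while the generalized records pass. A check like this only covers the attributes you list, so it is a starting point rather than proof that a data set is anonymous.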

When done properly, anonymization can place data outside the scope of the GDPR.

What is pseudonymization?

Pseudonymization involves replacing actual data with pseudonyms. Article 4(5) of the GDPR defines pseudonymization as:

 “…the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately…”

As with anonymization, there are various techniques for pseudonymizing data, including:

  1. Scrambling – mixing or obfuscating letters. For example, the word “obfuscate” could become “tocbusafe”.
  2. Encryption – converting the information into an unintelligible code. In most cases, encrypted data can be decrypted with the corresponding key.
  3. Tokenization – replacing sensitive parts of the data with non-sensitive placeholder values (tokens), while the mapping back to the real values is kept in a secure store. For example, the credit card number “4111 1111 1111 1234” might be replaced with a random token such as “4281 7730 5196 2819”.
  4. Data blurring – using an approximation of values to make the data less precise. Think of a portrait with a blurred face: you know a person is represented there, but you cannot identify who it is.
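Tokenization, with the lookup table kept apart from the data, can be sketched as follows. The `TokenVault` class is hypothetical and keeps everything in memory; this is an illustration of the idea, and a real system would hold the vault in a separate, access-controlled service.

```python
import secrets


class TokenVault:
    """Hypothetical token vault mapping random tokens back to original values.

    The vault is the "additional information": it should be stored separately
    from the pseudonymized data and protected by its own access controls.
    """

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse an existing token so the same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            # 16 random digits grouped like a card number (same shape, no real meaning).
            digits = "".join(secrets.choice("0123456789") for _ in range(16))
            token = " ".join(digits[i:i + 4] for i in range(0, 16, 4))
            if token not in self._token_to_value:  # avoid reusing a token
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only code with access to the vault can reverse a token.
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = TokenVault()
    record = {"card": "4111 1111 1111 1234", "amount": 42.50}
    stored = {**record, "card": vault.tokenize(record["card"])}
    print(stored)  # card number replaced by a random 16-digit token
    print(vault.detokenize(stored["card"]))  # 4111 1111 1111 1234
```

Because the same card number always maps to the same token, the pseudonymized records stay useful for analysis (for example, counting purchases per card), but only someone with access to the vault can recover the real number.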

The “additional information” that links pseudonymized data back to an individual, such as an encryption key or a token lookup table, is stored separately from the pseudonymized data itself. By using pseudonymization techniques, data controllers can still benefit from the data’s utility while protecting individuals’ rights.

That said, it is important to note that pseudonymized data still falls within the scope of the GDPR. According to Article 29 Working Party Opinion 05/2014 on Anonymisation Techniques (1):

“Pseudonymised data cannot be equated to anonymised information as they continue to allow an individual data subject to be singled out and linkable across different data sets”.
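The Working Party’s point about linkability can be demonstrated with a small sketch. It pseudonymizes email addresses with a keyed hash (HMAC-SHA256), a common pseudonymization technique that is not among those listed above; the key, the two data sets, and the `pseudonym` function are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key: the "additional information", kept in a separate key store in practice.
KEY = secrets.token_bytes(32)


def pseudonym(identifier, key=KEY):
    """Deterministic pseudonym: the same identifier and key always give the same value."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Two separately held data sets, both keyed by email address (invented values).
purchases = [("jane@example.com", "running shoes"), ("sam@example.com", "tent")]
clinic_visits = [("jane@example.com", "knee injury")]

pseudo_purchases = {pseudonym(email): item for email, item in purchases}
pseudo_visits = {pseudonym(email): reason for email, reason in clinic_visits}

# Neither data set contains an email address, yet they can still be joined on the
# shared pseudonym, singling out one person's combined profile.
shared = pseudo_purchases.keys() & pseudo_visits.keys()
linked = {p: (pseudo_purchases[p], pseudo_visits[p]) for p in shared}
print(linked)  # a single entry pairing 'running shoes' with 'knee injury'
```

No email address appears in either pseudonymized data set, yet the two can be joined on the shared pseudonym, which is exactly the singling out and linking the Working Party describes.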

In summary

The effectiveness of both anonymization and pseudonymization depends on the business case and the individual circumstances. Both techniques can help organizations comply with the GDPR’s requirements for protecting personal data.

It is interesting to note that while individuals in the EU have this legal protection for their personal data, the US has no comparable comprehensive federal data protection law. Personal data protection is very much on the minds of business and government leaders, as well as citizens, in this country.


More articles by Brad Perry
