One well-recognized way to protect patient privacy is to de-identify health data.  However, trends around increases in publicly-available personal data, data linking and aggregation, big data analytics, and computing power are challenging traditional de-identification models.  While traditional de-identification techniques may mitigate privacy risk, the possibility remains that such data may be coupled with other information to reveal the identity of the individual.

Last month, a JAMA article demonstrated that an artificial intelligence algorithm could re-identify de-identified data stripped of identifiable demographic and health information. In the demonstration, an algorithm was utilized to identify individuals by pairing daily patterns in physical mobility data with corresponding demographic data. This study revealed that re-identification risks can arise when a de-identified dataset is paired with a complementary resource.

In light of this seeming erosion of anonymity, entities creating, using and sharing de-identified data should ensure that they (1) employ compliant and defensible de-identification techniques and data governance principles and (2) implement data sharing and use agreements to govern how recipients use and safeguard such de-identified data.

De-identification Techniques and Data Governance

The HIPAA Privacy Rule (45 C.F.R. §164.502(d)) permits a covered entity or its business associate to create information that is not individually identifiable by following the de-identification standard and implementation specifications (45 C.F.R. §164.514(a)-(b)).

In 2012, the Office for Civil Rights (OCR) provided guidance  on the de-identification standards. Specifically, OCR provided granular and contextual technical assistance regarding (i) utilizing a formal determination by a qualified expert (the “Expert Determination” method); or (ii) removing specified individual identifiers in the absence of actual knowledge by the covered entity that the remaining information could be used alone or in combination with other information to identify the individual (the “Safe Harbor” method).

As publicly-available datasets expand and technology advances, ensuring the Safe Harbor method sufficiently mitigates re-identification risk becomes more difficult.  This is due to the fact that more data and computing power arguably increase the risk that de-identified information could be used alone or in combination with other information to identify an individual who is a subject of the information.

Given the apparent practical defects in the “Safe Harbor” method, many organizations are applying a more risk-based approach to de-identification through the use of the “Expert Determination” method.  This method explicitly recognizes that risk of re-identification may never be completely removed. Under this method, data is deemed de-identified if after applying various deletion or obfuscation techniques the “risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information . . . .”

In light of the residual risks associated with de-identified data generally, it is important that organizations continue to apply good data governance principles when using and disclosing such data.  These best practices should include: data minimization, storage limitation, and data security.  Organizations should also proceed with caution when linking data sets together in a manner that could compromise the integrity of the techniques used to originally de-identify the data.

Data Sharing and Use Agreements

Regardless of the de-identification approach, the lingering risk of re-identification can be further managed through contracts with third parties who receive such data.  Though not required by the Privacy Rule, an entity providing de-identified data to another party should enter into a data sharing and use agreement with the recipient.  Such agreements may include obligations to secure the data, prohibit re-identification of the data, place limitations on linking data sets, and contractually bind the recipient to pass on similar requirements to any downstream other party with whom the data is subsequently shared. Further, such agreements may include provisions prohibiting recipients from attempting to contact individuals who provided data in the set and may also include audit rights to ensure compliance.

The risk of re-identification may be a tradeoff to realize the vast benefits that sharing anonymized health data provides; however, entities creating, using and sharing de-identified data should doing so responsibly and defensibly.


Alaap B. Shah


Elizabeth Scarola

Back to Health Law Advisor Blog

Search This Blog

Blog Editors

Authors

Related Services

Topics

Archives

Jump to Page

Subscribe

Sign up to receive an email notification when new Health Law Advisor posts are published:

Privacy Preference Center

When you visit any website, it may store or retrieve information on your browser, mostly in the form of cookies. This information might be about you, your preferences or your device and is mostly used to make the site work as you expect it to. The information does not usually directly identify you, but it can give you a more personalized web experience. Because we respect your right to privacy, you can choose not to allow some types of cookies. Click on the different category headings to find out more and change our default settings. However, blocking some types of cookies may impact your experience of the site and the services we are able to offer.

Strictly Necessary Cookies

These cookies are necessary for the website to function and cannot be switched off in our systems. They are usually only set in response to actions made by you which amount to a request for services, such as setting your privacy preferences, logging in or filling in forms. You can set your browser to block or alert you about these cookies, but some parts of the site will not then work. These cookies do not store any personally identifiable information.

Performance Cookies

These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us to know which pages are the most and least popular and see how visitors move around the site. All information these cookies collect is aggregated and therefore anonymous. If you do not allow these cookies we will not know when you have visited our site, and will not be able to monitor its performance.