The Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule provides standards for the use and disclosure of "individually identifiable health information," known as protected health information, or PHI. PHI is information, including demographic information, that relates to an individual's physical or mental health, the provision of health care to the individual, or payment for the provision of health care to the individual. Such information constitutes PHI if it identifies the individual or if there is a reasonable basis to believe it can be used to identify the individual to whom the health information pertains. Thus, PHI includes many everyday identifiers (e.g., name, address, birth date, Social Security number) that can be associated with an individual's health information.
Rapid advances in information analytic technologies are accelerating the ability to combine large, complex data sets from various sources into powerful tools for improving health care. Those same technologies, however, also enhance the ability to use publicly available demographic information to associate an individual's health information with that individual. To balance the potential utility of health information, even when it is not individually identifiable, against the risk that the subject of the information might be identified, the Privacy Rule provides two methods of de-identification: (1) determination by a qualified expert; and (2) removal of specified identifiers.
THE DE-IDENTIFICATION STANDARDS
The HIPAA Privacy Rule provides a safe harbor for de-identification that requires the complete removal of each of 18 types of identifiers. However, the removal of these identifiers, such as birth date; dates of admission, discharge, and death; and any indication of age over 89, may render the data set less useful as a research tool. To provide some flexibility, the Rule allows use of other de-identification strategies where an expert determines "that the risk is very small that the information could be used, alone or in combination with other reasonably available information" to identify the subject of the information. § 164.514(b). Beyond this rather limited language, the Rule itself provides no guidance on how the standard should be applied to real-life circumstances. As a result, the Office for Civil Rights of the United States Department of Health and Human Services (OCR), the entity charged with enforcement of the Privacy Rule, has recently issued guidance on methods for de-identifying PHI that will satisfy the Rule. See Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.
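To make the safe harbor concrete, the following is a minimal sketch, in Python, of what identifier removal might look like in practice. The record layout and field names are hypothetical, and the sketch handles only a few of the 18 identifier types; it is an illustration of the approach, not a compliant implementation.

```python
# Minimal sketch of safe-harbor-style identifier removal, assuming each
# patient record is a Python dict. Field names are hypothetical, and only
# a few of the 18 identifier types are handled.

# Fields removed outright (names, contact details, SSNs, etc.)
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email", "ssn",
    "medical_record_number", "health_plan_id",
}

# Date fields reduced to year only, consistent with the safe harbor's
# treatment of dates more specific than the year
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date", "death_date"}


def safe_harbor_strip(record: dict) -> dict:
    """Return a copy of `record` with direct identifiers dropped, dates
    generalized to the year, and ages over 89 collapsed into one category."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and value:
            out[field] = str(value)[:4]  # keep only YYYY from an ISO date
        elif field == "age" and value is not None and value > 89:
            out[field] = "90+"  # ages over 89 collapse into one category
        else:
            out[field] = value
    return out


print(safe_harbor_strip({
    "name": "Jane Doe", "ssn": "000-00-0000",
    "birth_date": "1931-05-14", "age": 93, "diagnosis": "I10",
}))
# -> {'birth_date': '1931', 'age': '90+', 'diagnosis': 'I10'}
```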
THE OCR GUIDANCE
As to the level of expertise an "expert" must possess in order to provide a de-identification opinion under HIPAA, the Rule requires "[a] person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable . . ." § 164.514(b). The guidance confirms that no specific professional degree or certification exists for such expertise. Relevant expertise may be obtained through many types of experience in the scientific, mathematical, statistical, or other arenas. Whether a particular individual qualifies as an expert under the Privacy Rule is a judgment to be exercised based upon the individual's relevant professional experience and training.
Likewise, there is no specific, statistically determined numerical level of risk that will constitute a "very small" risk that the subject individual may be identified. The guidance indicates that this risk depends upon many factors. The assessed risk of identification for a particular data set in the context of one environment may not be the same for that data set in a different environment, or for a different data set in the same environment. Similarly, because technology and the availability of information are changing rapidly, the level of risk for even the same data set in the same environment may change over time. Thus, the guidance confirms that no specific process must be used to reach a conclusion that the risk of identification is very small.
What the expert must do is: (1) evaluate the extent to which the health information itself is identifiable; (2) provide advice on the methods that can be applied to mitigate the risk; (3) consider what data sources may be available (such as voter registration records) for use in identification; and (4) confirm that the identification risk of the resulting product is no more than very small. This analysis will include such factors as the degree to which the data set can be matched to a data source that reveals the identity of the individuals, such as matching a birth date and zip code combination in a health record to a birth date and zip code combination in a voter registration record. The expert should also consider the accessibility of these alternative data sources. For example, the presence of patient demographics in the data set is high risk, as they potentially can be matched with data that appears in public records, while clinical features and event-related time frames pose a much lower risk. Certain combinations of values may, for similar reasons, increase the risk of identification.
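One way an expert might begin to quantify this matching risk is to count how many records in the data set share each combination of quasi-identifiers, such as birth date and zip code: records that are unique, or nearly unique, on those fields are the ones most exposed to linkage with an external source such as a voter registration list. The sketch below illustrates that k-anonymity-style tally; the field names and the threshold are hypothetical assumptions, and this is one common heuristic, not a method prescribed by the Rule or the guidance.

```python
# A sketch of quantifying matching risk: count how many records share each
# combination of quasi-identifiers (here, birth date and zip code). Unique
# or near-unique combinations are the ones most exposed to linkage with an
# external source. Field names and the threshold k are illustrative.
from collections import Counter


def equivalence_class_sizes(records, quasi_ids=("birth_date", "zip")):
    """For each record, the number of records sharing its quasi-identifiers."""
    keys = [tuple(r.get(q) for q in quasi_ids) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def share_at_risk(records, k=5):
    """Fraction of records falling in groups smaller than k."""
    sizes = equivalence_class_sizes(records)
    return sum(1 for size in sizes if size < k) / len(sizes)


cohort = [
    {"birth_date": "1958-03-02", "zip": "19104", "diagnosis": "E11"},
    {"birth_date": "1958-03-02", "zip": "19104", "diagnosis": "I10"},
    {"birth_date": "1931-05-14", "zip": "02139", "diagnosis": "I10"},
]
print(share_at_risk(cohort, k=2))  # -> 0.333..., one record is unique
```

On this view, the fraction of records falling in small groups gives the expert a rough, context-dependent starting point for the analysis, not a dispositive number.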
Consistent with this theme, the Privacy Rule also does not prescribe any particular approach to mitigating whatever risk of identification does exist. The expert should choose from various measures, such as suppression of an entire category of data, suppression of some individual records, or generalization of a particular measure into a band of values, when necessary to reduce the risk of identification. The OCR also suggests limiting redisclosure of the data set through agreement as a mitigation methodology.
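As an illustration of these mitigation measures, the sketch below generalizes an age into a band of values, coarsens a five-digit zip code to its three-digit prefix, and suppresses individual records that remain in very small groups. The record layout carries over from the earlier sketches, and the threshold is an illustrative assumption, not a figure taken from the Rule or the guidance.

```python
# An illustration of the mitigation measures described above, under the same
# hypothetical record layout: generalize age into a band of values, coarsen
# the zip code, and suppress records that still fall in very small groups.
from collections import Counter


def generalize(record: dict) -> dict:
    """Return a copy of `record` with age banded and zip code coarsened."""
    out = dict(record)
    if out.get("age") is not None:
        decade = (out["age"] // 10) * 10
        out["age"] = f"{decade}-{decade + 9}"  # e.g., 42 -> "40-49"
    if out.get("zip"):
        out["zip"] = out["zip"][:3]  # e.g., "19104" -> "191"
    return out


def suppress_small_groups(records, quasi_ids=("age", "zip"), k=5):
    """Drop records whose quasi-identifier combination appears fewer than
    k times in the data set (record-level suppression)."""
    counts = Counter(tuple(r.get(q) for q in quasi_ids) for r in records)
    return [r for r in records
            if counts[tuple(r.get(q) for q in quasi_ids)] >= k]
```

Which measure to apply, and how aggressively, remains a judgment for the expert: heavier generalization and suppression lower the identification risk but also reduce the research utility of the data set.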
In sum, through the publication of the de-identification guidance, the OCR has made clear that the Privacy Rule is intended to provide maximum flexibility in the design of data sets containing protected health information, so long as the ultimate goal is served: keeping the risk that the subject of the information will be identified "very small."