A Critical Examination of Pre- and Post-HIPAA Re-identification Risks

Date: July 18th, 2012, from Noon to 1pm EST

Abstract: The pre-HIPAA (1997) re-identification of Massachusetts Governor William Weld’s medical data within an insurance dataset that had been stripped of direct identifiers has had a profound impact on the development of the de-identification provisions within the 2003 HIPAA Privacy Rule. Weld’s re-identification, purportedly achieved through the use of a voter registration list from Cambridge, MA, is frequently cited as evidence that computer scientists can re-identify individuals within de-identified data with “astonishing ease”. However, a careful re-examination of the population demographics in Cambridge indicates that Weld was most likely re-identifiable only because he was a public figure who experienced a highly publicized hospitalization, not because the Cambridge voter data, which were missing a large proportion of the population, lent any real certainty to his re-identification.

The complete story of Weld’s re-identification exposes an important systemic barrier to accurate re-identification known as “the myth of the perfect population register”. The logic underlying re-identification depends critically on being able to demonstrate that a person within a health data set is the only person in the larger population who has a particular combination of characteristics (known as “quasi-identifiers”) that could potentially re-identify them. Most re-identification attempts therefore face a strong challenge in creating a complete and accurate population register. This limitation not only underlies the entire set of famous Cambridge re-identification results but also affects much of the existing re-identification research cited by those making claims of easy re-identification.
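The effect of an incomplete register on match certainty can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the field names, the choice of quasi-identifiers, and the three toy records, none of which come from the Cambridge data. It shows a target record that looks unique in the register while actually sharing its quasi-identifiers with an unregistered person.

```python
# Toy illustration only: field names and records are invented for this sketch.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip5")  # assumed quasi-identifier set


def qi_key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def count_matches(records, target):
    """Count records whose quasi-identifiers equal those of the target."""
    key = qi_key(target)
    return sum(1 for record in records if qi_key(record) == key)


# The true population: persons A and B share the same quasi-identifiers,
# but only A appears on the register (for example, B is not a registered voter).
population = [
    {"name": "A", "birth_date": "1950-01-01", "sex": "M", "zip5": "00001", "registered": True},
    {"name": "B", "birth_date": "1950-01-01", "sex": "M", "zip5": "00001", "registered": False},
    {"name": "C", "birth_date": "1962-03-15", "sex": "F", "zip5": "00001", "registered": True},
]
register = [person for person in population if person["registered"]]

# A de-identified health record carrying only quasi-identifiers.
target = {"birth_date": "1950-01-01", "sex": "M", "zip5": "00001"}

matches_in_register = count_matches(register, target)
matches_in_population = count_matches(population, target)

print("Matches in register:  ", matches_in_register)
print("Matches in population:", matches_in_population)
```

In this toy case the register yields a single match, yet two people in the population fit the record, so the apparent unique match does not establish who the record belongs to.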

This webinar critically examines the historic Weld/Cambridge re-identification attack and the dramatic (thousands-fold) post-HIPAA reductions in re-identification risk for de-identified health data, which have been protected by the HIPAA Privacy Rule provisions for de-identification since 2003.

We will also discuss:

  • Recommendations for enhancements to existing HIPAA de-identification policy,
  • Critical advances in medical science and improvement of our healthcare system that are routinely accomplished using de-identified data, and
  • The vital importance of properly balancing the competing goals of protecting patient privacy and preserving the accuracy of scientific research and statistical analyses conducted with de-identified data.

Webinar attendees will be informed about issues related to the validation and certainty of re-identification attacks, the impact of imperfect population registers on real-world re-identification risks, the key aspects of the HIPAA de-identification provisions that have resulted in successful privacy protections, and recommended best practices for further assuring protection of de-identified data.

Background reading for this webinar is the following report: “The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now”, located at the Social Science Research Network.

Speaker: Dr. Daniel Barth-Jones

Dr. Daniel C. Barth-Jones is an Assistant Professor of Clinical Epidemiology at the Mailman School of Public Health at Columbia University in New York and an Adjunct Assistant Professor and Epidemiologist at the Wayne State University School of Medicine in Detroit, Michigan. Dr. Barth-Jones received both his Master of Public Health and Ph.D. degrees in Epidemiology from the University of Michigan. Dr. Barth-Jones’ work on statistical disclosure science has focused on the importance of properly balancing two vital public policy goals: effectively protecting individuals’ privacy and preserving the scientific accuracy of statistical and geo-statistical analyses conducted with de-identified health data. He has authored several peer-reviewed publications and a book chapter on statistical disclosure assessment and control. His interests include statistical disclosure analysis and control methods for statistical de-identification of healthcare data, and geospatial and statistical modeling in epidemiology. He also maintains a research agenda in the areas of theoretical population vaccinology, infectious disease epidemic modeling and simulation, and health economic evaluations of public health policies for vaccination and preventative intervention programs.

Download the article here.

Download the slides here.