
De-identification Practices

11 What is a quasi-identifier?

As noted in a different KnowledgeBase article (view here), the primary type of disclosure risk to focus on is identity disclosure. An underlying assumption for this type of risk is that there is an intruder who has two pieces of information:…

12 What are the different types of disclosure risk?

There are two general kinds of re-identification risk that are of concern. The first is when an intruder can assign an identity to any record in the disclosed database. For example, the intruder would be able to determine that record number 7 in the disclosed…

13 Who cares about my medical records?

One question that is sometimes posed is "why would anyone want to re-identify my records?" The argument goes that if the medical records have no value to someone else, then why would anyone bother getting access to and re-identifying them? Below are the reasons…

14 The difference between consenters and non-consenters

We have just completed two large systematic reviews looking at the difference between consenters and non-consenters. The reviews considered clinical trials and observational studies with primary data collection and secondary use of existing databases. For…

15 Can a voter list be used for re-identification?

Much of the literature makes the point that voter lists can be used for re-identification. However, the accuracy of this statement depends on your jurisdiction. In the US, many states make their voter lists available for free or for a small fee. Often there…

16 What quasi-identifiers should I use for managing journalist risk?

With journalist risk, the intruder is not looking for a specific person in the disclosed data set; re-identifying any person will achieve the goal. A classic example is the reporter who is going through a leaked medical database to find someone with a sensitive…

17 Is sampling sufficient to de-identify a data set?

Sampling means drawing a subset of the rows from the data set and disclosing those instead of the complete data set. Sampling is sometimes used because it thwarts a prosecutor-type attack by making it difficult for an intruder to know if…

18 How can I de-identify longitudinal records?

At the outset, it is important to make a distinction between three types of longitudinal records that occur often in practice. The first type consists of specific variables that are collected from all patients at specific points in time. For example, if function…

19 Can a person be re-identified from their diagnosis code?

In many discussions about re-identification risk and de-identification, the focus is on demographic variables. But many data sets also include diagnosis codes (for example, ICD-10 codes). We will answer the question of whether these can be used for re-identification…

20 Can individuals be re-identified from disease maps?

Increasingly, public health units, the media, and researchers are publishing or posting maps on the web showing locations of individuals with particular diseases. Do these maps represent a high re-identification risk? There have been studies showing that…