
De-identification Practices

1 What de-identification software tools are there?

There are five generally available de-identification tools. These tools work on structured data. Other tools focus specifically on free-form text, but they are not covered here. It is also important to distinguish between…

2 Are there any de-identification standards?

A question that often comes up is whether de-identification guidelines are already available. This matters because existing statutes and regulations do not describe precisely what needs to be done to de-identify data,…

3 What is the re-identification risk from small simple counts of disease cases?

A custodian has been asked to release counts of people with a particular disease. For example, in 2008, four people in Ontario had that disease. Since the count is less than five, is there a re-identification risk in disclosing this information?…

4 Should we de-identify if technology is moving so fast?

It is sometimes argued that re-identification technology is constantly advancing, and that new databases useful for linking keep becoming available, so de-identifying any data set is futile. There are two counterarguments…

5 Definition of an identifiable dataset: if a person can find their record(s) in it

One question that sometimes comes up is whether a data set can be considered identifiable if a person can find their own record(s) in it. This definition can be analyzed from a number of different perspectives. A person may not know if they are in a data…

6 What is the difference between prosecutor and journalist risk?

Disclosure risk can be characterized as prosecutor risk or journalist risk (see http://www.jamia.org/cgi/content/abstract/15/5/627). These are just colorful names for two common types of risks. They are similar in that they both pertain to the risk of an…
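As a rough illustration, assuming the definitions commonly used in the k-anonymity literature: an equivalence class is the set of records sharing the same quasi-identifier values. Maximum prosecutor risk can be estimated as 1 divided by the size of the smallest equivalence class in the disclosed sample, while maximum journalist risk uses the size of the matching equivalence class in the population the sample was drawn from. The records, population counts, and function names below are hypothetical and only sketch the arithmetic.

```python
from collections import Counter

# Hypothetical disclosed sample: (age band, gender, region) quasi-identifier tuples.
sample = [
    ("30-39", "F", "K1A"),
    ("30-39", "F", "K1A"),
    ("40-49", "M", "K2B"),
    ("40-49", "M", "K2B"),
    ("40-49", "M", "K2B"),
]

# Hypothetical population size of each equivalence class.
population = {
    ("30-39", "F", "K1A"): 20,
    ("40-49", "M", "K2B"): 50,
}


def max_prosecutor_risk(records):
    """1 / size of the smallest equivalence class in the released sample."""
    class_sizes = Counter(records)
    return 1 / min(class_sizes.values())


def max_journalist_risk(records, pop_counts):
    """1 / population size of the smallest class among classes present in the sample."""
    return 1 / min(pop_counts[qi] for qi in set(records))


print(max_prosecutor_risk(sample))              # 0.5
print(max_journalist_risk(sample, population))  # 0.05
```

Because each population class is at least as large as the corresponding sample class, journalist risk computed this way is never higher than prosecutor risk for the same data.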

7 Which type of threshold should we use for de-identification?

Many types of thresholds have been suggested and used for deciding when a data set is de-identified. Some common ones are a cell size of 5, 3, or 10; uniqueness; and rareness. A question that comes up in practice is "which threshold should we use?" In fact, all…
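A minimal sketch of a cell-size check, assuming "cell size" means the minimum number of records that share the same quasi-identifier values (the k in k-anonymity). The records and the function names `smallest_class_size` and `meets_cell_size` are hypothetical; the sketch only shows how the same data can pass one threshold and fail a stricter one.

```python
from collections import Counter


def smallest_class_size(records):
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    return min(Counter(records).values())


def meets_cell_size(records, k):
    """True if every equivalence class contains at least k records."""
    return smallest_class_size(records) >= k


# Hypothetical records: (age band, gender) quasi-identifiers.
records = [("30-39", "F")] * 6 + [("40-49", "M")] * 4

for k in (3, 5, 10):
    print(k, meets_cell_size(records, k))
# 3 True
# 5 False
# 10 False
```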

8 Is there a secondary use market for health information?

An issue that occasionally comes up is whether there is a secondary use market for health information. Of course, secondary use has been occurring for many years in the context of research, quality improvement, and public health. But does the data have…

9 What are the quasi-identifiers that I should use for managing prosecutor risk?

If you are trying to manage prosecutor risk, then you assume that the intruder has a specific target person in mind and is trying to re-identify that person's records in the disclosed data set. The intruder is also able to get some background information…

10 Are Canadians identifiable by their age, gender, and residence forward sortation area?

For many studies, the combination of age, gender, and residence Forward Sortation Area (FSA) is collected. These three variables are also included in many disclosed datasets. Does that represent a privacy risk? In one of our studies we analyzed…
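One simple way to gauge this kind of risk is to count how many records are unique on the combination of age, gender, and FSA (the first three characters of a Canadian postal code). The sketch below uses hypothetical records and a hypothetical `proportion_unique` helper; it measures uniqueness within the sample only, and whether a sample-unique record is also unique in the population is a separate question.

```python
from collections import Counter

# Hypothetical records; "fsa" holds the first three characters of a postal code.
records = [
    {"age": 34, "gender": "F", "fsa": "K1A"},
    {"age": 34, "gender": "F", "fsa": "K1A"},
    {"age": 51, "gender": "M", "fsa": "M5V"},
    {"age": 27, "gender": "F", "fsa": "H2X"},
]


def proportion_unique(rows, fields=("age", "gender", "fsa")):
    """Fraction of rows whose combination of the given fields occurs exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


print(proportion_unique(records))  # 0.5
```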
