
Can individuals be re-identified from disease maps?

Created: 26-10-2009 19:00 Last Updated: 23-09-2011 15:04

Increasingly, public health units, the media, and researchers are publishing or posting maps on the web showing locations of individuals with particular diseases. Do these maps represent a high re-identification risk?

Studies have shown that published maps containing point locations of individuals or households with a particular disease can be reverse engineered to determine the original locations, even if the published map is low resolution and certain landmarks and geographical features have been removed. Therefore, as a starting point, the risk of re-identification is high if individual points are published. One can perturb the points before publishing them rather than publish the original points, but we'll leave perturbation techniques for another article.

In some cases prevalence rates for a particular area are published. The Toronto Star has published good examples of such disease maps for various sexually transmitted diseases, with rates reported per forward sortation area (FSA), the first three characters of a Canadian postal code.

Do these maps risk identifying any of the individuals? There are three questions that need to be answered to determine the risk:

  • Is the disease visible?
  • Is the disease rare in the geography?
  • If I re-identify an individual, will I learn something new about them?


If the disease is not visible then there is little risk, because there is no plausible scenario for going from a prevalence rate for an FSA to an individual. Consider infectious syphilis: the first sign usually appears 2 to 10 weeks following exposure, when a red, oval sore, called a chancre, develops at the site where the bacteria entered the body. Chancres appear most often on the genitals, but they can also appear on the mouth or hands, in which case an argument can be made that the disease is visible. For HIV, facial muscle wasting would be quite visible as well (see the pictures here).

Rareness of a disease can be defined as a prevalence rate of less than 10 in 10,000, which is the same as 1 in 1,000. Therefore, any FSA in those maps where the prevalence is greater than 1 per 1,000 would not be considered rare (e.g., the FSA "M4Y" has a prevalence rate higher than 1 per 1,000 for infectious syphilis). In general, however, most of the diseases are quite rare within the FSAs by that definition.

This definition of rareness is different from the one used by statistical agencies. For example, statistical agencies often consider a value rare if its prevalence is less than 0.5%, and they suppress those records or apply some other disclosure control action. This is the usual reasoning behind top coding high age values at 90 years old: in Canada about 0.5% of the population is older than 90.

For our purposes, we will use the 10 in 10,000 (0.1%) definition, which is more conservative about labeling a disease as rare than the 0.5% threshold.
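The two thresholds can be compared with a short calculation. The following is a minimal Python sketch and not part of the original analysis; the function name `is_rare` and the case counts in the usage lines are hypothetical and chosen only to illustrate the arithmetic.

```python
# Thresholds expressed as proportions of the population.
ARTICLE_THRESHOLD = 10 / 10_000   # 10 in 10,000 (0.1%), the definition used in this article
AGENCY_THRESHOLD = 0.5 / 100      # 0.5%, the cut-off often used by statistical agencies


def is_rare(cases, population, threshold=ARTICLE_THRESHOLD):
    """Return True if the prevalence (cases / population) is below the threshold."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population < threshold


# Hypothetical counts for an area of 20,000 residents (illustrative only).
print(is_rare(12, 20_000))                    # 6 in 10,000: rare under both definitions
print(is_rare(60, 20_000))                    # 30 in 10,000: not rare under the article's definition
print(is_rare(60, 20_000, AGENCY_THRESHOLD))  # 30 in 10,000: still rare under the 0.5% cut-off
```

A prevalence between 0.1% and 0.5% is the only range where the two definitions disagree.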

Therefore, if we take, say, HIV, it would be rare and visible in the FSA "M2L". Now, let us consider a re-identification scenario. An intruder, say a reporter, would go to "M2L" with a picture of facial wasting and ask the people living there if they know or have seen someone who matches these characteristics. A neighbor could then say that Bob looks like that. In that case the neighbor would learn that Bob has HIV, and the reporter would find a person with HIV.

However, the same recognition can happen without the map. The local newspaper could publish an article on facial wasting, or the neighbor could read about facial wasting in a book, and the neighbor could then realize that Bob probably has HIV. One can then argue that the neighbor learned something new from generally available information, which helped him recognize a visible condition that Bob has. In that case the published map has no bearing on the neighbor's recognition that Bob has facial wasting.

The reporter could also walk down the street until he finds Bob (someone with facial wasting), with no neighbor involved. In this case, the prevalence map indicated only that there is at least one person with HIV in that FSA. Therefore, if the map were binary (zero versus greater than zero), that would be all the prevalence information needed to encourage the identification of Bob, because the reporter would not bother going into "M2L" if the prevalence were known to be zero. The map thus created an incentive but did not provide a link to Bob's identity.
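To make the binary-map point concrete, here is a small Python sketch, offered as an illustration only. The helper `to_binary_map`, the area labels, and the rates are all hypothetical; they are not taken from the published maps.

```python
def to_binary_map(rate_by_area):
    """Collapse prevalence rates to presence/absence: True where the rate is above zero."""
    return {area: rate > 0 for area, rate in rate_by_area.items()}


# Hypothetical rates per 1,000 residents; the labels are placeholders, not real FSAs.
rates = {"AREA-1": 0.0, "AREA-2": 0.4, "AREA-3": 1.7}

binary = to_binary_map(rates)

# The only thing the walking reporter needs from the map is which areas are non-zero.
areas_worth_visiting = [area for area, present in binary.items() if present]
print(areas_worth_visiting)
```

The actual rate values play no role in this scenario; only the distinction between zero and non-zero does.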

On the other hand, if the reporter has another database which has, say, financial information on people with HIV living in "M2L", then figuring out which record belongs to Bob means that the reporter will learn something new about Bob. The geographic specificity, the rareness and the visibility of the disease make it easier to correctly link Bob to his record in that database.

Therefore, whether something new can be learned depends on whether there exists a database that has geographic specificity on individuals with that rare and visible disease, and on whether this database contains additional sensitive information about those individuals. If these conditions are met, then a stronger case can be made that the disease maps can reveal something new about the affected individuals.

Another scenario arises when the prevalence rate is very high, say if 900 in 1,000 people in a particular area have a condition. This would mean that any given person who lives in that area has the condition with very high probability. Anyone looking at the map would learn something new and personal about the people living in that area. Those people would not be re-identified per se, but sensitive information about them would be disclosed through the map.
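The inference in this scenario is simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes that a reader of the map knows nothing about a resident except that they live in the area, so the published prevalence is the best available guess.

```python
def chance_resident_has_condition(cases, population):
    """Probability that a resident picked at random has the condition,
    assuming the map reader knows only where the person lives."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


# The article's example: 900 in 1,000 people have the condition.
p_common = chance_resident_has_condition(900, 1000)

# For contrast, an area sitting at the rareness threshold of 1 in 1,000.
p_rare = chance_resident_has_condition(1, 1000)

print(f"very common condition: {p_common:.1%}")
print(f"rare condition:        {p_rare:.1%}")
```

At high prevalence the map alone supports a confident guess about each resident, which is why no re-identification step is needed for sensitive information to be disclosed.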

To summarize then, here are the two scenarios where a disclosure risk is plausible and potentially high:

  • The disease/condition is rare (low prevalence), it is visible, and there is a database with other sensitive information about the affected individuals.
  • The disease/condition is so common (very high prevalence) that almost everyone who lives in that area has it.
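The two scenarios above can be written as a rough screening rule. This Python sketch is an illustration under stated assumptions and not a validated risk model: the function name and scenario labels are hypothetical, and the 0.9 cut-off for "very common" is an assumption borrowed from the 900 in 1,000 example, since the article does not define a threshold for high prevalence.

```python
RARE_BELOW = 10 / 10_000          # the article's rareness definition (0.1%)
VERY_COMMON_AT_OR_ABOVE = 0.9     # assumption: taken from the 900 in 1,000 example


def plausible_disclosure_scenarios(prevalence, visible, linked_sensitive_db):
    """Return the scenarios from the summary that apply to one area on a map.

    prevalence           proportion of residents with the condition (0.0 to 1.0)
    visible              True if the condition can be seen on a person
    linked_sensitive_db  True if another database links people with the condition
                         in this area to further sensitive information
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")

    scenarios = []
    # Scenario 1: rare, visible, and something new can be learned from a linked database.
    # A prevalence of zero is excluded because there is nobody to find.
    if 0.0 < prevalence < RARE_BELOW and visible and linked_sensitive_db:
        scenarios.append("re-identification with new information")
    # Scenario 2: almost everyone in the area has the condition.
    if prevalence >= VERY_COMMON_AT_OR_ABOVE:
        scenarios.append("group disclosure")
    return scenarios


print(plausible_disclosure_scenarios(0.0005, visible=True, linked_sensitive_db=True))
print(plausible_disclosure_scenarios(0.0005, visible=True, linked_sensitive_db=False))
print(plausible_disclosure_scenarios(0.95, visible=False, linked_sensitive_db=False))
```

An empty list means that neither of the two scenarios applies to that area; it does not mean the area carries no risk at all.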



The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.