Are Canadians identifiable by their age, gender, and residence Forward Sortation Area?

Created: 16-10-2009 19:00    Last Updated: 23-09-2011 14:59

Many studies collect the combination of age, gender, and residence Forward Sortation Area (FSA; the first three characters of a Canadian postal code), and many disclosed datasets also include these three variables. Does that represent a privacy risk?

In one of our studies we analyzed the 2001 Canadian census, and this was one of the questions we attempted to answer. The study is available here: http://www.jamia.org/cgi/content/abstract/16/2/256

Our conclusion was that a non-trivial proportion of Canadians are unique on these three variables (we use uniqueness as a measure of re-identification risk; this choice is discussed in a separate KnowledgeBase post). There is variation across the country, with by far the largest percentage of uniques in New Brunswick. The following table shows the percentage of the population unique on these three variables:

Province            Percentage of population unique
Alberta             16%
British Columbia    13%
Manitoba            12%
New Brunswick       49%
Newfoundland        17%
Nova Scotia         18%
Ontario             9%
PEI                 10%
Quebec              16%
Saskatchewan        7%
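Uniqueness here means that an individual's combination of age, gender, and FSA is shared by no one else. As a minimal sketch, assuming a small in-memory list of records (the record layout and sample values below are hypothetical, not census data), the percentage of uniques in a dataset can be computed by grouping records on the three variables and counting the groups of size one. This only counts uniques within the dataset at hand; the figures in the table are estimates of uniqueness in the population, which requires additional estimation that this sketch does not attempt.

```python
from collections import Counter
from typing import Iterable, Tuple

# A record is (age in years, gender, FSA); this layout is a hypothetical example.
Record = Tuple[int, str, str]


def percent_unique(records: Iterable[Record]) -> float:
    """Return the percentage of records whose (age, gender, FSA) combination occurs exactly once."""
    records = list(records)
    if not records:
        return 0.0
    # Size of each equivalence class, i.e. how many records share each combination.
    class_sizes = Counter(records)
    # Each class of size one corresponds to exactly one unique record.
    uniques = sum(1 for size in class_sizes.values() if size == 1)
    return 100.0 * uniques / len(records)


if __name__ == "__main__":
    # Hypothetical toy records, not taken from the census study.
    sample = [
        (34, "F", "E1C"),
        (34, "F", "E1C"),
        (35, "M", "E1C"),
        (71, "F", "K1A"),
    ]
    print(f"{percent_unique(sample):.1f}% unique")  # 2 of 4 records are unique -> 50.0% unique
```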

An important question is whether these numbers are too high. Note also, as a caveat, that these are estimates of uniqueness.

By most standards these numbers would be considered high. One solution is to generalize age into, for example, two-, five-, or ten-year intervals while keeping the FSA intact. The percentage of the population unique under each of these generalizations is as follows:

Province            2-year age interval   5-year age interval   10-year age interval
Alberta             8%                    4%                    2%
British Columbia    7%                    1%                    1%
Manitoba            8%                    5%                    2%
New Brunswick       41%                   30%                   25%
Newfoundland        9%                    5%                    2%
Nova Scotia         14%                   7%                    5%
Ontario             4%                    2%                    1%
PEI                 3%                    3%                    3%
Quebec              9%                    4%                    1%
Saskatchewan        3%                    3%                    2%
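To see why coarser age intervals reduce uniqueness, the sketch below generalizes exact age into fixed-width bands while keeping gender and FSA intact, then counts the records that are alone in their equivalence class. It is a minimal illustration under stated assumptions: the records are hypothetical toy data, and the bands are assumed to start at age 0 (e.g. 30-39 for ten-year bands), since the article does not say how its intervals were aligned. It does not reproduce the census estimates in the table.

```python
from collections import Counter
from typing import List, Tuple

# (age in years, gender, FSA); a hypothetical record layout.
Record = Tuple[int, str, str]


def generalize_age(age: int, width: int) -> str:
    """Map an exact age to a band of `width` years aligned at 0, e.g. 37 with width 10 -> '30-39'.

    A width of 1 leaves the exact age unchanged (labelled '37-37').
    """
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def percent_unique_generalized(records: List[Record], width: int) -> float:
    """Percentage of records unique on (age band, gender, FSA); the FSA is kept intact."""
    if not records:
        return 0.0
    keys = [(generalize_age(age, width), gender, fsa) for age, gender, fsa in records]
    class_sizes = Counter(keys)
    # Count the records whose generalized combination appears exactly once.
    uniques = sum(1 for key in keys if class_sizes[key] == 1)
    return 100.0 * uniques / len(records)


if __name__ == "__main__":
    # Hypothetical toy records, not census data.
    sample = [
        (31, "F", "E1C"),
        (33, "F", "E1C"),
        (38, "F", "E1C"),
        (45, "M", "E1C"),
        (47, "M", "E1C"),
        (62, "F", "K1A"),
    ]
    for width in (1, 2, 5, 10):
        print(width, round(percent_unique_generalized(sample, width), 1))
```

On these toy records every record is unique at single-year ages and with 2-year bands, about 33.3% are unique with 5-year bands, and about 16.7% with 10-year bands. With bands aligned at 0, each 10-year band is a union of 2-year or 5-year bands, so merging can only enlarge equivalence classes and uniqueness at 10 years cannot exceed uniqueness at 2 or 5 years; 2-year and 5-year bands do not nest, so no such ordering is guaranteed between those two.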

Based on these results, we can say with some confidence that uniqueness is quite low with 10-year age intervals when gender and FSA are also collected or disclosed. The exception is New Brunswick, where uniqueness remains high (25%) even with a 10-year age interval. Where a custodian is comfortable with the percentage of uniques at the 5-year or even the 2-year age interval (for example, because other measures are taken to manage re-identification risk), the custodian may disclose data on that basis. This recommendation is based on the best evidence available today, and the percentages of uniques presented above should be seen as ceiling values on risk (i.e., they are conservative). Using them means that you are being extra cautious.
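One way to read this guidance is as a simple decision rule: a custodian chooses a maximum acceptable percentage of uniques and uses the narrowest age interval that stays at or below it. The sketch below encodes the percentages from the two tables above (width 1 stands for exact age) and applies such a rule. The threshold is a hypothetical parameter: the study does not prescribe one, and the 5% value in the usage example is purely illustrative. A real decision would also weigh the other risk-management measures mentioned above.

```python
from typing import Dict, Optional

# Percentage of population uniques by age-interval width in years, copied from the two
# tables above; width 1 means exact age. These are conservative (ceiling) estimates.
UNIQUES_BY_WIDTH: Dict[str, Dict[int, float]] = {
    "Alberta": {1: 16, 2: 8, 5: 4, 10: 2},
    "British Columbia": {1: 13, 2: 7, 5: 1, 10: 1},
    "Manitoba": {1: 12, 2: 8, 5: 5, 10: 2},
    "New Brunswick": {1: 49, 2: 41, 5: 30, 10: 25},
    "Newfoundland": {1: 17, 2: 9, 5: 5, 10: 2},
    "Nova Scotia": {1: 18, 2: 14, 5: 7, 10: 5},
    "Ontario": {1: 9, 2: 4, 5: 2, 10: 1},
    "PEI": {1: 10, 2: 3, 5: 3, 10: 3},
    "Quebec": {1: 16, 2: 9, 5: 4, 10: 1},
    "Saskatchewan": {1: 7, 2: 3, 5: 3, 10: 2},
}


def finest_acceptable_width(province: str, threshold_pct: float) -> Optional[int]:
    """Return the narrowest age interval whose percentage of uniques is at or below threshold_pct.

    threshold_pct is a hypothetical, custodian-chosen cut-off. Returns None when none of the
    listed intervals meets it.
    """
    by_width = UNIQUES_BY_WIDTH[province]
    for width in sorted(by_width):  # narrowest interval first: 1, 2, 5, 10
        if by_width[width] <= threshold_pct:
            return width
    return None


if __name__ == "__main__":
    HYPOTHETICAL_THRESHOLD = 5.0  # illustrative only; not a value recommended by the study
    for province in UNIQUES_BY_WIDTH:
        width = finest_acceptable_width(province, HYPOTHETICAL_THRESHOLD)
        label = "none of the listed intervals" if width is None else f"{width}-year interval"
        print(f"{province}: {label}")
```

With the illustrative 5% threshold, Ontario could use 2-year intervals, Alberta 5-year intervals, and Nova Scotia 10-year intervals, while New Brunswick meets the threshold at none of the listed widths, which mirrors the exception noted above.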

Note also that this issue is the subject of considerable ongoing research, so we may provide updated and more precise guidance on the disclosure of these demographics in the future.



The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.