Many studies collect the combination of age, gender, and Forward Sortation Area (FSA, the first three characters of the postal code) of residence, and many disclosed datasets include these three variables. Does that represent a privacy risk?
One of our studies analyzed the 2001 Canadian census, and this was one of the questions we attempted to answer. The study is available here: http://www.jamia.org/cgi/content/abstract/16/2/256
Our conclusion was that only a small proportion of Canadians are unique on these three variables (we use uniqueness as a measure of re-identification risk; for a discussion of this issue see this KnowledgeBase post: [view here]). There is variation across the country, with the largest percentage of uniques in New Brunswick. Here is a table showing the percentage of the population unique on these three variables:
| Province | Percentage of Population Uniques |
|---|---|
| Alberta | 16% |
| British Columbia | 13% |
| Manitoba | 12% |
| New Brunswick | 49% |
| Newfoundland | 17% |
| Nova Scotia | 18% |
| Ontario | 9% |
| PEI | 10% |
| Quebec | 16% |
| Saskatchewan | 7% |
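To make the uniqueness measure concrete, here is a minimal sketch of how the percentage of records that are unique on age, gender, and FSA could be counted in a dataset. The function name `percent_unique`, the field names, and the toy records are hypothetical and are not taken from the study. Note also that this counts uniqueness within the dataset itself, whereas the figures above are estimates of uniqueness in the population, which requires a statistical estimator not shown here.

```python
from collections import Counter

def percent_unique(records, keys=("age", "gender", "fsa")):
    """Percentage of records whose combination of key values occurs exactly once."""
    if not records:
        return 0.0
    # Build one tuple of quasi-identifier values per record.
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    # A record is unique if no other record shares its combination.
    unique = sum(1 for c in combos if counts[c] == 1)
    return 100.0 * unique / len(records)

# Hypothetical toy data, for illustration only (not census data).
records = [
    {"age": 34, "gender": "F", "fsa": "K1A"},
    {"age": 34, "gender": "F", "fsa": "K1A"},
    {"age": 35, "gender": "F", "fsa": "K1A"},
    {"age": 61, "gender": "M", "fsa": "E3B"},
]

print(percent_unique(records))  # 50.0
```

In the toy data, the two 34-year-old women in the same FSA share a combination, so only the other two records are unique.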
An important question is whether the percentages in the table above are too high. Note, as a caveat, that they are estimates of uniqueness.
By most standards these numbers would be considered high. One solution is to generalize age into, for example, two-, five-, or ten-year intervals while keeping the FSA intact. The percentage of the population unique under each of these three modifications is as follows:
| Province | Percentage of Population Uniques with 2 Year Age Interval | Percentage of Population Uniques with 5 Year Age Interval | Percentage of Population Uniques with 10 Year Age Interval |
|---|---|---|---|
| Alberta | 8% | 4% | 2% |
| British Columbia | 7% | 1% | 1% |
| Manitoba | 8% | 5% | 2% |
| New Brunswick | 41% | 30% | 25% |
| Newfoundland | 9% | 5% | 2% |
| Nova Scotia | 14% | 7% | 5% |
| Ontario | 4% | 2% | 1% |
| PEI | 3% | 3% | 3% |
| Quebec | 9% | 4% | 1% |
| Saskatchewan | 3% | 3% | 2% |
Based on these results, we can say with some confidence that uniqueness is quite low with 10-year age intervals when gender and FSA are also collected or disclosed. The exception is New Brunswick, where uniqueness remains quite high even at a 10-year age interval. A custodian who is comfortable with the percentage of uniques for the 5-year or even the 2-year age interval (for example, because other actions are taken to manage re-identification risk) may disclose data on that basis. This recommendation is based on the best evidence available today, and the percentages of uniques presented above should be seen as ceiling values on risk (i.e., they are conservative values). Using them means that you are being extra cautious.
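A custodian who wants to see how age generalization affects uniqueness in their own data could try something like the following minimal sketch. The function names, field names, and toy records are hypothetical. The sketch also assumes intervals start at multiples of the interval width (for example, 30 to 34 for a 5-year interval), which the study does not specify. It counts uniqueness within the dataset rather than estimating population uniqueness.

```python
from collections import Counter

def generalize_age(age, width):
    """Map an age to the lower bound of its interval, e.g. 37 -> 35 for width 5."""
    return (age // width) * width

def percent_unique_by_width(records, width):
    """Percentage of records unique on (generalized age, gender, FSA)."""
    if not records:
        return 0.0
    combos = [
        (generalize_age(r["age"], width), r["gender"], r["fsa"])
        for r in records
    ]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return 100.0 * unique / len(records)

# Hypothetical toy data, for illustration only (not census data).
records = [
    {"age": 30, "gender": "F", "fsa": "K1A"},
    {"age": 31, "gender": "F", "fsa": "K1A"},
    {"age": 33, "gender": "F", "fsa": "K1A"},
    {"age": 38, "gender": "F", "fsa": "K1A"},
    {"age": 61, "gender": "M", "fsa": "E3B"},
]

for width in (1, 2, 5, 10):
    print(width, percent_unique_by_width(records, width))
# 1 100.0
# 2 60.0
# 5 40.0
# 10 20.0
```

As in the tables above, wider age intervals merge more records into shared combinations, so the percentage of uniques falls while the FSA is left intact.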
Also note that there is considerable active research on this issue, so we may provide updated and more precise guidance on the disclosure of these demographics in the future.
The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.