KMP http://www.ehealthinformation.ca/knowledgebase/category/14 en-us KnowlageBase RSS Generator Deciding on a Threshold - Performing a Risk Assessment http://www.ehealthinformation.ca/knowledgebase/article/AA-00180 One of the challenges in de-identifying data sets is deciding what the appropriate threshold should be. The threshold represents the maximum risk that the data custodian is willing to take. The PARAT tool has a powerful risk assessment expert system that enables the user to choose a threshold.

The expert system consists of a series of checklists that cover the practices of the data recipient, as well as the characteristics of the data that are being disclosed. It provides a defensible and rational way to choose a threshold.





The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.


Sat, 15 Jan 2011 00:00:00 -0500
Residence Trails - Risk assessment for longitudinal data http://www.ehealthinformation.ca/knowledgebase/article/AA-00174 The Residence Trails expert system is another tool that can be used for performing a re-identification risk assessment before any data is actually collected. In a sense it is similar to the REB Wizard expert system and can also be used by research ethics boards for evaluating the identifiability of data collection from descriptions in protocols. The primary differences between these two expert systems are:

  • Residence Trails focuses on three very specific demographics when evaluating risk: date of birth, postal code, and gender. These are, in any case, the demographics most commonly collected. The REB Wizard, by contrast, allows you to specify any quasi-identifiers. The two expert systems are based on different underlying models, which is why they have a different focus.

  • Residence Trails gives more accurate re-identification risk estimates for populations living in urban areas. The REB Wizard provides estimates for rural areas as well (actually, it does not differentiate between rural and urban areas).

  • The Residence Trails expert system allows the user to look at the risk from collecting longitudinal location (residence) information. These residence trails (hence the name) reveal the pattern of movement of individuals over time. The expert system allows for the analysis of longitudinal data for up to 11 years.


The expert system allows the user to select which variables to consider and at what level of granularity, and it will provide a re-identification risk in terms of the percentage of the population that is unique. Once an analysis is complete, the user can export the models into Word, PDF, or PowerPoint files.
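To make the "percentage that is unique" measure concrete, here is a minimal Python sketch. It is not how Residence Trails works: the expert system relies on models and needs no data, whereas this sketch simply counts unique combinations in a handful of made-up records. The function names, the records, and the coarser granularity (year of birth, first three characters of the postal code) are all assumptions chosen for illustration.

```python
from collections import Counter


def percent_unique(records, quasi_identifiers):
    """Percentage of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return 100.0 * unique / len(records)


def coarsen(record):
    """Hypothetical coarser granularity: year of birth and 3-character postal prefix."""
    return {
        "dob": record["dob"][:4],
        "postal_code": record["postal_code"][:3],
        "gender": record["gender"],
    }


# Made-up records, for illustration only.
people = [
    {"dob": "1980-02-11", "postal_code": "K1H 8L1", "gender": "F"},
    {"dob": "1980-07-30", "postal_code": "K1H 7W9", "gender": "F"},
    {"dob": "1975-01-05", "postal_code": "K2P 1L4", "gender": "M"},
    {"dob": "1975-11-23", "postal_code": "K2P 0A4", "gender": "M"},
    {"dob": "1990-06-18", "postal_code": "K1N 6N5", "gender": "F"},
]
qis = ["dob", "postal_code", "gender"]

full_detail = percent_unique(people, qis)                    # every record is unique
coarse = percent_unique([coarsen(p) for p in people], qis)   # only one record stays unique
print(full_detail, coarse)
```

With full dates and full postal codes every record in this toy sample is unique; at the coarser granularity only one of the five is, which is the kind of trade-off the choice of granularity controls.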









Sat, 16 Oct 2010 00:00:00 -0400
Research Ethics Board Wizard - Re-identification risk assessment without data http://www.ehealthinformation.ca/knowledgebase/article/AA-00172 Research Ethics Boards (REBs) often have to make decisions about re-identification risk before any data is collected. For many REBs the majority of their protocols are not "secondary use" protocols whereby a database exists and the investigator wishes to analyze that data. Rather, many are prospective studies where new data will be collected. Traditional re-identification risk assessment tools and de-identification tools could not handle that situation because they required the data to already exist - until now.

The REB Wizard tool that is illustrated in this video provides REBs the capability to assess re-identification risk by just describing the fields that will be collected and which part of the country (REB Wizard only exists for Canada at this point) the data will be collected from. Based on extensive analysis of the Canadian census, we have constructed models that would then provide an estimate of the percentage of the population that is at high risk of re-identification. In this case re-identification risk is measured in terms of uniqueness.







Tue, 05 Oct 2010 00:00:00 -0400
Generating data sharing agreements automatically http://www.ehealthinformation.ca/knowledgebase/article/AA-00165 One of the key features in PARAT is the ability to produce data sharing agreements automatically based on the results of the risk analysis and de-identification. Two types of data sharing agreements can be produced: one for an agent (sub-contractor) and one for a researcher. The following video shows how to do that.






Sat, 06 Mar 2010 00:00:00 -0500
Specifying and de-identifying correlated variables in PARAT http://www.ehealthinformation.ca/knowledgebase/article/AA-00164 In many data sets there will be correlated variables. This means that the value of one variable can be predicted from another variable. Some examples include:

  • Date of birth of a baby and date of discharge from a hospital.
  • Date of death and date of an autopsy.
  • Weight at birth and weight of baby at discharge from a hospital.
  • Age and date of graduation.


In the context of de-identification, correlated variables must be dealt with explicitly. For example, suppose the correlated variables are date of birth and date of discharge from hospital. If we generalize one to, say, month and year and leave the other as the full date, then the de-identification is meaningless: the full date of birth can be predicted from the full date of discharge even if the date of birth is generalized to month/year or just year of birth.

In PARAT it is possible to specify such relationships and the tool will automatically ensure that the generalizations are the same. The video below illustrates how to do that.

One thing to note is that in PARAT only variables of the same type can be correlated, and they must also have the same depth in their generalization hierarchy.
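The idea of keeping correlated variables at the same generalization level can be sketched in Python as follows. This is an illustration under stated assumptions, not PARAT's implementation: the three-level date hierarchy, the rule of raising every variable in a group to the coarsest level requested within it, and all names (`DATE_LEVELS`, `harmonize_levels`, `generalize_date`) are hypothetical.

```python
from datetime import date

# Hypothetical generalization hierarchy for dates, from most to least precise.
DATE_LEVELS = ["day", "month", "year"]


def generalize_date(value, level):
    """Render a date at the requested level of the hierarchy."""
    if level == "day":
        return value.isoformat()
    if level == "month":
        return value.strftime("%Y-%m")
    if level == "year":
        return value.strftime("%Y")
    raise ValueError(f"unknown level: {level}")


def harmonize_levels(levels, correlated_groups):
    """Give every variable in a correlated group the coarsest level chosen in that group."""
    result = dict(levels)
    for group in correlated_groups:
        coarsest = max(DATE_LEVELS.index(levels[name]) for name in group)
        for name in group:
            result[name] = DATE_LEVELS[coarsest]
    return result


# Made-up record: a baby's date of birth and date of discharge.
record = {
    "date_of_birth": date(2009, 3, 14),
    "date_of_discharge": date(2009, 3, 16),
}
# The analyst asked for month/year on one variable but left the other as a full date.
requested = {"date_of_birth": "month", "date_of_discharge": "day"}

levels = harmonize_levels(requested, [("date_of_birth", "date_of_discharge")])
released = {name: generalize_date(value, levels[name]) for name, value in record.items()}
print(released)
```

Because both variables end up at the month level, the released discharge date no longer gives away the day of birth.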







Sat, 06 Mar 2010 00:00:00 -0500
Producing summary reports and certificates after risk assessment and de-identification http://www.ehealthinformation.ca/knowledgebase/article/AA-00166 After completing a re-identification risk assessment, it is often useful to have a summary report that can be included in other documentation and that provides the evidential trail to demonstrate that the risk of re-identification is indeed low. PARAT provides a risk assessment Word report and a missingness Word report, as illustrated in this video.

Some users have utilized the summary reports as "certificates" to research ethics boards and privacy officers. These Word reports can be customized as described here: http://www.ehealthinformation.ca/knowledgebase/article/AA-00108/7/PARAT-Tool/How-can-I-change-the-layout-of-the-risk-assessment-Word-report-.html







Fri, 05 Mar 2010 00:00:00 -0500
Using weights to signify variable importance in PARAT http://www.ehealthinformation.ca/knowledgebase/article/AA-00163 One of the powerful capabilities in PARAT is that it allows the end-user to specify weights for each one of the quasi-identifiers. A weight reflects how important a particular quasi-identifier is for subsequent analysis. If a quasi-identifier has a high weight then it means that the variable is important and PARAT will try to minimize the amount of distortion (generalization and suppression) to that variable during the de-identification process. This video shows you how to set weights and gives an example of the impact of changing weights.

Quasi-identifier weights are one of the parameters that you would change when deciding which de-identification solution to use.
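One way to picture how weights could steer the choice between de-identification solutions is a weighted distortion score, sketched below in Python. The article does not describe how PARAT uses weights internally, so the scoring formula, the 0-to-1 distortion scale, the candidate solutions, and all names here are assumptions made purely for illustration.

```python
def weighted_distortion(distortion, weights):
    """Weighted average of per-variable distortion (0 = untouched, 1 = fully distorted)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * distortion[name] for name in weights) / total_weight


def best_solution(candidates, weights):
    """Pick the candidate solution with the lowest weighted distortion."""
    return min(candidates, key=lambda name: weighted_distortion(candidates[name], weights))


# Two hypothetical solutions that both meet the risk threshold but distort
# different quasi-identifiers.
candidates = {
    "generalize_age": {"age": 0.5, "postal_code": 0.25},
    "generalize_postal_code": {"age": 0.25, "postal_code": 0.5},
}

# Age matters most for the planned analysis, so the solution that spares age wins.
age_matters = best_solution(candidates, {"age": 3, "postal_code": 1})
# Reversing the weights reverses the choice.
location_matters = best_solution(candidates, {"age": 1, "postal_code": 3})
print(age_matters, location_matters)
```

In this toy example, raising the weight on a variable shifts the preferred solution toward the one that distorts that variable least.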







Thu, 04 Mar 2010 00:00:00 -0500
De-identifying a registry http://www.ehealthinformation.ca/knowledgebase/article/AA-00161 This video describes how to use the PARAT tool to de-identify a registry data set. This assumes that the data is cross-sectional (i.e., not a longitudinal data set). The video illustrates the basic functionality of the PARAT tool: selecting a data set, doing a risk assessment, and automated de-identification.







Thu, 04 Mar 2010 00:00:00 -0500
Using top and bottom coding in PARAT http://www.ehealthinformation.ca/knowledgebase/article/AA-00160 This video illustrates functionality in the PARAT tool to top and bottom code numeric variables. Top and bottom coding is often a powerful way to reduce the risk of re-identification and still maintain significant data quality. Another posting provides more information about how it works: http://www.ehealthinformation.ca/knowledgebase/article/AA-00144/7/PARAT-Tool/Generalizing-numbers-in-PARAT.html
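As a rough illustration of top and bottom coding in general (not of PARAT's own implementation, which the linked posting describes), the Python sketch below collapses the extreme values of a numeric variable into open-ended categories. The function name, the cut-offs of 18 and 90, and the sample ages are assumptions.

```python
def top_bottom_code(values, bottom, top):
    """Collapse values at or below `bottom` and at or above `top` into open-ended categories."""
    coded = []
    for v in values:
        if v <= bottom:
            coded.append(f"<={bottom}")
        elif v >= top:
            coded.append(f">={top}")
        else:
            coded.append(str(v))  # values in the middle of the range are kept as-is
    return coded


# Made-up ages; the rare extremes are the values most likely to single someone out.
ages = [17, 34, 52, 89, 97, 103]
coded_ages = top_bottom_code(ages, bottom=18, top=90)
print(coded_ages)
```

Only the tails of the distribution are altered, which is why the technique can lower risk while leaving most of the data untouched.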






Wed, 03 Mar 2010 00:00:00 -0500