KMP http://ehealthinformation.ca/knowledgebase/ en-us KnowlageBase RSS Generator
Risky Business Newsletter - September 2011 http://ehealthinformation.ca/knowledgebase/article/AA-00200 Risky Business is the re-identification risk management newsletter that we produce with Privacy Analytics Inc.

You can download it from here: http://www.privacyanalytics.ca/riskybusiness/september-2011.pdf

Topics in the September 2011 newsletter are:

  • Managing data quality in data warehouses.
  • Lessons from the privacy professor - perspectives from an experienced privacy professional.
  • Case study of de-identifying data for disclosures for research purposes from the BORN Ontario registry.

 


The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.


Wed, 28 Sep 2011 07:46:06 -0400
Risky Business Newsletter - August 2011 http://ehealthinformation.ca/knowledgebase/article/AA-00198 Risky Business is the re-identification risk management newsletter that we produce with Privacy Analytics Inc.

You can download it from here: http://www.privacyanalytics.ca/riskybusiness/august-2011.pdf

Topics in the August 2011 newsletter are:

  • Legislative uncertainty on de-identification provides opportunities
  • Wizards provide researchers with control of privacy issues
  • Upgrades to re-identification risk assessment and de-identification software
  • Case study: the Canadian Primary Care Sentinel Surveillance Network (CPCSSN)
  • Myth-busting whitepaper 

Wed, 31 Aug 2011 06:05:53 -0400
Risk Based De-identification - Methodology and Benefits http://ehealthinformation.ca/knowledgebase/article/AA-00195 Our approach to re-identification risk assessment and de-identification is risk-based. The following documents describe this general approach in more detail.

Title | Place Published | Download Link
Dispelling the myths surrounding de-identification | Office of the Information and Privacy Commissioner of Ontario | [download here]
A positive-sum paradigm in action in the health sector | Office of the Information and Privacy Commissioner of Ontario | [download here]
Risk-based de-identification of health data | IEEE Security and Privacy | [download here]
Methods for the de-identification of electronic health records for genomic research | Genome Medicine | [download here]
De-identification: Reduce privacy risks when sharing personally identifiable information | Privacy Analytics Whitepaper | [download here]


Sat, 30 Jul 2011 07:50:01 -0400
Real world de-identification examples http://ehealthinformation.ca/knowledgebase/article/AA-00194 These articles describe in detail how technologies incorporated in PARAT have been used to de-identify actual data sets. They demonstrate the risk analysis that is performed and show how de-identification methods can be used in practice.

Publication | Type of Data | Download Link
Canadian Journal of Hospital Pharmacy | Prescription and diagnosis data | [download here]
BMC Medical Informatics and Decision Making | Hospital discharge abstract data | [download here]


Sat, 30 Jul 2011 07:29:31 -0400
Risky Business Newsletter - July 2011 http://ehealthinformation.ca/knowledgebase/article/AA-00191 Risky Business is the re-identification risk management newsletter that we produce with Privacy Analytics Inc.

You can download it from here: http://www.privacyanalytics.ca/riskybusiness/july-2011.pdf

Topics in this month's newsletter are:

  • the new release of the PARAT de-identification tool,
  • the use of our de-identification tool when disclosing cancer data by the cd-link project in Ontario,
  • the IRB and REB Wizards available on-line, and
  • our work on the $3m Heritage Health Prize.
 

Sat, 30 Jul 2011 06:51:17 -0400
How generalizable are the conclusions from this study? http://ehealthinformation.ca/knowledgebase/article/AA-00187 We have been asked about the generalizability of the clinical trials file sharing study described here: http://www.jmir.org/2011/1/e18/

We must remember that this was a small study, although it did highlight some important issues. Below we will consider a number of points about how much we can generalize the findings:

  • The analysis of passwords was performed in 2007-2008 (the interviews were conducted in 2010). It is plausible that, because of education and other improvements in practices, individuals working on clinical trials no longer send files containing personal health information by email, or now use more sophisticated encryption methods. This issue has been of great concern to us because we wanted to represent reality as accurately as possible and not unfairly project historical practices onto today. However, anecdotally, as we discussed and presented these results over the last few months, the reaction we kept getting was agreement that they are consistent with current practices. In particular, we often hear that data is transmitted with no encryption at all, let alone with weak passwords. Therefore, we are reluctant to say that these results are old and do not reflect what is happening today.
  • Because of the nature of this kind of analysis (revealing weak practices), we believe that only those individuals who thought they had decent practices in place agreed to participate and subject themselves to this kind of scrutiny. Therefore, it can be argued that we are describing the "good" end of the spectrum where passwords are being used at all.
  • Modern EDC systems will likely make emailing patient files unnecessary. However, even when EDC systems are in use, not all individuals who need data have accounts on the EDC systems. Therefore, just because an EDC system is in use, that does not mean that file sharing is done in a secure way. And of course, not all trials are using EDC systems.
  • We did not try to recruit from a random sample of all trials. Therefore, our sampling frame may have been biased towards trials that do not have good security practices.

We also acknowledge that there is a lot of variability, and that there will be trials with very strong security practices.


Sat, 22 Jan 2011 19:19:38 -0500
Why didn't we use a more powerful password attack tool? http://ehealthinformation.ca/knowledgebase/article/AA-00183 This question came up in the context of our study on clinical trials file sharing described here: http://www.jmir.org/2011/1/e18/

There are more "professional" grade password recovery tools that we could have used, such as John the Ripper. We chose the tools that we did for three reasons:

  • We wanted to find out the extent to which an unsophisticated adversary could recover the passwords. The tools we selected are quite easy to use and do not require an extensive background in security to execute.
  • John the Ripper, as far as we could determine, would not work on the versions of ZIP files and Office files that we were targeting, and we were unable to find any modifications that would allow it to work with such files. It is more often used to attack lists of hashed passwords.
  • In any case, we did run John the Ripper against the passwords that we were able to recover (we created a hashed list), and it recovered all of them quite quickly. Therefore, our results are not specific to a particular tool: the passwords were simply not strong ones.
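
To illustrate the kind of hashed-list check described in the last point above, here is a minimal Python sketch of a dictionary attack against a list of hashes. It is not John the Ripper and not the tooling used in the study: the SHA-256 hashing, the function names, the word list, and the example passwords are all assumptions made up for illustration.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Hash a password with unsalted SHA-256 (an assumed, illustrative scheme)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_hashed_list(passwords):
    """Return only the hashes, which is all the attack below gets to see."""
    return [sha256_hex(p) for p in passwords]


def dictionary_attack(hashed_list, candidates):
    """Map each target hash to the first candidate that reproduces it."""
    targets = set(hashed_list)
    recovered = {}
    for candidate in candidates:
        digest = sha256_hex(candidate)
        if digest in targets and digest not in recovered:
            recovered[digest] = candidate
    return recovered


# Hypothetical weak passwords, not taken from the study.
example_passwords = ["heart2010", "summer", "12trial"]
hashed = build_hashed_list(example_passwords)

# A tiny word list plus two simple variations: word + year, and 1-2 digits + word.
words = ["heart", "summer", "trial", "winter"]
candidates = (
    list(words)
    + [w + str(year) for w in words for year in range(2000, 2012)]
    + [str(n) + w for w in words for n in range(100)]
)

recovered = dictionary_attack(hashed, candidates)
print(f"recovered {len(recovered)} of {len(hashed)} passwords")
```

Passwords that follow a word-plus-digits pattern fall to even this small candidate list, whereas a password outside the list (for example a long random string) is not recovered.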

We also used auxiliary information to recover the passwords, such as the trial names (many files had the trial name in the file name) and the site names. This helped in some cases.

 


Sat, 22 Jan 2011 13:24:51 -0500
How come the 'hit rate' for the password recovery is so high? http://ehealthinformation.ca/knowledgebase/article/AA-00182 In password cracking studies, the success rate is usually lower than what we have reported here: http://www.jmir.org/2011/1/e18/.

The reasons we think our hit rate was high are as follows:

  • The passwords that we recovered were not unique for each file. Some of the passwords were reused in multiple files within the same trial. If the denominator is reduced then the percentages will not be as high.
  • Sometimes the same individuals or collaborating individuals were involved in more than one trial, therefore the generation of passwords was not from independent individuals. This means that the structure and strength of the passwords were correlated.
  • The passwords were really not that strong. The general templates for all of the passwords were (let {x,y} mean repeat at a minimum x to a maximum y times):
    • <dictionary-word>
    • <digit>{1,4}<dictionary-word>
    • <dictionary-word>{1,4}
    • <dictionary-word><dictionary-word>{1,4}
    • <digit>{1,4}<dictionary-word><dictionary-word>


These are obvious templates to try. They reflect an attempt at using characters and digits, but have a predictable structure.

  • Even though we were not told the specific trial names it was relatively easy to determine these from the file names. For example, if a trial was called "FOO" (acronym), then the file would be something like "FOO_LABS.ZIP". Therefore we added the trial names to the word dictionary. Once we knew the trial name, we were also able to determine the sponsor and the site names, which we also added to the dictionary. These dictionary words were helpful in recovering the passwords as well. We did not include that dictionary with the paper because it would reveal the identity of the trials. However, having that auxiliary information, which is possible in this particular context, makes it easier to create relevant word dictionaries.
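
As a rough illustration of why such templates are easy to try, the Python sketch below enumerates candidates for two of the templates above, <dictionary-word> and <digit>{1,4}<dictionary-word>, after adding a trial name taken from a file name. The rule that the trial acronym is the text before the first underscore, the function names, and the sample words are assumptions for illustration only, not the procedure or dictionary used in the study.

```python
from itertools import product
from pathlib import PurePath

DIGITS = "0123456789"


def trial_name_from_file(file_name: str) -> str:
    """Assume the trial acronym is the part of the name before the first underscore."""
    stem = PurePath(file_name).stem  # "FOO_LABS.ZIP" -> "FOO_LABS"
    return stem.split("_", 1)[0]


def digit_strings(min_len: int = 1, max_len: int = 4):
    """Yield every digit string of length min_len..max_len, i.e. <digit>{1,4}."""
    for length in range(min_len, max_len + 1):
        for combo in product(DIGITS, repeat=length):
            yield "".join(combo)


def candidates(words):
    """Yield <dictionary-word> and <digit>{1,4}<dictionary-word> candidates."""
    for word in words:
        yield word
    for prefix in digit_strings():
        for word in words:
            yield prefix + word


# Hypothetical dictionary, extended with the trial name found in a file name.
dictionary = ["heart", "study"]
dictionary.append(trial_name_from_file("FOO_LABS.ZIP"))

total = sum(1 for _ in candidates(dictionary))
print(total)  # 3 words * (1 + 10 + 100 + 1000 + 10000) = 33333
```

Each dictionary word contributes only 11,111 candidates under these two templates, so even a 100,000-word dictionary yields roughly 1.1 billion candidates in total, a small search space compared with random passwords of similar length.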


Sat, 22 Jan 2011 13:15:08 -0500
Why are there few trials and files? http://ehealthinformation.ca/knowledgebase/article/AA-00181 One question that has been brought up about our clinical trials file sharing study (see here: http://www.jmir.org/2011/1/e18/) was why there were few trials and files in the password recovery portion of the study. One of the challenges in doing this kind of research is finding organizations or individuals who are willing to take the risk that the findings will not make them look good. In recruiting trials for this study we ran into a number of challenges because:

  • Stakeholders were reluctant to participate because they did not want to expose their practices to scrutiny. This suggests that those who agreed believed that their practices were good, and would therefore represent the better end of the spectrum, making our results conservative. 
  • When we found a willing stakeholder for a trial, we did get back responses like "I am not able to find files for you because most of the data files that were being emailed are not encrypted" (here I am paraphrasing). Therefore, because we were trying to find password protected / encrypted files that were sent by email, we had a challenge because few files were actually encrypted. This was disturbing.

Therefore, while we do not claim that this study is representative of all trials, the results likely paint a more positive picture than reality (see the discussion on generalizability in article AA-00187).




Sat, 22 Jan 2011 13:06:22 -0500
Deciding on a Threshold - Performing a Risk Assessment http://ehealthinformation.ca/knowledgebase/article/AA-00180 One of the challenges in de-identifying data sets is deciding what the appropriate threshold should be. The threshold represents the maximum risk that the data custodian is willing to take. The PARAT tool has a powerful risk assessment expert system that enables the user to choose a threshold.

The expert system consists of a series of checklists that cover the practices of the data recipient, as well as the characteristics of the data that are being disclosed. It provides a defensible and rational way to choose a threshold.
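
To make the role of a threshold concrete, the following Python sketch compares a measured risk against a chosen threshold. It does not reproduce PARAT or its checklists: the risk measure (one divided by the size of the smallest group of records sharing the same quasi-identifier values), the function names, the field names, and the sample records are all assumptions for illustration.

```python
from collections import Counter


def max_reidentification_risk(records, quasi_identifiers):
    """Risk of the most exposed record: 1 / size of the smallest equivalence class."""
    classes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return 1.0 / min(classes.values())


def meets_threshold(records, quasi_identifiers, threshold):
    """True when the measured risk does not exceed the custodian's threshold."""
    return max_reidentification_risk(records, quasi_identifiers) <= threshold


# Hypothetical records with two quasi-identifiers.
records = [
    {"age_group": "30-39", "region": "K1A"},
    {"age_group": "30-39", "region": "K1A"},
    {"age_group": "30-39", "region": "K1A"},
    {"age_group": "40-49", "region": "K2B"},
    {"age_group": "40-49", "region": "K2B"},
]

quasi_identifiers = ["age_group", "region"]
risk = max_reidentification_risk(records, quasi_identifiers)
print(risk)                                              # 0.5 (smallest class has 2 records)
print(meets_threshold(records, quasi_identifiers, 0.2))  # False
```

Under this assumed measure, a lower threshold demands larger groups of indistinguishable records, which in turn means more generalization or suppression before the data can be disclosed.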





Sat, 15 Jan 2011 00:00:00 -0500