
What de-identification software tools are there?

Author: Khaled El Emam Views: 1357 Created: 17-10-2009 18:00 Last Updated: 25-04-2010 12:05

There are four de-identification tools that are generally available. Beyond those four, the tools that exist are internal to organizations and therefore not generally available, or have been developed for personal use (by researchers) and therefore have not been applied broadly.

The four generally available de-identification tools are:

  • mu-Argus, developed by the Netherlands national statistical agency. More information about mu-Argus can be found here: http://neon.vb.cbs.nl/casc/Software/MuManual4.2.pdf
    and the tool itself can be downloaded from here: http://neon.vb.cbs.nl/casc/Software/MU420_B1.zip

  • The Cornell Anonymization Toolkit (CAT) implements a k-anonymity algorithm. It is an open source tool available here: http://sourceforge.net/projects/anony-toolkit/
    with documentation available here: http://www.cs.cornell.edu/bigreddata/publications/2009/sigmod2009-p1051-xiao.pdf

  • The University of Texas at Dallas Anonymization Toolbox, which contains open source Java implementations of some k-anonymity and attribute disclosure control algorithms, with documentation: http://cs.utdallas.edu/dspl/cgi-bin/toolbox/index.php


The fourth tool, PARAT from Privacy Analytics, is the only one that is commercially available and actively supported. As a point of comparison, the algorithm implemented in PARAT has been shown in a recent article to perform better than the algorithm implemented in CAT (see http://www.jamia.org/cgi/content/short/16/5/670). Furthermore, the risk estimator used in PARAT has been shown to produce more accurate de-identification results than the one incorporated in mu-Argus (see http://www.jamia.org/cgi/content/abstract/15/5/627).
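
To make the idea of a risk estimate concrete, here is a minimal Python sketch of one simple sample-based measure: the re-identification risk of a record is taken as 1 divided by the size of its equivalence class (the set of records sharing the same quasi-identifier values). This is an illustration only and is not the estimator used in PARAT or mu-Argus; the records, the choice of quasi-identifiers, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical toy records. Each tuple holds the assumed quasi-identifiers:
# (age band, sex, postal code prefix).
records = [
    ("30-39", "F", "K1A"),
    ("30-39", "F", "K1A"),
    ("30-39", "F", "K1A"),
    ("40-49", "M", "K2B"),
    ("40-49", "M", "K2B"),
    ("50-59", "F", "K1A"),
]


def record_risks(rows):
    """Return one risk value per record: 1 / size of its equivalence class."""
    class_sizes = Counter(rows)
    return [1 / class_sizes[row] for row in rows]


risks = record_risks(records)
max_risk = max(risks)                # driven by the smallest equivalence class
avg_risk = sum(risks) / len(risks)   # average risk across all records

print(f"maximum risk: {max_risk:.2f}")
print(f"average risk: {avg_risk:.2f}")
```

In this toy data set the single record in the ("50-59", "F", "K1A") class is unique, so it determines the maximum risk. Estimators in real tools are more elaborate, for example because they must also account for the population from which the sample was drawn.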

The UTD toolbox includes some of the same algorithms as CAT. It is a set of capabilities rather than a tool that is ready for use by an end user (e.g., an analyst), and is therefore targeted more at developers.

We also spent some time evaluating the CAT tool. There are a significant number of usability issues with it: for example, we were unable to find the place to define the value of k for the k-anonymity algorithm, it was not possible to view data by equivalence class, and the data views gave the same record id every 60 records. We were also unable to import standard data files. The lack of documentation and support made using the tool difficult. While this may have been good enough to complete a Master's thesis project, it clearly lacked important functionality for broader use.
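
The two capabilities we looked for (setting k and viewing the data by equivalence class) can be illustrated with a short Python sketch. It is not taken from CAT or any of the other tools; the function names, field names, and records are hypothetical, and it only checks k-anonymity rather than generalizing the data to achieve it.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same values on the quasi-identifiers."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return dict(classes)


def violating_classes(classes, k):
    """Return the equivalence classes that contain fewer than k records."""
    return {key: members for key, members in classes.items() if len(members) < k}


# Hypothetical records; "age" and "sex" are assumed to be the quasi-identifiers.
rows = [
    {"id": 1, "age": "30-39", "sex": "F", "diagnosis": "A"},
    {"id": 2, "age": "30-39", "sex": "F", "diagnosis": "B"},
    {"id": 3, "age": "40-49", "sex": "M", "diagnosis": "A"},
    {"id": 4, "age": "40-49", "sex": "M", "diagnosis": "C"},
    {"id": 5, "age": "50-59", "sex": "F", "diagnosis": "B"},
]

k = 2  # the analyst chooses k explicitly
classes = equivalence_classes(rows, ["age", "sex"])
small = violating_classes(classes, k)
is_k_anonymous = not small

# View the data by equivalence class, flagging classes smaller than k.
for key, members in classes.items():
    flag = "" if len(members) >= k else "  <-- fewer than k records"
    print(key, [m["id"] for m in members], flag)
print("k-anonymous:", is_k_anonymous)
```

A data set is k-anonymous on the chosen quasi-identifiers only when every equivalence class has at least k records, so a single small class (here the record with id 5) is enough to fail the check.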

Note that de-identification tools are different from masking tools. The attached document provides an overview of de-identification techniques and explains at some length the differences between these two approaches and when each is more suitable.


The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.


Attachments
Report on de-identification techniques (0.5 Mb)