
How should de-identification be incorporated into a research ethics review process?

Created: 25-11-2009 19:00 Last Updated: 23-09-2011 15:23

In this article we present a recommended way to integrate de-identification into a research ethics board (REB) review process. It is based on our experience with many alternative processes; this is the approach we have found to work well in practice. The context for this article is a research protocol that involves the secondary use of existing data. The process can be adapted for interventional or prospective research protocols, but the example used here is secondary use.

A basic workflow diagram illustrating this process is shown below.




The key players are the researcher herself, a scientific review committee, a data access committee (DAC), the REB, and a database administrator. The scientific review committee may be formed by a funding agency or consist of peers at a particular institution. How the scientific review is done does not matter for this process; it is included here only to illustrate a few key points. The DAC consists of de-identification and privacy experts who perform a re-identification risk assessment on the protocol. The rationale for such a committee, and the recommended interactions between these experts, the researcher, and the REB, are explained in more detail here.

The DAC (or rather, its members) needs access to tools that can perform a re-identification risk assessment. These tools must also be able to analyze the original data being requested in order to perform that assessment. The database administrator is responsible for the data containing personal health information (PHI) and has an appropriate de-identification tool in place to de-identify the data.
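The article does not say which risk measure the DAC's tools compute. As a minimal sketch, assuming the common equivalence-class approach (records that share the same quasi-identifier values cannot be told apart from one another), a tool could group the requested records by their quasi-identifiers and report a worst-case and an average risk. The function names, field names, and toy records below are hypothetical.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 divided by the size of the smallest equivalence class."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1.0 / min(sizes.values()) if sizes else 0.0

def average_reidentification_risk(records, quasi_identifiers):
    """Mean of 1/class size over all records, i.e. number of classes / number of records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return len(sizes) / len(records) if records else 0.0

# Hypothetical toy extract; "fsa" stands for the first three characters of a
# postal code. A real tool would read the original requested data instead.
records = [
    {"sex": "F", "birth_year": 1970, "fsa": "K1H"},
    {"sex": "F", "birth_year": 1970, "fsa": "K1H"},
    {"sex": "M", "birth_year": 1985, "fsa": "K2P"},
]
qis = ["sex", "birth_year", "fsa"]
print(max_reidentification_risk(records, qis))      # 1.0: the male record is unique
print(average_reidentification_risk(records, qis))  # 2 classes / 3 records
```

The worst-case figure is driven entirely by the single unique record, which is why generalizing or suppressing a few outliers can lower it sharply.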

The researcher submits the protocol to the scientific review committee and the DAC at the same time, because in theory there may be some iteration between the scientific review process and the DAC process.

The DAC would perform a re-identification risk assessment and decide how to adequately de-identify the data set requested by the researcher. This activity requires some access to the data in order to perform the risk assessment. The re-identification risk assessment may result in changes to the precision of the requested data. For example, the original protocol may request admission and discharge dates, but the risk assessment may recommend replacing them with length of stay in days. Such changes in the data may require changes to the protocol as well, and if the protocol changes, the scientific review may have to be revisited. Conversely, methodological or theoretical issues raised during the scientific review may affect the requested data elements, and if those elements change, the re-identification risk assessment may have to be revisited. Therefore, at least conceptually, there may be some interaction, and possibly iteration, between the scientific review process and the re-identification risk assessment performed by the DAC.
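The length-of-stay substitution mentioned in the paragraph above is a simple transformation. A minimal sketch, assuming each record is a dictionary with hypothetical `admission_date` and `discharge_date` keys; counting same-day stays as 0 days is a convention chosen here, not one the article specifies.

```python
from datetime import date

def length_of_stay(admission: date, discharge: date) -> int:
    """Whole days between admission and discharge (same-day stays give 0)."""
    if discharge < admission:
        raise ValueError("discharge date precedes admission date")
    return (discharge - admission).days

def replace_dates_with_los(record: dict) -> dict:
    """Drop the exact dates and keep only the derived length of stay."""
    out = {k: v for k, v in record.items()
           if k not in ("admission_date", "discharge_date")}
    out["length_of_stay_days"] = length_of_stay(record["admission_date"],
                                                record["discharge_date"])
    return out

# Hypothetical record for illustration
rec = {"admission_date": date(2009, 3, 1),
       "discharge_date": date(2009, 3, 8),
       "diagnosis": "J18"}
print(replace_dates_with_los(rec))  # {'diagnosis': 'J18', 'length_of_stay_days': 7}
```

The analysis still gets the duration of the hospitalization, but the two exact dates, which are quasi-identifiers, never leave the database.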

In practice, interaction between scientific review and DAC review is often not possible because of the way peer review is structured (e.g., by research funding agencies). Since there is unlikely to be any interaction or iteration between scientific review and data access review, we can save time by doing these activities in parallel, or by sequencing them and hoping for the best!

If either the scientific review committee or the DAC does not approve the protocol, it goes back to the researcher for revision. If the scientific review committee approves the protocol, it provides some kind of approval documentation, such as a letter.

The researcher provides the DAC with the protocol as well as a variable checklist. This checklist is important because it clarifies exactly which fields are requested. It also highlights to the researcher which fields in the requested database are quasi-identifiers and may therefore undergo some generalization and suppression. The checklist allows the researcher to indicate the level of data granularity they will accept. For example, a researcher may be willing to receive the year of birth instead of the full date of birth. Specifying this explicitly in the checklist up front can significantly reduce the number of iterations between the researcher and the DAC. The checklist also contains a weight for each quasi-identifier to indicate its importance. For example, a weight of 1 means that a variable is particularly important and should be minimally affected by de-identification, whereas a low weight indicates that a quasi-identifier is, relatively speaking, less important to the eventual analysis. The more trade-offs the researcher makes up front, the quicker the re-identification risk analysis.

An example of such a checklist (or part of one), taken from the Ontario birth registry, is attached to this article.
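The checklist described above can also be held in a machine-readable form so that a risk assessment tool can use it directly. A minimal sketch, assuming a 0 to 1 weight scale (1 = most important) and a "generalize the least important quasi-identifier first" heuristic; neither the field names nor the heuristic come from the attached Ontario checklist, and all of them are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ChecklistEntry:
    name: str
    quasi_identifier: bool
    weight: float = 1.0  # assumed scale: 1.0 = most important, lower = less important
    # Granularity levels the researcher will accept, from most to least precise.
    acceptable_levels: list[str] = field(default_factory=list)

def generalization_order(checklist):
    """Quasi-identifiers in the order a DAC might generalize them: least important first."""
    qis = [e for e in checklist if e.quasi_identifier]
    return [e.name for e in sorted(qis, key=lambda e: e.weight)]

def coarsest_acceptable(entry):
    """The least precise level the researcher has said they can accept."""
    return entry.acceptable_levels[-1] if entry.acceptable_levels else None

# Hypothetical checklist for illustration only
checklist = [
    ChecklistEntry("date_of_birth", True, 0.5, ["full date", "month and year", "year"]),
    ChecklistEntry("sex", True, 1.0, ["as recorded"]),
    ChecklistEntry("postal_code", True, 0.2, ["full code", "first 3 characters"]),
    ChecklistEntry("birth_weight_grams", False, 1.0),
]
print(generalization_order(checklist))    # ['postal_code', 'date_of_birth', 'sex']
print(coarsest_acceptable(checklist[0]))  # year
```

Recording the acceptable levels up front is what lets the DAC skip proposals the researcher would reject anyway.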

The DAC determines how to de-identify the data appropriately given the risks, and negotiates this with the researcher. The negotiation may take a number of iterations. Recall from the discussion here that these iterations should be rapid because a single member of the DAC is assigned to negotiate with the researcher. The objective is not to create another layer of bureaucracy, but to negotiate and to provide data that facilitate making trade-offs. The output from this process consists of two things:

Risk Assessment Results. These would consist of a report indicating the de-identification that will be applied as well as the residual risk in the data set that will be disclosed. An example of such a report is provided here.

Data Sharing Agreement. Because the amount of de-identification normally depends on the security and privacy practices the researcher has in place, the researcher must commit to implementing these practices in a data sharing agreement. Such an agreement is not always needed. For example, if the researcher is an employee of a hospital and the data come from that hospital, she is bound by her employment contract, which should cover the handling of sensitive patient data. However, if the researcher is external to the hospital or at a university, a data sharing agreement is strongly recommended. Note that a separate data sharing agreement is needed for every project, because the specific terms may vary depending on the data (sub)set required.
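Returning to the negotiation between the DAC and the researcher: one way to picture it is as a search for the most precise data the researcher will accept that still keeps the risk below a threshold. A minimal sketch, assuming a worst-case (smallest equivalence class) risk measure and a researcher who lists acceptable date-of-birth levels in the checklist. The threshold of 0.5 suits only this six-record toy data set; the article does not state what threshold a DAC should use. All names and data are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical generalization hierarchy for date of birth, most to least precise.
DOB_LEVELS = {
    "full date": lambda d: d.isoformat(),
    "month and year": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}

def max_risk(rows):
    """Worst-case re-identification risk: 1 / size of the smallest equivalence class."""
    return 1.0 / min(Counter(rows).values()) if rows else 0.0

def least_generalized_level(records, acceptable_levels, threshold):
    """Most precise DOB level the researcher accepts whose risk is within threshold."""
    for level in acceptable_levels:
        rows = [(r["sex"], DOB_LEVELS[level](r["dob"])) for r in records]
        if max_risk(rows) <= threshold:
            return level
    return None  # no acceptable level is low-risk enough: negotiate further

records = [
    {"sex": "F", "dob": date(1970, 5, 3)},
    {"sex": "F", "dob": date(1970, 5, 20)},
    {"sex": "F", "dob": date(1970, 11, 2)},
    {"sex": "F", "dob": date(1970, 1, 9)},
    {"sex": "M", "dob": date(1985, 2, 14)},
    {"sex": "M", "dob": date(1985, 7, 1)},
]
all_levels = list(DOB_LEVELS)
print(least_generalized_level(records, all_levels, 0.5))      # year
print(least_generalized_level(records, all_levels[:2], 0.5))  # None
```

A `None` result corresponds to another round of negotiation: either the researcher accepts coarser data, or the DAC considers other measures such as suppression.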

Once the REB receives these two items, it has sufficient evidence that the residual risk of re-identification is acceptably low, and it knows the terms of the data sharing agreement the researcher will sign for this particular data release. Many Canadian REBs will waive the requirement to obtain patient consent if they are convinced that the requested data set is de-identified. The REB can then perform a regular ethics review knowing that the privacy issues have been addressed.

If the REB approves the protocol, this is conveyed to the database administrator, who then creates a data set according to the risk assessment report and provides it to the researcher in some secure format.
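Creating the data set "according to the risk assessment report" amounts to applying an agreed list of field transformations. A minimal sketch, assuming the report can be expressed as a mapping from each released field to its approved transformation, with every unlisted field (such as direct identifiers) left out. The specification, helper names, and the fake record are hypothetical; record suppression and the secure delivery step are not shown.

```python
from datetime import date

def to_year(d: date) -> int:
    return d.year

def to_first_three(postal_code: str) -> str:
    """Keep only the first three characters of a Canadian postal code."""
    return postal_code.replace(" ", "")[:3].upper()

def keep(value):
    return value

# Hypothetical specification taken from an approved risk assessment report:
# each released field maps to the transformation the DAC approved.
# Fields not listed (e.g., name, health card number) are not released.
RELEASE_SPEC = {
    "date_of_birth": to_year,
    "postal_code": to_first_three,
    "sex": keep,
    "diagnosis": keep,
}

def build_release(records, spec):
    """Apply the approved transformations and drop every field not in the spec."""
    return [{f: transform(r[f]) for f, transform in spec.items()} for r in records]

source = [{
    "name": "Test Patient",       # fake values for illustration
    "health_card": "0000000000",
    "date_of_birth": date(1970, 5, 3),
    "postal_code": "k1h 8l6",
    "sex": "F",
    "diagnosis": "J18",
}]
print(build_release(source, RELEASE_SPEC))
# [{'date_of_birth': 1970, 'postal_code': 'K1H', 'sex': 'F', 'diagnosis': 'J18'}]
```

Building the release from an allow-list of fields, rather than deleting known identifiers, means a newly added identifying column cannot slip through by accident.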

If the REB does not approve the protocol for reasons not related to re-identification risk, then the researcher would have to resubmit the protocol at some later point. If the protocol is not approved because of an issue related to re-identification risk, then the researcher would have to go through the process again with the DAC to perform another risk assessment.
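The routing decisions described in this article can be summarized as two small decision functions. A minimal sketch; the enum and function names are hypothetical, and the sketch covers only the outcomes the article names, not how each committee reaches its decision.

```python
from enum import Enum

class Next(Enum):
    REVISE_PROTOCOL = "researcher revises the protocol"
    REB_REVIEW = "REB reviews protocol, risk assessment results and data sharing agreement"
    RELEASE_DATA = "database administrator creates and securely provides the data set"
    REDO_DAC_REVIEW = "researcher goes through the DAC risk assessment again"
    RESUBMIT_LATER = "researcher resubmits the protocol at a later point"

def after_scientific_and_dac_review(scientific_ok: bool, dac_ok: bool) -> Next:
    # A rejection by either committee sends the protocol back to the researcher.
    return Next.REB_REVIEW if scientific_ok and dac_ok else Next.REVISE_PROTOCOL

def after_reb_review(approved: bool, reidentification_issue: bool = False) -> Next:
    if approved:
        return Next.RELEASE_DATA
    # Only rejections tied to re-identification risk go back to the DAC.
    if reidentification_issue:
        return Next.REDO_DAC_REVIEW
    return Next.RESUBMIT_LATER

print(after_scientific_and_dac_review(True, False).value)
print(after_reb_review(False, reidentification_issue=True).value)
```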



The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.

 

Attachments
Example of a Variable Checklist