Artificial Intelligence in Medicine
Volume 48, Issue 1, Pages 29-41, January 2010

Secure construction of k-unlinkable patient records from distributed providers

  • Bradley Malin

    • Corresponding author. Tel.: +1 615 343 9096; fax: +1 615 322 0502.

Department of Biomedical Informatics, School of Medicine, 2525 West End Avenue, Suite 600, Vanderbilt University, Nashville, TN 37203, USA

Received 25 November 2008; received in revised form 8 June 2009; accepted 12 September 2009.

Abstract 

Objectives

Healthcare organizations must adopt measures to uphold their patients’ right to anonymity when sharing sensitive records, such as DNA sequences, with publicly accessible databanks. This is often attempted by suppressing patient-identifying information; however, such a practice is insufficient, because the same organizations may disclose identified patient information, devoid of the sensitive information, for other purposes. Patients’ organization-visit patterns, or trails, can then be used to link the sensitive records back to the identities from which they were derived. Various algorithms exist that healthcare organizations can apply to ascertain when a patient's record is susceptible to trail re-identification, but they require organizations to exchange information regarding the identities of their patients before the data is certified as protected. In this paper, we introduce an algorithmic approach to formally thwart trail re-identification in a secure setting.
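To make the notion of a trail concrete, the following is a minimal sketch, not the article's method. It assumes each trail is simply the set of organizations at which a record appears, and that a de-identified DNA record can belong to any named patient whose identified trail contains the DNA trail. The names, hospital labels, and the function candidate_identities are hypothetical.

```python
def candidate_identities(dna_trail, identified_trails):
    """Return the names whose identified trail contains every
    organization in the de-identified DNA trail (assumed matching rule)."""
    return sorted(
        name
        for name, trail in identified_trails.items()
        if dna_trail <= trail  # subset test on sets of organizations
    )


# Hypothetical identified releases: patient name -> organizations visited.
identified_trails = {
    "Alice": {"H1", "H2"},
    "Bob": {"H1", "H3"},
    "Carol": {"H2", "H3"},
}

# Hypothetical de-identified releases: DNA record id -> organizations
# that disclosed that record.
dna_trails = {
    "dna_01": {"H1", "H2"},
    "dna_02": {"H3"},
}

links = {
    record_id: candidate_identities(trail, identified_trails)
    for record_id, trail in dna_trails.items()
}
```

In this toy data, dna_01 is compatible with only one named patient and is therefore re-identified by its trail alone, whereas dna_02 remains ambiguous between two patients.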

Methods and materials

We present a framework that allows data holders to securely collaborate through a third party. Under this framework, healthcare organizations keep all sensitive information in an encrypted state until the third party certifies that the data to be disclosed satisfies a formal data protection model. The model adopted for this work is an extended form of k-unlinkability, a protection model that, until this work, was applied in a non-secure setting only. Given the framework and protection model, we develop an algorithm to generate data that satisfies the model. This enables healthcare organizations to prevent trail re-identification without revealing identified information.
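As a rough illustration of what a k-based certification step decides, the sketch below assumes a simplified reading of k-unlinkability: a record may be disclosed only if it is compatible with at least k identities, and is suppressed otherwise. This is not the article's algorithm, which operates on encrypted data through a third party and uses an extended form of the model. The function split_by_k and the sample data are hypothetical.

```python
def split_by_k(candidate_links, k):
    """Partition record ids into (disclose, suppress) using a simplified
    rule: disclose only records with at least k candidate identities."""
    disclose = {
        record_id
        for record_id, identities in candidate_links.items()
        if len(identities) >= k
    }
    suppress = set(candidate_links) - disclose
    return disclose, suppress


# Hypothetical input: record id -> identities it could be linked to.
candidate_links = {
    "dna_01": {"Alice"},
    "dna_02": {"Bob", "Carol"},
    "dna_03": {"Alice", "Bob", "Carol"},
}

disclose, suppress = split_by_k(candidate_links, k=2)
```

With k = 2, dna_01 is suppressed because it links to a single identity, while the other two records are disclosed. Raising k trades disclosed data for stronger protection, which is the trade-off the Results section quantifies.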

Results

Theoretically, we prove that the proposed data protection model does not leak information, even in the context of an organization's prior knowledge. Empirically, we use real-world hospital discharge records to demonstrate that, although the secure protocol suppresses more patient information than an existing non-secure approach, the quantity of data it discloses remains substantial. For instance, in a population of over 7700 sickle cell anemia patients, the non-secure protocol discloses 99.48% of DNA records, whereas the secure protocol permits the disclosure of 99.41%.

Conclusions

Our results demonstrate that healthcare organizations can collaborate to disclose significant quantities of personal biomedical data without violating their patients' anonymity in the process.

Keywords: Privacy, Confidentiality, Electronic medical records, Distributed databases, Anonymization algorithms

PII: S0933-3657(09)00135-3

doi:10.1016/j.artmed.2009.09.002
