The Importance of Context: Risk-based De-identification of Biomedical Data.
@article{Prasser2016TheIO,
title={The Importance of Context: Risk-based De-identification of Biomedical Data.},
author={Fabian Prasser and Florian Kohlmayer and Klaus A. Kuhn},
journal={Methods of Information in Medicine},
year={2016},
volume={55},
number={4},
pages={347--355}
}
BACKGROUND
Data sharing is a central aspect of modern biomedical research. …
KEY METHOD
We performed an extensive experimental evaluation to analyze how different risk models and assumptions about the goals and background knowledge of an attacker affect the quality of de-identified data.
RESULTS
The results of our experiments show that data quality can be improved significantly by using risk models for data de-identification. On a scale where 100 % represents the original input dataset and…
23 Citations
A Scalable and Pragmatic Method for the Safe Sharing of High-Quality Health Data
- Computer Science
- IEEE Journal of Biomedical and Health Informatics
- 2018
The results of an extensive experimental evaluation show that the approach enables the safe sharing of high-quality data and that it is highly scalable.
Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review
- Medicine
- Journal of Medical Internet Research
- 2019
The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject.
A Tool for Optimizing De-identified Health Data for Use in Statistical Classification
- Computer Science
- 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS)
- 2017
The results show that the approach enables the creation of privacy-preserving classifiers with optimal prediction accuracy; it is used to create logistic regression models from a patient discharge dataset for predicting the costs of hospital stays.
Privacy-enhancing ETL-processes for biomedical data
- Computer Science
- Int. J. Medical Informatics
- 2019
Secondary Use of Clinical Data in Data-Gathering, Non-Interventional Research or Learning Activities: Definition, Types, and a Framework for Risk Assessment
- Medicine
- Journal of Medical Internet Research
- 2021
In the future, research ethics committees and data use and access committees will be able to rely on and apply the framework offered here when reviewing projects of secondary use of clinical data for learning and research purposes.
A scalable software solution for anonymizing high-dimensional biomedical data
- Computer Science
- GigaScience
- 2021
This article extends the open source software ARX to improve its support for high-dimensional, biomedical datasets and implements 2 novel search algorithms, one of which is a greedy top-down approach and the other based on a genetic algorithm.
Flexible data anonymization using ARX—Current status and challenges ahead
- Computer Science
- Softw. Pract. Exp.
- 2020
This work describes how an open source data anonymization tool is extended to support almost arbitrary combinations of a wide range of techniques in a scalable manner. Results of an extensive experimental comparison show that this approach outperforms related solutions in terms of scalability and output data quality.
Survey on Privacy-Preserving Techniques for Data Publishing
- Computer Science
- ArXiv
- 2022
The main challenges raised by privacy constraints are discussed, the main approaches to handle these obstacles are described, taxonomies of privacy-preserving techniques are reviewed, theoretical analyses of existing comparative studies are provided, and multiple open issues are raised.
On statistical disclosure control technologies : for enabling personal data protection in open data settings
- Computer Science
- 2018
The main objective of the study is to provide insights into the main functionalities provided by SDC technologies so that data controllers can be supported in their decision-making processes related to storing, sharing, and opening the ministry’s personal data.
Challenges and Open Problems of Legal Document Anonymization
- Computer Science
- Symmetry
- 2021
This paper aims to summarize and highlight the open and symmetrical problems from the fields of structured and unstructured text anonymization, and the possible methods for anonymizing legal documents, discussed and illustrated by case studies from Hungarian legal practice.
References
SHOWING 1-10 OF 46 REFERENCES
Estimating the re-identification risk of clinical data sets
- Mathematics, Psychology
- BMC Medical Informatics and Decision Making
- 2012
This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets and provides a reliable way to measure re-identification risk.
A Systematic Review of Re-Identification Attacks on Health Data
- Medicine
- PLoS ONE
- 2011
The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards, and evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
ARX - A Comprehensive Tool for Anonymizing Biomedical Data
- Computer Science
- AMIA
- 2014
ARX is presented, an anonymization tool that implements a wide variety of privacy methods in a highly efficient manner, provides an intuitive cross-platform graphical interface, and offers a programming interface for integration into other software systems.
Putting Statistical Disclosure Control into Practice: The ARX Data Anonymization Tool
- Computer Science
- Medical Data Privacy Handbook
- 2015
ARX is an anonymization tool for structured data which supports a broad spectrum of methods for statistical disclosure control by providing models for analyzing re-identification risks, and syntactic privacy criteria, such as k-anonymity, l-diversity, t-closeness and δ-presence.
Research Paper: A Globally Optimal k-Anonymity Method for the De-Identification of Health Data
- Computer Science, Medicine
- J. Am. Medical Informatics Assoc.
- 2009
For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
Technical and Policy Approaches to Balancing Patient Privacy and Data Sharing in Clinical and Translational Research
- Computer Science, Medicine
- Journal of Investigative Medicine
- 2010
This work recounts a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies that has had a significant impact on the data sharing policies of National Institutes of Health-sponsored databanks.
Anonymizing Health Data: Case Studies and Methods to Get You Started
- Computer Science
- 2013
This practical book demonstrates techniques for handling different data types, based on the authors' experiences with a maternal-child registry, inpatient discharge abstracts, health insurance claims, electronic medical record databases, and the World Trade Center disaster registry, among others.
R-U policy frontiers for health data de-identification
- Medicine
- J. Am. Medical Informatics Assoc.
- 2015
R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to anticipated needs and trustworthiness of recipients.
Research Paper: Protecting Privacy Using k-Anonymity
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2008
It is found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity.
Building public trust in uses of Health Insurance Portability and Accountability Act de-identified data
- Political Science, Medicine
- J. Am. Medical Informatics Assoc.
- 2013
Policy proposals intended to address de-identification concerns while maintaining de-identification as an effective tool for protecting privacy and preserving the ability to leverage health data for secondary purposes are discussed.








