State-of-the-art anonymization of medical records using an iterative machine learning framework.
@article{Szarvas2007StateoftheartAO,
  title={State-of-the-art anonymization of medical records using an iterative machine learning framework.},
  author={Gy{\"o}rgy Szarvas and Rich{\'a}rd Farkas and R{\'o}bert Busa-Fekete},
  journal={Journal of the American Medical Informatics Association},
  year={2007},
  volume={14},
  pages={574-580}
}
Objective: The anonymization of medical records is of great importance in the human life sciences because a de-identified text can be made publicly available for non-hospital researchers as well, to facilitate research on human diseases. Here the authors have developed a de-identification model that can successfully remove personal health information (PHI) from discharge records to make them conform to the guidelines of the Health Insurance Portability and Accountability Act. Design: We…
126 Citations
Anonymization of Sensitive Information in Medical Health Records
- Computer Science
- IberLEF@SEPLN
- 2019
This paper attempts to identify PHI in medical records written in Spanish by building a neural network involving an LSTM-CRF model and applying two approaches for the anonymization of medical records.
Anonymization Framework for Securing Protected Health Information in a Complex Dataset of Medical Narratives
- Computer Science
- Mehran University Research Journal of Engineering and Technology
- 2020
This work presents a rule-based Natural Language Processing (NLP) anonymization system using a challenging corpus containing medical narratives and ICD-10 codes (medical codes) to identify, classify and anonymize Protected Health Information (PHI) with PHI categories.
Patient Data De-Identification: A Conditional Random-Field-Based Supervised Approach
- Computer Science
- 2017
Insight is provided into the de-identification task, its major challenges, techniques to address challenges, detailed analysis of the results and direction of future improvement, and a supervised machine learning technique for solving the problem of patient data deidentification.
DE-IDENTIFICATION OF PROTECTED HEALTH INFORMATION PHI FROM FREE TEXT IN MEDICAL RECORDS
- Computer Science
- International Journal of Security, Privacy and Trust Management
- 2019
This work improved the applicability of the NeuroNER system to Indian data and improved its efficiency and reliability.
The Role of Inference in the Anonymization of Medical Records
- Computer Science
- 2014 IEEE 27th International Symposium on Computer-Based Medical Systems
- 2014
It is shown how sensitive attributes can be exploited to derive information about the QIs, leading to many privacy hazards for the patients whose records are shared.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2007
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge is discussed, the results of received system runs are analyzed, and directions for future research are identified.
A Hybrid Semi-supervised Learning Approach to Identifying Protected Health Information in Electronic Medical Records
- Computer Science
- IMCOM
- 2016
This paper proposes a hybrid semi-supervised learning approach to identifying protected health information (PHI) in electronic medical records that combines a machine learning-based method with a conditional random fields model and a rule-based method in a post-processing phase to handle 8 PHI types with disambiguity.
Automatic de-identification of medical records with a multilevel hybrid semi-supervised learning approach
- Computer Science
- 2016 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF)
- 2016
This paper proposes an automatic de-identification solution in a multilevel hybrid semi-supervised learning paradigm with a key focus on correctly identifying protected health information (PHI) in the EMRs by combining a machine learning-based method with a conditional random fields model and a rule-based method in a post-processing phase to handle the PHI types with disambiguity.
De-identifying an EHR Database - Anonymity, Correctness and Readability of the Medical Record
- Computer Science, Medicine
- MIE
- 2011
A de-identification algorithm is developed that uses lists of named entities, simple language analysis, and special rules to generate a Danish EHR database with real medical records, but related to artificial persons.
Automatic de-identification of electronic medical records using token-level and character-level conditional random fields
- Computer Science
- J. Biomed. Informatics
- 2015
References
Identification of patient name references within medical documents using semantic selectional restrictions
- Computer Science
- AMIA
- 2002
The proposed algorithm is based on estimating the fitness of candidate patient name references to a set of semantic selectional restrictions that place tight contextual requirements upon candidate words in the report text and are determined automatically from a manually tagged corpus of training reports.
Computer-assisted de-identification of free text in the MIMIC II database
- Computer Science
- Computers in Cardiology, 2004
- 2004
An evaluation of methods for computer-assisted removal and replacement of protected health information (PHI) from free-text nursing notes collected in the intensive care unit as part of the MIMIC II project is presented.
Research Paper: Rapidly Retargetable Approaches to De-identification in Medical Records
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2007
This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation, along with a method for tuning the balance of recall vs. precision in the Carafe system.
Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text
- Computer Science
- NAACL
- 2006
It is shown that one can deidentify medical discharge summaries using support vector machines that rely on a statistical representation of local context, which contributes more to deidentification than dictionaries and hand-tailored heuristics.
Medical document anonymization with a semantic lexicon
- Computer Science
- AMIA
- 2000
An original system for locating and removing personally-identifying information in patient records, using natural language processing tools provided by the MEDTAG framework: a semantic lexicon specialized in medicine, and a toolkit for word-sense and morpho-syntactic tagging.
Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research.
- Medicine
- American journal of clinical pathology
- 2004
By the end of the evaluation, the system was reliably and specifically removing safe-harbor identifiers and producing highly readable deidentified text without removing important clinical information.
A successful technique for removing names in pathology reports using an augmented search and replace method
- Medicine
- AMIA
- 2002
A tool based on the fact that the vast majority of proper names in pathology reports occur in pairs; it was easy to implement and relied largely on publicly available data sources to achieve accuracy similar to previous attempts at de-identification.
Identifying Personal Health Information Using Support Vector Machines
- Computer Science
- 2006
This work explores the use of Support Vector Machines to recognize personal health information in medical discharge summaries by using an information extraction system designed for newswire text, plus a set of rules that incorporate entity-specific knowledge.
Replacing personally-identifying information in medical records, the Scrub system.
- Computer Science
- Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium
- 1996
We define a new approach to locating and replacing personally-identifying information in medical records that extends beyond straight search-and-replace procedures, and we provide techniques for…
Automatic Deidentification by using Sentence Features and Label Consistency
- Computer Science
- 2006
The present paper proposes a new approach employing two types of non-local features, which do not come from surrounding words: sentence features, corresponding to the previous/next sentence information, and label consistency, preferring the same label for the same word sequence.