Corpus ID: 199448369

Window Classifiers and Conditional Random Fields for Medical Report De-Identification

Viviana Cotik, Franco M. Luque, Juan Manuel Pérez
Information extraction from medical reports is key to the timely discovery of findings and helps improve decisions about medical treatments and budgets. Developing information extraction methods requires that medical data be available. Because this data is extremely sensitive, containing personal information, reports must be de-identified. We present two methods, a window classifier and an implementation of conditional random fields (CRF), in order to de…

De-identification of medical records using conditional random fields and long short-term memory networks
A de-identifier for medical discharge summaries
Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results
This paper summarizes the settings, data and results of the first shared track on anonymization of medical documents in Spanish, the MEDDOCAN (Medical Document Anonymization) track, which relied on a carefully constructed synthetic corpus of clinical case documents following annotation guidelines for sensitive data based on the analysis of the EU General Data Protection Regulation.
De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields
This work presents the creation of two refined variants of a manually annotated gold standard for de-identification, one created automatically and one created through discussions among the annotators, both based on the Stockholm EPR Corpus.
Annotation of Entities and Relations in Spanish Radiology Reports
Radiology reports written in Spanish are manually annotated, and the corpus, the annotation schema, the annotation guidelines and further insights into the data are presented.
State-of-the-art anonymization of medical records using an iterative machine learning framework.
A de-identification model is developed that can successfully remove personal health information (PHI) from discharge records to make them conform to the guidelines of the Health Insurance Portability and Accountability Act (HIPAA).
Research Paper: Rapidly Retargetable Approaches to De-identification in Medical Records
This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation, and presents a method for tuning the balance of recall vs. precision in the Carafe system.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge is discussed, the results of received system runs are analyzed, and directions for future research are identified.