De-identification of medical records using conditional random fields and long short-term memory networks
@article{Jiang2017DeidentificationOM,
  title={De-identification of medical records using conditional random fields and long short-term memory networks},
  author={Zhipeng Jiang and Chao Zhao and Bin He and Yi Guan and Jingchi Jiang},
  journal={Journal of biomedical informatics},
  year={2017},
  volume={75S},
  pages={S43-S53}
}
22 Citations
Survey on RNN and CRF models for de-identification of medical free text
- Computer Science
- J. Big Data
- 2020
A comprehensive survey of work on automated free text de-identification with recurrent neural network (RNN) and conditional random field (CRF) approaches finds that RNN models, particularly long short-term memory (LSTM) algorithms, generally outperformed CRF models and also other systems, namely rule-based algorithms.
Redaction of Protected Health Information in EHRs using CRFs and Bi-directional LSTMs
- Computer Science
- 2018 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO)
- 2018
This paper proposes an efficient solution for redaction using two models, both of which achieve good F-scores for PHIs of all types, and reports a micro-F1 measure of 0.9592, which is better than that of the CRF-based model.
A Short Survey of LSTM Models for De-identification of Medical Free Text
- Computer Science
- 2020 IEEE 6th International Conference on Collaboration and Internet Computing (CIC)
- 2020
Performance-wise, LSTMs generally surpassed other types of models used in automated de-identification of free text, namely conditional random field (CRF) algorithms and rule-based algorithms, but hybrid or ensemble LSTM models did not outperform LSTM-only models.
Window Classifiers and Conditional Random Fields for Medical Report De-Identification
- Computer Science
- IberLEF@SEPLN
- 2019
Two methods, a window classifier and an implementation of conditional random fields (CRF), are presented to de-identify personal information in Spanish medical records provided by the MEDDOCAN challenge.
Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients
- Computer Science
- BMC Medical Informatics and Decision Making
- 2020
Good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI are demonstrated.
De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.
- Psychology
- Journal of biomedical informatics
- 2017
De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of Deep Learning Models
- Computer Science
- 2021
An end-to-end de-identification framework to automatically remove PII from hospital discharge summaries is presented, and it is shown that the ensemble model, combined using the stacking Support Vector Machine method on the three base models with the best F1 scores, achieved excellent results.
Advancing the State of the Art in Clinical Natural Language Processing through Shared Tasks
- Medicine, Computer Science
- Yearbook of medical informatics
- 2018
There is a clear trend in using data-driven methods to tackle problems in clinical NLP by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
- Computer Science
- CLINICALNLP
- 2020
A simple yet effective data augmentation method, PHICON, is proposed; it creates augmented training corpora by replacing PHI entities with named entities sampled from external sources and by changing background context with synonym replacement or random word insertion.
A study of deep learning methods for de-identification of clinical notes in cross-institute settings
- Computer Science
- BMC Medical Informatics and Decision Making
- 2019
Fine-tuning is a potential solution to re-use pre-trained parameters and reduce the training time needed to customize deep learning-based de-identification models trained using a clinical corpus from a different institution.
References
SHOWING 1-10 OF 37 REFERENCES
De-identification of patient notes with recurrent neural networks
- Computer Science, Medicine
- J. Am. Medical Informatics Assoc.
- 2017
The first de-identification system based on artificial neural networks (ANNs) is introduced; unlike existing systems, it requires no handcrafted features or rules, and it outperforms the state-of-the-art systems.
Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1
- Psychology
- J. Biomed. Informatics
- 2015
De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.
- Psychology
- Journal of biomedical informatics
- 2017
Automatic detection of protected health information from clinic narratives
- Computer Science
- J. Biomed. Informatics
- 2015
Research Paper: Rapidly Retargetable Approaches to De-identification in Medical Records
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2007
This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation, and presents a method for tuning the balance of recall vs. precision in the Carafe system.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2007
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge is discussed, the results of received system runs are analyzed, and directions for future research are identified.
Can Physicians Recognize Their Own Patients in De-identified Notes?
- Medicine
- MIE
- 2014
The adoption of Electronic Health Records is growing at a fast pace, and this growth results in very large quantities of patient clinical information becoming available in electronic format, with…
Bidirectional LSTM-CRF Models for Sequence Tagging
- Computer Science
- ArXiv
- 2015
This work is the first to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging data sets, and it is shown that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component.
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
- Computer Science
- ACL
- 2016
A novel neural network architecture is introduced that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF, thus making it applicable to a wide range of sequence labeling tasks.