De-identification of medical records using conditional random fields and long short-term memory networks

@article{Jiang2017DeidentificationOM,
  title={De-identification of medical records using conditional random fields and long short-term memory networks},
  author={Zhipeng Jiang and Chao Zhao and Bin He and Yi Guan and Jingchi Jiang},
  journal={Journal of Biomedical Informatics},
  year={2017},
  volume={75S},
  pages={S43-S53}
}
Survey on RNN and CRF models for de-identification of medical free text
TLDR
A comprehensive survey of work on automated free text de-identification with recurrent neural network (RNN) and conditional random field (CRF) approaches finds that RNN models, particularly long short-term memory (LSTM) algorithms, generally outperformed CRF models and also other systems, namely rule-based algorithms.
Redaction of Protected Health Information in EHRs using CRFs and Bi-directional LSTMs
TLDR
This paper proposes an efficient solution for redaction using two models, both of which achieve good F-scores for PHIs of all types; the better of the two achieves a micro-F1 measure of 0.9592, outperforming the CRF-based model.
A Short Survey of LSTM Models for De-identification of Medical Free Text
TLDR
Performance-wise, LSTMs generally surpassed other types of models used in automated de-identification of free text, namely conditional random field (CRF) algorithms and rule-based algorithms, but hybrid or ensemble LSTM models did not outperform LSTM-only models.
Window Classifiers and Conditional Random Fields for Medical Report De-Identification
TLDR
Two methods, a window classifier and an implementation of conditional random fields (CRF), are presented to de-identify personal information in Spanish medical records provided by the MEDDOCAN challenge.
Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients
TLDR
Good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI are demonstrated.
De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of Deep Learning Models
TLDR
An end-to-end deidentification framework to automatically remove PII from hospital discharge summaries is presented and it is shown that the ensemble model combined using the stacking Support Vector Machine method on the three base-models with the best F1 scores achieved excellent results.
Advancing the State of the Art in Clinical Natural Language Processing through Shared Tasks
TLDR
There is a clear trend in using data-driven methods to tackle problems in clinical NLP by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
TLDR
A simple yet effective data augmentation method PHICON, which creates augmented training corpora by replacing PHI entities with named-entities sampled from external sources, and by changing background context with synonym replacement or random word insertion, is proposed.
A study of deep learning methods for de-identification of clinical notes in cross-institute settings
TLDR
Fine-tuning is a potential solution to re-use pre-trained parameters and reduce the training time to customize deep learning-based de-identification models trained using clinical corpus from a different institution.

References

CRFs based de-identification of medical records
De-identification of patient notes with recurrent neural networks
TLDR
The first de-identification system based on artificial neural networks (ANNs) is introduced; unlike existing systems, it requires no handcrafted features or rules, and it outperforms the state-of-the-art systems.
Automatic detection of protected health information from clinic narratives
Research Paper: Rapidly Retargetable Approaches to De-identification in Medical Records
TLDR
This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation, along with a method for tuning the balance of recall vs. precision in the Carafe system.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
TLDR
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge are discussed, the results of received system runs are analyzed, and directions for future research are identified.
Can Physicians Recognize Their Own Patients in De-identified Notes?
The adoption of Electronic Health Records is growing at a fast pace, and this growth results in very large quantities of patient clinical information becoming available in electronic format, with
Bidirectional LSTM-CRF Models for Sequence Tagging
TLDR
This work is the first to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging data sets, and it is shown that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component.
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
TLDR
A novel neural network architecture is introduced that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF, thus making it applicable to a wide range of sequence labeling tasks.