Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1

@article{Stubbs2015AutomatedSF,
  title={Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1},
  author={A. Stubbs and Christopher Kotfila and {\"O}zlem Uzuner},
  journal={Journal of Biomedical Informatics},
  year={2015},
  volume={58 Suppl},
  pages={S11-9}
}
The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details of information that, while benign on their own in separate records, can lead to identification of patients in combination in longitudinal records. Accordingly, the 2014 de…
Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus
TLDR
This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proof reading to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task.
The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records
OBJECTIVE: The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3, focused on medical concept normalization (MCN) in clinical…
Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.
TLDR
The results show that integrating the proposed methods can identify PHI defined by the Health Insurance Portability and Accountability Act (HIPAA) with overall F1-scores of ∼90% and above, yet some classes (Profession, Organization) again proved challenging given the variability of the expressions used to refer to them.
De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.
TLDR
Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP, and unmodified existing systems do not generalize well to new data without the benefit of training data.
The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge.
TLDR
This work participated in a shared task challenge by the Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID), exploring new techniques such as disambiguation rules and term ambiguity measurement, and using a multi-pass sieve framework at a micro level.
De-identification of clinical notes via recurrent neural network and conditional random field.
TLDR
A hybrid system is developed that achieves the highest micro F1-scores under the "token", "strict" and "binary token" criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge and outperforming other state-of-the-art systems.
Performance of Automatic De-identification Across Different Note Types
TLDR
A state-of-the-art de-id system called NeuroNER is presented on a diverse set of notes from University of Washington (UW) when the models are trained on data from an external institution (Partners Healthcare) vs. from the same institution (UW).
De-identification of patient notes with recurrent neural networks
TLDR
The first de-identification system based on artificial neural networks (ANNs) is introduced; unlike existing systems, it requires no handcrafted features or rules, and it outperforms the state-of-the-art systems.
De-identification of medical records using conditional random fields and long short-term memory networks
TLDR
Two participating systems based on conditional random fields and long short-term memory networks are described, based on sentence detection and tokenization before de-identification of psychiatric evaluation records.
A survey of automatic de-identification of longitudinal clinical narratives
Use of medical data, also known as electronic health records, in research helps develop and advance medical science. However, protecting patient confidentiality and identity while using medical data…

References

SHOWING 1-10 OF 33 REFERENCES
Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus
TLDR
This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proof reading to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task.
Combining knowledge- and data-driven methods for de-identification of clinical narratives
TLDR
The overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information, thus providing a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies.
Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2
The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of…
Large-scale evaluation of automated clinical note de-identification and its impact on information extraction
TLDR
NLP-based de-identification shows excellent performance that rivals the performance of human annotators and scales up to millions of documents quickly and inexpensively.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
TLDR
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge is discussed, the results of received system runs are analyzed, and directions for future research are identified.
Automatic de-identification of electronic medical records using token-level and character-level conditional random fields
TLDR
This study proposes a hybrid system based on both machine learning and rule approaches for the de-identification track of the 2014 i2b2 clinical natural language processing (NLP) challenge, which achieves the highest micro F-scores under the "token", "strict" and "relaxed" criteria respectively.
Automatic detection of protected health information from clinic narratives
TLDR
This paper presents a natural language processing (NLP) system that was designed to participate in the 2014 i2b2 de-identification challenge and achieved promising accuracy on the challenge test data, with an overall micro-averaged F-measure of 93.6%, making it the winner of this de-identification challenge.
Creation of a new longitudinal corpus of clinical narratives
TLDR
This paper details the process used to select records for this corpus and provides an overview of novel research uses for this corpus, the only annotated corpus of longitudinal clinical narratives currently available for research to the general research community.
BoB, a best-of-breed automated text de-identification system for VHA clinical documents
TLDR
The authors' system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
Automatic de-identification of textual documents in the electronic health record: a review of recent research
TLDR
A review of recent research in automated de-identification of narrative text documents from the electronic health record finds that methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize.