CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

Authors: J. Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer M. Seale, Jordan Swartz, T. Greg McKelvey, Hui Dai, Yi Yang, David A. Sontag
Continuity of care is crucial to ensuring positive health outcomes for patients discharged from an inpatient hospital setting, and improved information sharing can help. To share information, caregivers write discharge notes containing action items to share with patients and their future caregivers, but these action items are easily lost due to the lengthiness of the documents. In this work, we describe our creation of a dataset of clinical action items annotated over MIMIC-III, the largest… 

Structured Understanding of Assessment and Plans in Clinical Documentation

This work describes and releases a dataset of 579 admission and progress notes from the publicly available, de-identified MIMIC-III ICU dataset, annotated with over 30,000 labels identifying active problems, their assessment, and the category of associated action items.

Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

This work introduces an annotated corpus built from a large collection of publicly available daily progress notes, a type of EHR note that is time-sensitive, problem-oriented, and structured in the Subjective, Objective, Assessment and Plan (SOAP) format.

A Scoping Review of Publicly Available Language Tasks in Clinical Natural Language Processing

A scoping review of clinical natural language processing shared tasks that use publicly available electronic health record data from a patient cohort, identifying gaps that stem from divergent interests between the general-domain NLP community and the clinical informatics community, both in task motivation and design and in the generalizability of the data sources.

A Novel System for Extractive Clinical Note Summarization using EHR Data

This paper presents a clinical note processing pipeline that extends beyond basic medical natural language processing (NLP), such as concept recognition and relation detection, to include components specific to EHR data: structured data associated with the encounter, sentence-level clinical aspects, and the structure of clinical notes.

Extracting medication information from clinical text

Although rule-based systems dominated the top 10, the best-performing system was a hybrid, and durations and reasons were the most difficult items for all systems to detect.

2018 n2c2 Shared Task on Adverse Drug Events and Medication Extraction in Electronic Health Records

This challenge shows that clinical concept extraction and relation classification systems achieve high performance for many concept types, but significant improvement is still required for ADEs and Reasons.

Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0)

MADE results show that recent progress in NLP has led to remarkable improvements in named entity recognition (NER) and relation identification (RI) tasks in the clinical domain; however, room for improvement remains, particularly in the joint NER-RI task.

Natural language processing to extract follow-up provider information from hospital discharge summaries

An NLP program achieved physician-like performance in extracting follow-up provider information from discharge summaries: it was as good as all physician reviewers at identifying follow-up provider names and phone/fax numbers, and slightly inferior to two physicians at identifying location information.

Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models

Several architectures for phenotyping that rely solely on BERT representations of the clinical note are explored; these architectures are competitive with or outperform existing state-of-the-art methods on two phenotyping tasks.

Deidentification of free-text medical records using pre-trained bidirectional transformers

This paper develops and evaluates an approach to deidentifying clinical notes based on a bidirectional transformer model, proposes human-interpretable evaluation measures, and demonstrates state-of-the-art performance against modern baseline models.

emrQA: A Large Corpus for Question Answering on Electronic Medical Records

A novel methodology is proposed for generating large-scale, domain-specific question answering (QA) datasets by repurposing existing annotations for other NLP tasks, and it is demonstrated by generating a large-scale QA dataset for electronic medical records.

Unsupervised Pseudo-Labeling for Extractive Summarization on Electronic Health Records

This work studies how to use the intrinsic correlation between multiple EHRs to generate pseudo-labels and train a supervised model without external annotation; the resulting model is effective at summarizing crucial disease-specific information for patients.

Evaluating temporal relations in clinical text: 2012 i2b2 Challenge

A corpus of discharge summaries annotated with temporal information was provided for the development and evaluation of temporal reasoning systems, and the best systems overwhelmingly adopted a rule-based approach for value normalization.