Classifying Unstructured Clinical Notes via Automatic Weak Supervision

Chufan Gao, Mononito Goswami, Jieshi Chen, Artur Dubrawski
Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients’ diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming but also costly and error-prone. Prior work demonstrated potential utility of Machine Learning (ML… 


FasTag: Automatic text classification of unstructured medical narratives

This retrospective study aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation.

Multi-Label Classification of Patient Notes a Case Study on ICD Code Assignment

HA-GRU, a hierarchical approach that tags a document by identifying the sentences relevant to each label, achieves state-of-the-art results, highlights the model's decision process, allows easier error analysis, and suggests future directions for improvement.

Multimodal Machine Learning for Automated ICD Coding

Two separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text, and structured tabular data, are developed, and an ensemble method is employed to integrate all modality-specific models and generate ICD-10 codes.

Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation.

A clinical text classification paradigm using weak supervision and deep representation

It is shown that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms.

Automatic ICD Code Classification with Label Description Attention Mechanism

A model that uses a BERT-like encoder and a word-level attention mechanism between input clinical cases and textual descriptions of the labels is developed; it predicted a wider variety of codes across the test set than the authors' baseline, thereby capturing more low-resource labels.

ICD Code Retrieval: Novel Approach for Assisted Disease Classification

This paper presents a novel incremental approach to clinical text classification that overcomes the low-accuracy problem through top-K retrieval, exploits transfer learning techniques to expand a skewed dataset, and improves overall accuracy over time by learning from user selections.

Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets

The Biomedical Language Understanding Evaluation (BLUE) benchmark is introduced to facilitate research on pre-trained language representations in the biomedical domain, and the BERT model pre-trained on PubMed abstracts and MIMIC-III clinical notes is found to achieve the best results.

Text Classification Using Label Names Only: A Language Model Self-Training Approach

This paper uses pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification, and achieves around 90% accuracy on four benchmark datasets.