EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts

Irene Li, Keen You, Xiangru Tang, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Dragomir Radev

The Electronic Health Record (EHR) is an essential part of the modern medical system and impacts healthcare delivery, operations, and research. Although EHRs also contain structured information, their unstructured text is attracting much attention and has become an exciting research field. The success of recent neural Natural Language Processing (NLP) methods has opened a new direction for processing unstructured clinical notes. In this work, we create EHRKit, a Python library for clinical texts. This…
