• Corpus ID: 239050273

An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)

Sijia Liu, Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Robert T. Miller, Andrew Williams, Daniel Harris, Ramakanth Kavuluru, Mei Liu, Noor Abu-El-Rub, Rui Zhang, John David Osborne, Masoud Rouhizadeh, Yongqun Oliver He, Emily R. Pfaff, Christopher G. Chute, Tim Q. Duong, Melissa A. Haendel, Rafael Fuentes, Peter Szolovits, Hua Xu, Hongfang Liu
Despite the latest advances in clinical natural language processing (NLP), there is some resistance in the clinical and translational research community to adopting NLP models because of their limited transparency, interpretability, and usability. Building on our previous work, in this study we proposed an open natural language processing development framework and evaluated it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the…

Criteria2Query: a natural language interface to clinical databases for cohort definition
Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases and supports fully automated and interactive modes for autonomous data-driven cohort definition by researchers with minimal human effort.
A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR
A framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models is described, which leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors.
Clinical information extraction applications: A literature review
This review finds a considerable gap between clinical studies using EHR data and studies using clinical information extraction (IE); it gives a more concrete picture of that gap and proposes potential solutions to bridge it.
The Unified Medical Language System (UMLS): integrating biomedical terminology
The Unified Medical Language System is a repository of biomedical vocabularies developed by the US National Library of Medicine and includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap).
brat: a Web-based Tool for NLP-Assisted Text Annotation
The brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology, is introduced, along with an evaluation of annotation assisted by semantic class disambiguation on a multicategory entity mention annotation task that shows a 15% decrease in total annotation time.
The Human Phenotype Ontology in 2021
Recent major extensions of the Human Phenotype Ontology (HPO) for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas are presented, along with new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease.
ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports
In this study, it is found that ConText obtains reasonable to good performance for negated, historical, and hypothetical conditions across all report types that contain such conditions.
The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment
The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics.
The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future
The evolution, accomplishments, opportunities, and challenges of the network are described, from its inception as a five-group consortium focused on genotype–phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting toward the implementation of genomic medicine.
The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species
The Monarch Initiative is a collaborative, open science effort that aims to semantically integrate genotype–phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration.