• Corpus ID: 2990304

QuickUMLS: a Fast, Unsupervised Approach for Medical Concept Extraction

Luca Soldaini
Entity extraction is a fundamental step in many health informatics systems. In recent years, tools such as MetaMap and cTAKES have been widely used for medical concept extraction on medical literature and clinical notes; however, relatively little interest has been placed on their scalability to large datasets. In this work, we present QuickUMLS: a fast, unsupervised, approximate dictionary matching algorithm for medical concept extraction. The proposed method achieves similar precision and… 
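To make the idea of approximate dictionary matching concrete, here is a minimal brute-force sketch. It is not the QuickUMLS implementation or its API: the function names, the character-trigram features, the Jaccard measure, the 0.7 threshold, and the toy dictionary with placeholder concept identifiers are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, '#'-padded string."""
    pad = "#" * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_terms(text, dictionary, threshold=0.7, max_window=3):
    """Compare every span of up to max_window tokens against every term.

    Brute force for clarity only; a scalable system would use an index
    instead of scanning the whole dictionary for each span.
    """
    tokens = text.split()
    term_grams = {term: char_ngrams(term) for term in dictionary}
    matches = []
    for start in range(len(tokens)):
        stop = min(start + max_window, len(tokens))
        for end in range(start + 1, stop + 1):
            span = " ".join(tokens[start:end])
            span_grams = char_ngrams(span)
            for term, grams in term_grams.items():
                score = jaccard(span_grams, grams)
                if score >= threshold:
                    matches.append((span, dictionary[term], round(score, 3)))
    return matches


# Hypothetical toy dictionary; the identifiers are placeholders, not UMLS CUIs.
TOY_DICTIONARY = {
    "myocardial infarction": "C-HYPOTHETICAL-1",
    "aspirin": "C-HYPOTHETICAL-2",
}

if __name__ == "__main__":
    print(match_terms("patient had a myocardial infraction", TOY_DICTIONARY))
```

Under these assumptions, the misspelled span "myocardial infraction" still clears the threshold against the dictionary term, which is the behaviour that exact string lookup would miss.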

Clinical named-entity recognition: A short comparison

The preliminary results demonstrate that BioPortal performs well when extracting disorders and drugs, can provide clinical researchers with real clinical insights into patients' health patterns, and may allow the creation of a first version of an annotated dataset.

Clinical Concept Extraction with Lexical Semantics to Support Automatic Annotation

The proposed methodology significantly improves the performance of concept extraction from unstructured clinical narratives by exploiting linguistic and lexical semantic features, and can ease the automatic annotation of clinical data, which ultimately improves the performance of supervised data-driven applications trained with these data.

A Simple Terminology-Based Approach to Clinical Entity Recognition

As a basic approach to entity recognition and normalization in Spanish, this work uses a large proprietary vocabulary and thesaurus that extends SNOMED CT, together with SNOMED CT itself and UMLS, and draws on historical data of clinical terms used in the EHR problem list.

Clinical Phrase Mining with Language Models

Experimental results on the MIMIC-III dataset show that the proposed CliniPhrase method can outperform the current state-of-the-art techniques by up to 18% in terms of F1 measure while being very efficient (up to 48 times faster).

HPI-DHC @ BioASQ DisTEMIST: Spanish Biomedical Entity Linking with Pre-trained Transformers and Cross-lingual Candidate Retrieval

The goal of the task is to extract disease mentions from Spanish clinical case reports and map them to concepts in SNOMED CT. A detailed analysis of system performance highlights the importance of task-specific entity ranking and the benefits of cross-lingual candidate retrieval.

MedLinker: Medical Entity Linking with Neural Representations and Dictionary Matching

This paper explores how NLMs can be used for Medical Entity Linking with the recently introduced MedMentions dataset, and introduces a solution that performs competitively on semantic type linking, while improving the state-of-the-art on the more fine-grained task of concept linking.

Evaluation of Medical Concept Annotation Systems on Clinical Records

This paper analyses and evaluates four annotation systems for the task of extracting medical concepts from clinical free-text documents and finds that the concept recognition component of each system is highly sensitive to the quality of the text spans output by the concept extraction component of the annotation system.

Towards Verifying Results from Biomedical NLP Machine Learning Models Using the UMLS: Cases of Classification and Named Entity Recognition

This work presents a method that uses the ontologies and knowledge-bases in the Unified Medical Language System (UMLS) to verify and explain the output of biomedical ML models, and applies it to two tasks using textual cancer pathology reports.

A CNL-based Method for Detecting Disease Negation

This work investigated the use of a CNL with a general-purpose semantic parser to detect negation and identified three kinds of negation: explicit negation, implicit negation, and explicit implicit negation.



Sophia: An Expedient UMLS Concept Extraction Annotator

Sophia, a rapid UMLS concept extraction annotator, was developed to fulfill a mandate and address extraction where high throughput is needed while preserving performance, and is noted to be several-fold faster than cTAKES and the scaled-out MetaMap service.

An overview of MetaMap: historical perspective and recent advances

This study reports on MetaMap's evolution over more than a decade, concentrating on those features arising out of the research needs of the biomedical informatics community both within and outside of the National Library of Medicine.

Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text, and its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations.

MetaCoDe: A Lightweight UMLS Mapping Tool

A lightweight UMLS tagger is developed that processes large text collections at an acceptable speed, but at the cost of the sophistication of the treatments, allowing potential users to balance the gain in speed against the loss in quality.

State-of-the-art in biomedical literature retrieval for clinical cases: a survey of the TREC 2014 CDS track

An overview of the task, a survey of the information retrieval methods employed by the participants, an analysis of the results, and a discussion on the future directions for this challenging yet important task are provided.

2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text

The 2010 i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records presented three tasks, which showed that machine learning approaches could be augmented with rule-based systems to determine concepts, assertions, and relations.

Inferring conceptual relationships to improve medical records search

The results show the effectiveness of the approach to model the implicit knowledge in medical records search, whereby the infAP retrieval performance is significantly improved by up to 14.43% over an effective concept-based representation baseline.

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition

Overall, BioASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.

Temporal Annotation in the Clinical Domain

The implementation and extension of ISO-TimeML for annotating a corpus of clinical notes, known as the THYME corpus, is discussed, and a new annotation guideline has been developed, “the THYME Guidelines to ISO-TimeML (THYME-TimeML)”.

Simple and Efficient Algorithm for Approximate Dictionary Matching

This paper presents a simple and efficient algorithm for approximate dictionary matching designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. The algorithm, called CPMerge, solves the τ-overlap join of inverted lists.
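The link between a similarity threshold and a τ-overlap join can be sketched as follows. This is a naive counting join under stated assumptions, not CPMerge itself, which adds pruning that is omitted here. The function names are hypothetical, and the minimum-overlap bounds are derived directly from the set-based definitions of each measure for a threshold alpha > 0.

```python
import math
from collections import defaultdict


def min_overlap(measure, alpha, x_size, y_size):
    """Smallest |X & Y| for which similarity(X, Y) >= alpha can hold."""
    if measure == "cosine":
        return math.ceil(alpha * math.sqrt(x_size * y_size))
    if measure == "dice":
        return math.ceil(alpha * (x_size + y_size) / 2)
    if measure == "jaccard":
        return math.ceil(alpha * (x_size + y_size) / (1 + alpha))
    if measure == "overlap":
        return math.ceil(alpha * min(x_size, y_size))
    raise ValueError(f"unknown measure: {measure}")


def search(query, feature_sets, measure, alpha):
    """Return keys whose feature set shares at least tau features with query.

    feature_sets maps a key to a set of features. Entries are grouped by set
    size because tau depends on the candidate's size.
    """
    # One inverted index (feature -> keys) per feature-set size.
    by_size = defaultdict(lambda: defaultdict(list))
    for key, features in feature_sets.items():
        for feature in features:
            by_size[len(features)][feature].append(key)

    results = []
    for size, index in by_size.items():
        tau = min_overlap(measure, alpha, len(query), size)
        counts = defaultdict(int)
        # Naive merge: count how many query features each key contains.
        for feature in query:
            for key in index.get(feature, ()):
                counts[key] += 1
        results.extend(key for key, count in counts.items() if count >= tau)
    return sorted(results)


if __name__ == "__main__":
    entries = {"a": {1, 2, 3, 4}, "b": {1, 2, 5, 6}, "c": {7, 8}}
    print(search({1, 2, 3, 4}, entries, "jaccard", 0.5))
```

Grouping by size keeps the sketch correct without separate size-range bounds; a production implementation would also skip sizes that cannot satisfy the threshold at all.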