Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings

Denis Newman-Griffis and Eric Fosler-Lussier
Natural language processing techniques are being applied to increasingly diverse types of electronic health records, and can benefit from in-depth understanding of the distinguishing characteristics of medical document types. We present a method for characterizing the usage patterns of clinical concepts among different document types, in order to capture semantic differences beyond the lexical level. By training concept embeddings on clinical documents of different types and measuring the… 

TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora
This work introduces TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings, and proposes a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis.
Medical Information Extraction in the Age of Deep Learning
The paradigm shift from (feature-engineered) ML to DNNs changes the fundamental methodological rules of the game for medical NLP and should also deeply influence other areas of medical informatics, either NLP- or non-NLP-based.
Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health
A framework for developing natural language processing technologies for automated coding of medical information in under-studied domains is presented, and its applicability is demonstrated through a case study on physical mobility function.


Sublanguage Analysis of Medical Weblogs
This paper analyses the language of medical blogs by means of a sublanguage analysis and concludes that methods for reference resolution and for relation extraction where the relation type does not need to be specified in advance are required.
Medical Concept Embeddings via Labeled Background Corpora
This paper investigates a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed.
Extracting similar terms from multiple EMR-based semantic embeddings to support chart reviews
Methods to refine the selection process of similar terms from multiple EMR-based word embeddings are presented, and their performance is evaluated quantitatively and qualitatively across multiple chart review tasks.
Characterizing Clinical Text and Sublanguage: A Case Study of the VA Clinical Notes
There are a relatively large number of sublanguages and variance both within and between document types, and these findings will guide NLP development to create more customizable and generalizable solutions across medical domains and sublanguages.
Leveraging Sublanguage Features for the Semantic Categorization of Clinical Terms
A method for the semantic categorization of clinical terms based on their surface form is presented and it is found that features based on sublanguage properties can provide valuable cues for the classification of term variants.
Robust Representation Learning of Biomedical Names
The idea behind the approach is to consider and encode contextual meaning, conceptual meaning, and the similarity between synonyms during the representation learning process, resulting in high practical utility in real-world applications.
Medical Semantic Similarity with a Neural Language Model
The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
Exploring Diachronic Changes of Biomedical Knowledge using Distributed Concept Representations
This work examines the evolution of biomedical knowledge over time in the scientific literature in terms of diachronic change; the use of temporal and distributional concept representations is explored and evaluated through a proof of concept.
Jointly Embedding Entities and Text with Distant Supervision
This work presents a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms.
Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
A robust methodology for quantifying semantic change is developed by evaluating word embeddings against known historical changes and it is revealed that words that are more polysemous have higher rates of semantic change.