TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Authors: Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser
Published in: Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure… 
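As a rough illustration of what a neighbor-based comparison between two corpora can look like, the sketch below measures how much a term's nearest neighbors overlap across two embedding spaces. It is not TextEssence's implementation, nor the measure the abstract goes on to propose; the toy vectors, the function names (`nearest_neighbors`, `neighbor_overlap`), and the choice of Jaccard overlap are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(term, embeddings, k):
    """Return the k terms most similar to `term` within one embedding space."""
    query = embeddings[term]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != term
    ]
    # Sort by descending similarity; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def neighbor_overlap(term, emb_a, emb_b, k):
    """Jaccard overlap of the term's k nearest neighbors in two spaces.

    1.0 means identical neighbor sets; 0.0 means no shared neighbors.
    """
    a = set(nearest_neighbors(term, emb_a, k))
    b = set(nearest_neighbors(term, emb_b, k))
    return len(a & b) / len(a | b)


# Hypothetical 2-dimensional embeddings trained separately on two corpora.
corpus_a = {
    "virus": [1.0, 0.0],
    "infection": [0.9, 0.1],
    "vaccine": [0.8, 0.3],
    "software": [0.0, 1.0],
    "malware": [0.1, 0.9],
}
corpus_b = {
    "virus": [0.0, 1.0],
    "infection": [0.9, 0.1],
    "vaccine": [0.8, 0.3],
    "software": [0.1, 0.9],
    "malware": [0.2, 0.9],
}

for term in ("virus", "infection"):
    print(term, nearest_neighbors(term, corpus_a, 2),
          nearest_neighbors(term, corpus_b, 2),
          round(neighbor_overlap(term, corpus_a, corpus_b, 2), 3))
```

Because each space is only compared with itself when ranking neighbors, this kind of comparison does not require aligning the two embedding spaces; a low overlap simply flags a term whose neighborhood differs between the corpora.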

Evaluating the Stability of Embedding-based Word Similarities
It is found that nearest-neighbor distances are highly sensitive to small changes in the training corpus for a variety of algorithms, and it is recommended that users never rely on single embedding models for distance calculations, but rather average over multiple bootstrap samples, especially for small corpora.
Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora
This work proposes an alternative approach that does not use vector space alignment and instead considers the neighbors of each word, demonstrating its effectiveness in 9 different setups that consider different corpus splitting criteria.
Diachronic word embeddings and semantic shifts: a survey
This paper surveys the current state of academic research on diachronic word embeddings and semantic shift detection, proposes several axes along which these methods can be compared, and outlines the main challenges facing this emerging subfield of NLP.
Visual exploration and comparison of word embeddings
A visual analytics system is proposed to visually explore and compare word embeddings trained by different algorithms and corpora, in order to understand the similarities and differences between word embeddings.
MUCK: A toolkit for extracting and visualizing semantic dimensions of large text collections
MUCK is described, an open-source toolkit that addresses both of these problems through a distributed text processing engine with an interactive visualization interface.
Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
A robust methodology for quantifying semantic change is developed by evaluating word embeddings against known historical changes, and it is revealed that more polysemous words have higher rates of semantic change.
Going Beyond T-SNE: Exposing whatlies in Text Embeddings
This work introduces whatlies, an open source toolkit for visually inspecting word and sentence embeddings that offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks.
Statistically Significant Detection of Linguistic Change
This meta-analysis approach constructs property time series of word usage, and then uses statistically sound change point detection algorithms to identify significant linguistic shifts.
Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings
This work presents a method for characterizing the usage patterns of clinical concepts among different document types, in order to capture semantic differences beyond the lexical level.
TIARA: Interactive, Topic-Based Visual Text Summarization and Analysis
An enhanced, LDA-based topic analysis technique is introduced that automatically derives a set of topics to summarize a collection of documents and how their content evolves over time, and an effective visual metaphor is developed to transform abstract and often complex text summarization results into a comprehensible visual representation.
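
The recommendation summarized under "Evaluating the Stability of Embedding-based Word Similarities" (average over multiple bootstrap samples rather than trusting a single embedding model) can be sketched roughly as follows. This is a minimal illustration, not that paper's code: the helper names are hypothetical, the embedding training step is deliberately left out, and the three toy "models" stand in for embeddings that would each be trained on a different bootstrap resample of the corpus.

```python
import math
import random
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bootstrap_sample(documents, rng):
    """Resample the corpus with replacement, keeping its original size."""
    return [rng.choice(documents) for _ in documents]


def averaged_similarity(word_a, word_b, models):
    """Mean and sample standard deviation of cosine similarity across models.

    `models` is a list of {word: vector} dicts, one per bootstrap sample.
    At least two models are needed for the standard deviation.
    """
    sims = [cosine(m[word_a], m[word_b]) for m in models]
    return statistics.mean(sims), statistics.stdev(sims)


# Drawing one resample of a tiny, made-up corpus.
documents = ["doc one", "doc two", "doc three", "doc four"]
resample = bootstrap_sample(documents, random.Random(0))

# Assumption: each dict below is the output of training an embedding model
# on a different bootstrap resample (training itself is not shown).
models = [
    {"heart": [1.0, 0.0], "cardiac": [1.0, 0.0]},
    {"heart": [1.0, 0.0], "cardiac": [0.0, 1.0]},
    {"heart": [1.0, 0.0], "cardiac": [1.0, 1.0]},
]

mean_sim, std_sim = averaged_similarity("heart", "cardiac", models)
print(f"mean similarity = {mean_sim:.3f}, std = {std_sim:.3f}")
```

Reporting the spread alongside the mean makes it visible when a similarity value depends heavily on which documents happened to be in the training sample.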