TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

@article{NewmanGriffis2021TextEssenceAT,
  title={TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora},
  author={Denis Newman-Griffis and Venkatesh Sivaraman and Adam Perer and Eric Fosler-Lussier and Harry Hochheiser},
  journal={Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting},
  year={2021},
  volume={2021},
  pages={106--115}
}
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure…
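As a rough illustration of what a neighbor-based comparison between corpora can look like, the sketch below ranks a word's nearest neighbors by cosine similarity in two separately trained embedding spaces and reports how much the two neighbor sets overlap. This is not TextEssence's implementation: the function names (`cosine`, `nearest_neighbors`, `neighbor_overlap`), the Jaccard overlap score, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, word, k):
    """Return the k words most similar to `word` within one embedding space."""
    query = embeddings[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def neighbor_overlap(emb_a, emb_b, word, k):
    """Jaccard overlap of a word's top-k neighbor sets in two spaces.

    Neighbors are compared by word identity, so the two spaces do not
    need to be aligned with each other.
    """
    a = set(nearest_neighbors(emb_a, word, k))
    b = set(nearest_neighbors(emb_b, word, k))
    return len(a & b) / len(a | b)


# Hypothetical toy embeddings standing in for two corpora.
corpus_a = {
    "virus": [1.0, 0.0],
    "infection": [0.9, 0.1],
    "software": [0.0, 1.0],
    "outbreak": [0.8, 0.3],
}
corpus_b = {
    "virus": [0.0, 1.0],
    "infection": [0.9, 0.1],
    "software": [0.1, 0.9],
    "outbreak": [0.8, 0.3],
}

print(nearest_neighbors(corpus_a, "virus", 2))
print(nearest_neighbors(corpus_b, "virus", 2))
print(neighbor_overlap(corpus_a, corpus_b, "virus", 2))
```

A low overlap suggests the word is used differently in the two corpora, although a single pair of embedding models can be noisy, so a score like this is best treated as a prompt for closer inspection rather than as evidence on its own.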

References

Evaluating the Stability of Embedding-based Word Similarities
TLDR
It is found that nearest-neighbor distances are highly sensitive to small changes in the training corpus for a variety of algorithms, and it is recommended that users never rely on single embedding models for distance calculations, but rather average over multiple bootstrap samples, especially for small corpora.
Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora
TLDR
This work proposes an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word, and demonstrates its effectiveness in 9 different setups, considering different corpus splitting criteria.
Diachronic word embeddings and semantic shifts: a survey
TLDR
This paper surveys the current state of academic research on diachronic word embeddings and semantic shift detection, proposes several axes along which these methods can be compared, and outlines the main challenges facing this emerging subfield of NLP.
Visual exploration and comparison of word embeddings
TLDR
A visual analytics system is proposed to visually explore and compare word embeddings trained by different algorithms and corpora to understand the similarities and differences between word embeddings.
MUCK: A toolkit for extracting and visualizing semantic dimensions of large text collections
TLDR
MUCK is described, an open-source toolkit that addresses both of these problems through a distributed text processing engine with an interactive visualization interface.
Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
TLDR
A robust methodology for quantifying semantic change is developed by evaluating word embeddings against known historical changes and it is revealed that words that are more polysemous have higher rates of semantic change.
Going Beyond T-SNE: Exposing whatlies in Text Embeddings
TLDR
This work introduces whatlies, an open source toolkit for visually inspecting word and sentence embeddings that offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks.
Statistically Significant Detection of Linguistic Change
TLDR
This meta-analysis approach constructs property time series of word usage, and then uses statistically sound change point detection algorithms to identify significant linguistic shifts.
Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings
TLDR
This work presents a method for characterizing the usage patterns of clinical concepts among different document types, in order to capture semantic differences beyond the lexical level.
TIARA: Interactive, Topic-Based Visual Text Summarization and Analysis
TLDR
An enhanced, LDA-based topic analysis technique is introduced that automatically derives a set of topics to summarize a collection of documents and their content evolution over time and an effective visual metaphor is developed to transform abstract and often complex text summarization results into a comprehensible visual representation.