• Corpus ID: 235732187

Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence

Alexander Miserlis Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan L. Boyd-Graber, Philip Resnik
Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human…
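As an illustration of what a co-occurrence-based coherence estimate can look like, the sketch below computes mean normalized pointwise mutual information (NPMI) over the top words of a single topic, using document-level co-occurrence in a reference corpus. It is a minimal sketch, not the paper's evaluation code: the function name `npmi_coherence`, the choice of document-level (rather than sliding-window) counts, and the boundary conventions for word pairs that never or always co-occur are all assumptions made for this example.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, reference_docs):
    """Mean NPMI over all pairs of a topic's top words.

    top_words: list of at least two words describing one topic.
    reference_docs: non-empty list of tokenized documents (lists of words).
    Probabilities are estimated as document frequencies (an assumption;
    sliding-window counts are another common choice).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in reference_docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of reference documents containing all given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # Assumed convention: words that never co-occur score -1.
            scores.append(-1.0)
        elif p12 == 1:
            # Assumed convention: words in every document score 1
            # (the formula itself would be 0/0 here).
            scores.append(1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    docs = [["cat", "dog"], ["cat", "dog"], ["fish"], ["bird"]]
    print(npmi_coherence(["cat", "dog"], docs))
    print(npmi_coherence(["cat", "dog", "fish"], docs))
```

Scores lie in [-1, 1], with higher values meaning the top words tend to appear together in the reference corpus. Whether such scores track human judgments of topic quality is exactly the question the abstract raises.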

Apples to Apples: A Systematic Evaluation of Topic Models

This work presents a selection of 9 state-of-the-art topic modelling techniques reflecting a diversity of approaches to the task, along with an overview of the metrics used to compare their performance and the challenges of conducting such a comparison.

Discovering Interpretable Topics by Leveraging Common Sense Knowledge

The Common Sense Topic Model (CSTM) is introduced, a novel and efficient approach that augments clustering with knowledge extracted from the ConceptNet knowledge graph and shows superior affinity to human judgement.

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

BERTopic is presented, a topic model that extends the process of topic modeling by extracting coherent topic representation through the development of a class-based variation of TF-IDF.

A Joint Learning Approach for Semi-supervised Neural Topic Modeling

The Label-Indexed Neural Topic Model (LI-NTM) is introduced, which is, to the authors' knowledge, the first effective upstream semi-supervised neural topic model.

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.

The subtle language of exclusion: Identifying the Toxic Speech of Trans-exclusionary Radical Feminists

Toxic language can take many forms, from explicit hate speech to more subtle microaggressions. Within this space, models identifying transphobic language have largely focused on overt forms. However,

Nonparametric neural topic modeling for customer insight extraction about the tire industry

In the age of social media, customers have become opinion makers that share their experience. People interested in a product can reach for these reviews on the whole Internet, thus leading to



Topic Model or Topic Twaddle? Re-evaluating Semantic Interpretability Measures

These evaluations show that for some specialized collections, standard coherence measures may not inform the most appropriate topic model or the optimal number of topics, and current interpretability performance validation methods are challenged as a means to confirm model quality in the absence of ground truth data.

Automatic Evaluation of Local Topic Quality

This work proposes a task designed to elicit human judgments of token-level topic assignments and proposes a variety of automated metrics to evaluate topic models at a local level, showing that an evaluation based on the percent of topic switches correlates most strongly with human judgment of local topic quality.

Exploring the Space of Topic Coherence Measures

This work is the first to propose a framework in which existing word-based coherence measures, as well as new ones, can be constructed by combining elementary components, and shows that new combinations of components outperform existing measures with respect to correlation with human ratings.

In Search of Coherence and Consensus: Measuring the Interpretability of Statistical Topics

This work studies measures of interpretability and proposes to measure topic interpretability from two perspectives, topic coherence and topic consensus, where topic consensus measures how well the results of a crowdsourcing approach match the given categories of topics.

Re-Ranking Words to Improve Interpretability of Automatically Generated Topics

Close correlation between the results of the two evaluation approaches suggests that the automatic method proposed here could be used to evaluate re-ranking methods without the need for human judgements.

Reading Tea Leaves: How Humans Interpret Topic Models

New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.

TAN-NTM: Topic Attention Networks for Neural Topic Modeling

A novel attention mechanism is proposed that factors in the topic-word distribution so that the model can attend to relevant words that convey topic-related cues, resulting in a ~9-15 percent improvement over the scores of existing SOTA topic models in NPMI coherence on several benchmark datasets.

Copula Guided Neural Topic Modelling for Short Texts

This paper focuses on adapting the popular Auto-Encoding Variational Bayes based neural topic models to short texts, exploring Archimedean copulas to guide the estimated topic distributions derived from linearly projected samples of re-parameterized posterior distributions.

Document Informed Neural Autoregressive Topic Models with Distributional Prior

Novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability, and applicability over 7 long-text and 8 short-text datasets from diverse domains are presented.