Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization

@inproceedings{vorontsov_potapenko_artm_tutorial,
  title={Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization},
  author={Konstantin V. Vorontsov and Anna Potapenko},
  booktitle={International Joint Conference on the Analysis of Images, Social Networks and Texts},
}
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models. ARTM is free of redundant probabilistic assumptions and provides a simple inference for many combined and multi-objective topic models. 
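The core of ARTM inference is a closed-form regularized M-step: phi_wt is proportional to (n_wt + phi_wt * dR/dphi_wt) clipped at zero. A minimal Python sketch of this update for the smoothing/sparsing regularizer R = beta * sum ln(phi_wt) follows; the function name and data layout are illustrative, not taken from the papers above.

```python
# Illustrative sketch of the ARTM M-step (not the authors' reference code).
# ARTM maximizes log-likelihood + R(Phi, Theta); the M-step update is
#   phi_wt = norm_w( max(n_wt + phi_wt * dR/dphi_wt, 0) ),
# where norm_w renormalizes each topic's word distribution.
# For R = beta * sum_{w,t} ln(phi_wt), the term phi_wt * dR/dphi_wt
# reduces to the constant beta: beta > 0 smooths, beta < 0 sparses.

def m_step(n_wt, beta):
    """n_wt: dict topic -> {word: expected count}; beta: additive offset."""
    phi = {}
    for t, counts in n_wt.items():
        clipped = {w: max(n + beta, 0.0) for w, n in counts.items()}
        z = sum(clipped.values())
        if z == 0.0:
            # Everything clipped away: fall back to a uniform distribution.
            phi[t] = {w: 1.0 / len(counts) for w in counts}
        else:
            phi[t] = {w: v / z for w, v in clipped.items()}
    return phi

counts = {"t0": {"cat": 5.0, "dog": 3.0, "the": 0.5}}
print(m_step(counts, beta=-1.0))  # "the" is sparsed out: its weight becomes 0
```

With beta = -1.0, any word whose expected count falls below 1 is zeroed before renormalization, which is exactly how the sparsing regularizer produces sparse topic-word distributions.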

Additive Regularization of Topic Models for Topic Selection and Sparse Factorization

A simple entropy regularizer for topic selection is proposed within the framework of Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers.

Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large Collections

The ability of non-Bayesian regularization to combine modalities, languages and multiple criteria to find sparse, diverse, and interpretable topics is demonstrated.


This paper considers the use of side information in the context of the stability problem of topic modeling, and shows that incorporating it as an additional modality improves topic stability without significant loss of model quality.

Fast and modular regularized topic modelling

A non-Bayesian multiobjective approach called the Additive Regularization of Topic Models (ARTM) is developed, based on regularized Maximum Likelihood Estimation (MLE), and it is shown that many of the well-known Bayesian topic models can be re-formulated in a much simpler way using the regularization point of view.
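The regularized-MLE formulation mentioned here can be sketched in a few lines; the notation below is an assumption consistent with the standard ARTM statement, not copied from this abstract.

```latex
% ARTM: maximize log-likelihood plus a weighted sum of regularizers
\max_{\Phi,\Theta}\;
  \sum_{d,w} n_{dw} \ln \sum_{t} \phi_{wt}\,\theta_{td}
  \;+\; \sum_{i} \tau_i\, R_i(\Phi,\Theta)

% The EM-like M-step then takes the closed form (with (x)_+ = \max(x,0)):
\phi_{wt} \propto
  \Bigl( n_{wt} + \phi_{wt}\,\frac{\partial R}{\partial \phi_{wt}} \Bigr)_{+}

% Choosing R = \sum_{w,t} (\beta_w - 1)\ln\phi_{wt} yields
% \phi_{wt} \propto n_{wt} + \beta_w - 1, i.e. the MAP estimate of LDA
% with a Dirichlet(\beta) prior -- the re-formulation the abstract refers to.
```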

Analyzing the Influence of Hyper-parameters and Regularizers of Topic Modeling in Terms of Renyi Entropy

This paper proposes a novel approach for analyzing the influence of different regularization types on results of topic modeling and concludes that regularization may introduce unpredictable distortions into topic models that need further research.

Convergence of the Algorithm of Additive Regularization of Topic Models

A modification of the algorithm is proposed that, at no additional cost in time or memory, both accelerates convergence and improves the value of the optimized criterion.

Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts

It is shown with human evaluations that ARTM is better for mining topics on specific subjects, finding more relevant topics of higher or comparable quality than specially developed LDA extensions.

BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections

The BigARTM open source project for regularized multimodal topic modeling of large collections is announced; experiments on a Wikipedia corpus show that BigARTM runs faster and achieves better perplexity than other popular packages such as Vowpal Wabbit and Gensim.

Using Topic Modeling to Improve the Quality of Age-Based Text Classification

This paper formulates age-based text classification as a binary classification task, develops a topic-informed machine learning classifier to solve it, and compares three common topic modeling techniques for obtaining document-topic distribution vectors.

Additive regularization for topic models of text collections

A probabilistic topic model identifies the topics of a text collection, describing each topic by a discrete distribution over a set of words and each document by a discrete distribution over a set of topics.

Probabilistic Topic Models

  • D. Blei
  • Computer Science
    IEEE Signal Processing Magazine
  • 2010
In this article, we review probabilistic topic models: graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words.

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation

This paper proposes the collapsed variational Bayesian inference algorithm for LDA, and shows that it is computationally efficient, easy to implement and significantly more accurate than standard variational Bayesian inference for LDA.

On Smoothing and Inference for Topic Models

Using the insights gained from this comparative study, it is shown how accurate topic models can be learned in several seconds on text corpora with thousands of documents.

Sparse Additive Generative Models of Text

This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.

Text Categorization Based on Topic Model

LDACLM, or Latent Dirichlet Allocation Category Language Model, is proposed for text categorization, with model parameters estimated by variational inference; experiments show the LDACLM model to be effective for text classification, outperforming the standard Naive Bayes and Rocchio methods.

Knowledge discovery through directed probabilistic topic models: a survey

This paper surveys an important subclass, Directed Probabilistic Topic Models (DPTMs), with soft clustering abilities and their applications for knowledge discovery in text corpora, presenting basic concepts, advantages and disadvantages in chronological order.

Topic-weak-correlated Latent Dirichlet allocation

  • Yi-Shiuan Tan, Zhijian Ou
  • Computer Science
    2010 7th International Symposium on Chinese Spoken Language Processing
  • 2010
Experimental results on both synthetic and real-world corpus show the superiority of the TWC-LDA over the basic LDA for semantically meaningful topic discovery and document classification.

Robust PLSA Performs Better Than LDA

It is shown that a robust topic model, which distinguishes specific, background and topic terms, does not need Dirichlet regularization and provides a controllably sparse solution.

Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

An efficient Gibbs sampler for the sparseTM is developed, including a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components; experiments show that sparseTMs give better predictive performance with simpler inferred models.