Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
Probabilistic topic modeling of text collections has recently been developed mainly within the framework of graphical models and Bayesian inference. In this paper we introduce an alternative semi-probabilistic approach, which we call additive regularization of topic models (ARTM). Instead of building a purely probabilistic generative model of text we…
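The additive-regularization idea can be sketched as a regularized EM iteration: the M-step adds each regularizer's gradient term to the expected counts before renormalizing. The following is a minimal illustrative sketch with toy data and a single sparsing regularizer; the matrix sizes, the coefficient `tau`, and the variable names are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix (hypothetical data): 20 words x 10 documents.
n_dw = rng.integers(0, 5, size=(20, 10)).astype(float)
W, D = n_dw.shape
T = 4  # number of topics

# Random column-stochastic initialization of Phi (p(w|t)) and Theta (p(t|d)).
phi = rng.random((W, T)); phi /= phi.sum(axis=0, keepdims=True)
theta = rng.random((T, D)); theta /= theta.sum(axis=0, keepdims=True)

tau = 0.1  # assumed coefficient for a sparsing regularizer R = -tau * sum(ln phi)

for _ in range(30):
    # E-step: p(t|d,w) proportional to phi_wt * theta_td.
    p_twd = phi[:, :, None] * theta[None, :, :]            # W x T x D
    p_twd /= p_twd.sum(axis=1, keepdims=True) + 1e-12
    # Expected topic counts.
    n_wt = (n_dw[:, None, :] * p_twd).sum(axis=2)          # W x T
    n_td = (n_dw[:, None, :] * p_twd).sum(axis=0)          # T x D
    # M-step with additive regularization:
    # phi_wt proportional to max(n_wt + phi_wt * dR/dphi_wt, 0);
    # for the sparsing regularizer above this extra term is simply -tau.
    phi = np.maximum(n_wt - tau, 0.0)
    phi /= phi.sum(axis=0, keepdims=True) + 1e-12
    theta = np.maximum(n_td, 0.0)
    theta /= theta.sum(axis=0, keepdims=True) + 1e-12
```

Combining several objectives then amounts to summing their gradient terms inside the same `max(..., 0)` update, which is what makes the regularization "additive".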
Problems:
1. Modern Probabilistic Topic Model (PTM) theory is too complicated to make a model fast, robust, sparse, online, hierarchical, semi-supervised, and multilingual all at once.
2. The Dirichlet prior in Latent Dirichlet Allocation (LDA) is mathematically convenient but poorly motivated linguistically.
3. The sparsity vs. smoothness contradiction. …
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection within Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining…
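The selection mechanism can be illustrated schematically: during regularized EM the regularizer drives the aggregate probability p(t) of excessive topics toward zero, and such topics can then be dropped. A minimal sketch, in which the function name, threshold, and example data are assumptions for illustration:

```python
import numpy as np

def select_topics(theta: np.ndarray, n_d: np.ndarray, eps: float = 1e-3):
    """Return indices of topics worth keeping, based on aggregate p(t).

    theta: T x D matrix of p(t|d); n_d: length-D vector of document lengths.
    """
    p_d = n_d / n_d.sum()        # p(d) proportional to document length
    p_t = theta @ p_d            # p(t) = sum_d p(t|d) p(d)
    return np.flatnonzero(p_t > eps), p_t

# Example: the third topic's probability has collapsed to zero, so it is dropped.
theta = np.array([[0.6, 0.5],
                  [0.4, 0.5],
                  [0.0, 0.0]])
keep, p_t = select_topics(theta, n_d=np.array([100.0, 200.0]))
```

Starting from a deliberately large T and letting the regularizer eliminate redundant topics sidesteps an explicit search over the number of topics.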
Probabilistic topic modeling is a rapidly developing branch of statistical text analysis. A topic model uncovers the hidden thematic structure of a text collection. Learning a topic model from a document collection admits an infinite set of solutions. This non-uniqueness results in weak interpretability and instability of the solution. To tackle these…
In this paper we introduce a generalized learning algorithm for probabilistic topic models (PTM). Many known and new algorithms for the PLSA, LDA, and SWB models can be obtained as special cases of it by choosing a subset of the following "options": regularization, sampling, update frequency, sparsing, and robustness. We show that a robust topic model, which…
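The "options" idea can be sketched as pluggable additive terms feeding one generic M-step. This is an illustrative interface only; the names and the specific regularizers are assumptions, not the paper's code:

```python
from typing import Callable, List
import numpy as np

# Hypothetical interface: each option contributes an additive term
# (phi * dR/dphi) to the expected counts used in the M-step.
Regularizer = Callable[[np.ndarray], np.ndarray]

def smoothing(beta: float) -> Regularizer:
    # LDA-style smoothing: R = beta * sum(ln phi) contributes a constant +beta.
    return lambda phi: np.full_like(phi, beta)

def sparsing(beta: float) -> Regularizer:
    # Anti-smoothing: a constant -beta pushes small entries of phi to exact zeros.
    return lambda phi: np.full_like(phi, -beta)

def m_step(n_wt: np.ndarray, phi: np.ndarray,
           regs: List[Regularizer]) -> np.ndarray:
    grad = sum(r(phi) for r in regs)                 # additive combination
    phi_new = np.maximum(n_wt + grad, 0.0)
    return phi_new / (phi_new.sum(axis=0, keepdims=True) + 1e-12)
```

With an empty `regs` list the update reduces to plain PLSA; adding `smoothing(beta)` recovers an LDA-like update, which is how special cases fall out of one generic algorithm.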