Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
Probabilistic topic modeling of text collections has recently been developed mainly within the framework of graphical models and Bayesian inference. In this paper we introduce an alternative semi-probabilistic approach, which we call additive regularization of topic models (ARTM). Instead of building a purely probabilistic generative model of text, we…
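The core ARTM idea — maximize the log-likelihood plus a weighted sum of regularizers, with each regularizer entering the M-step additively through its gradient — can be sketched in a few lines. The toy counts, the single sparsing regularizer, and the coefficient `beta` below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix: 6 words x 4 documents (hypothetical data).
ndw = rng.integers(0, 5, size=(6, 4)).astype(float)
W, D, T = 6, 4, 2  # vocabulary size, documents, topics

phi = rng.dirichlet(np.ones(W), size=T).T    # p(w|t), shape (W, T)
theta = rng.dirichlet(np.ones(T), size=D).T  # p(t|d), shape (T, D)

beta = 0.1  # sparsing coefficient for the regularizer R = -beta * sum(log phi)

for _ in range(20):
    # E-step: p(w|d) = sum_t phi_wt * theta_td
    pwd = np.clip(phi @ theta, 1e-12, None)          # shape (W, D)
    # Expected counts n_wt and n_td under p(t|d,w) ∝ phi_wt * theta_td
    nwt = phi * ((ndw / pwd) @ theta.T)              # shape (W, T)
    ntd = theta * (phi.T @ (ndw / pwd))              # shape (T, D)
    # M-step with additive regularization:
    #   phi_wt ∝ max(n_wt + phi_wt * dR/dphi_wt, 0)
    # For the sparsing regularizer above this subtracts beta.
    phi = np.maximum(nwt - beta, 0)
    phi /= np.clip(phi.sum(axis=0, keepdims=True), 1e-12, None)
    theta = ntd / np.clip(ntd.sum(axis=0, keepdims=True), 1e-12, None)
```

Setting `beta = 0` recovers plain PLSA; replacing `nwt - beta` with `nwt + beta` gives LDA-style smoothing, which is the "additive" in ARTM: different objectives change only this one line.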
Problems:
1. Modern Probabilistic Topic Model (PTM) theory is too complicated to make a model fast, robust, sparse, online, hierarchical, semi-supervised, multilingual, etc., all at once.
2. The Dirichlet prior in Latent Dirichlet Allocation (LDA) is mathematically convenient but poorly motivated linguistically.
3. The sparsity-vs-smoothness contradiction.
…
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection within Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining…
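A rough illustration of the selection step this abstract describes: once a regularizer of this kind has driven the aggregate probabilities p(t) of redundant topics toward zero during training, the surviving topics can be read off by thresholding p(t). The Θ matrix, document lengths, and threshold below are hypothetical — a sketch of the selection criterion, not the paper's regularizer formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Theta matrix p(t|d) for 6 topics x 5 documents, drawn so
# that three topics carry almost no probability mass.
theta = rng.dirichlet(np.array([5, 5, 5, 0.1, 0.1, 0.1]), size=5).T  # (T, D)
doc_lengths = np.array([100, 80, 120, 90, 110], dtype=float)

# Aggregate topic probabilities: p(t) = sum_d p(d) * p(t|d),
# weighting documents by their length.
pd = doc_lengths / doc_lengths.sum()
pt = theta @ pd

# Keep topics whose overall share exceeds a cutoff (0.05 is an
# illustrative choice, not a value from the paper).
keep = pt > 0.05
print(f"kept {keep.sum()} of {len(pt)} topics")
```

In a full ARTM pipeline this thresholding would follow training with the topic-selection regularizer switched on; here the near-empty topics are simply built into the synthetic Θ.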