Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization
@inproceedings{Vorontsov2014TutorialOP,
  title     = {Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization},
  author    = {Konstantin V. Vorontsov and Anna Potapenko},
  booktitle = {International Joint Conference on the Analysis of Images, Social Networks and Texts},
  year      = {2014}
}
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
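The central idea of the tutorial can be summarized in one optimization criterion (a sketch in the notation of the ARTM papers, where Φ is the word–topic matrix with columns φ_t and Θ is the topic–document matrix with columns θ_d): the log-likelihood of the stochastic matrix factorization is maximized together with a weighted sum of regularizers R_i,

```latex
\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \phi_{wt}\,\theta_{td}
\;+\; \sum_{i} \tau_i \, R_i(\Phi, \Theta)
\;\longrightarrow\; \max_{\Phi,\,\Theta},
```

subject to the columns of Φ and Θ being nonnegative and normalized. Each regularizer R_i encodes one modeling objective (sparsity, smoothing, decorrelation, etc.), and the coefficients τ_i trade the objectives off against the likelihood.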
60 Citations
Additive Regularization of Topic Models for Topic Selection and Sparse Factorization
- Computer Science · SLDS
- 2015
A simple entropy regularizer for topic selection is proposed within Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers.
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large Collections
- Computer Science · TM@CIKM
- 2015
The ability of non-Bayesian regularization to combine modalities, languages and multiple criteria to find sparse, diverse, and interpretable topics is demonstrated.
Stability of Topic Modeling via Modality Regularization
- Computer Science
- 2020
This paper considers the use of additional information in the context of the stability problem of topic modeling, and shows that using side information as an additional modality improves topic stability without significant quality loss of the model.
Fast and modular regularized topic modelling
- Computer Science · 2017 21st Conference of Open Innovations Association (FRUCT)
- 2017
A non-Bayesian multiobjective approach called the Additive Regularization of Topic Models (ARTM) is developed, based on regularized Maximum Likelihood Estimation (MLE), and it is shown that many of the well-known Bayesian topic models can be re-formulated in a much simpler way using the regularization point of view.
Analyzing the Influence of Hyper-parameters and Regularizers of Topic Modeling in Terms of Renyi Entropy
- Computer Science · Entropy
- 2020
This paper proposes a novel approach for analyzing the influence of different regularization types on results of topic modeling and concludes that regularization may introduce unpredictable distortions into topic models that need further research.
Convergence of the Algorithm of Additive Regularization of Topic Models
- Computer Science, Mathematics · Proceedings of the Steklov Institute of Mathematics
- 2021
A modification of the algorithm is proposed that both accelerates convergence and improves the value of the optimized criterion, without additional time or memory costs.
Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts
- Computer Science · MICAI
- 2016
It is shown with human evaluations that ARTM is better for mining topics on specific subjects, finding more relevant topics of higher or comparable quality than LDA extensions.
BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections
- Computer Science · AIST
- 2015
The BigARTM open source project is announced for regularized multimodal topic modeling of large collections, and several experiments on the Wikipedia corpus show that BigARTM performs faster and gives better perplexity compared to other popular packages, such as Vowpal Wabbit and Gensim.
Using Topic Modeling to Improve the Quality of Age-Based Text Classification
- Computer Science
- 2021
This paper formulates the problem as a binary classification task, develops a topic-informed machine learning classifier to solve it, and compares three common topic modeling techniques for obtaining document-topic distribution vectors.
References
Showing 1–10 of 35 references
Additive regularization for topic models of text collections
- Computer Science
- 2014
A probabilistic topic model identifies the topics of a text collection, describing each topic by a discrete distribution over a set of words and each document by a discrete distribution over a set of topics.
Probabilistic Topic Models
- Computer Science · IEEE Signal Processing Magazine
- 2010
In this article, we review probabilistic topic models: graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words.
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
- Computer Science · NIPS
- 2006
This paper proposes the collapsed variational Bayesian inference algorithm for LDA, and shows that it is computationally efficient, easy to implement, and significantly more accurate than standard variational Bayesian inference for LDA.
On Smoothing and Inference for Topic Models
- Computer Science · UAI
- 2009
Using the insights gained from this comparative study, it is shown how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Sparse Additive Generative Models of Text
- Computer Science · ICML
- 2011
This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.
Text Categorization Based on Topic Model
- Computer Science · RSKT
- 2008
LDACLM, or Latent Dirichlet Allocation Category Language Model, is proposed for text categorization, with model parameters estimated by variational inference; experiments show the LDACLM model to be effective for text classification, outperforming standard Naive Bayes and the Rocchio method.
Knowledge discovery through directed probabilistic topic models: a survey
- Computer Science · Frontiers of Computer Science in China
- 2009
This paper surveys an important subclass, Directed Probabilistic Topic Models (DPTMs) with soft clustering abilities, and their applications to knowledge discovery in text corpora, giving basic concepts, advantages, and disadvantages in chronological order.
Topic-weak-correlated Latent Dirichlet allocation
- Computer Science · 2010 7th International Symposium on Chinese Spoken Language Processing
- 2010
Experimental results on both synthetic and real-world corpora show the superiority of TWC-LDA over basic LDA for semantically meaningful topic discovery and document classification.
Robust PLSA Performs Better Than LDA
- Computer Science · ECIR
- 2013
It is shown that a robust topic model, which distinguishes specific, background, and topic terms, does not need Dirichlet regularization and provides a controllably sparse solution.
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
- Computer Science · NIPS
- 2009
An efficient Gibbs sampler for the sparseTM is developed, including a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components, and it is shown that sparseTMs give better predictive performance with simpler inferred models.