# Generalized Topic Modeling

```bibtex
@article{Blum2016GeneralizedTM,
  title   = {Generalized Topic Modeling},
  author  = {Avrim Blum and Nika Haghtalab},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1611.01259}
}
```

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In standard topic models, a topic (such as sports, business, or politics) is viewed as a probability distribution $\vec a_i$ over words, and a document is generated by first selecting a mixture $\vec w$ over topics, and then generating words i.i.d. from the associated mixture $A{\vec w}$. Given a large collection of such documents, the goal is to recover the topic vectors and then…
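The standard generative process described in the abstract can be sketched in a few lines of Python. The toy topic–word matrix, Dirichlet prior, and document length below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy topic-word matrix: each column is a topic's distribution
# over a 5-word vocabulary (illustrative values only).
A = np.array([
    [0.5, 0.0],
    [0.3, 0.1],
    [0.2, 0.2],
    [0.0, 0.3],
    [0.0, 0.4],
])  # shape (vocab_size, num_topics); columns sum to 1

def generate_document(A, alpha, doc_len, rng):
    """Draw a topic mixture w, then sample doc_len words i.i.d. from A @ w."""
    w = rng.dirichlet(alpha)     # mixture over topics
    word_probs = A @ w           # resulting distribution over the vocabulary
    return rng.choice(len(word_probs), size=doc_len, p=word_probs)

doc = generate_document(A, alpha=[1.0, 1.0], doc_len=20, rng=rng)
```

The recovery problem in the standard setting is the inverse of this sketch: given many such documents, estimate the columns of `A`.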

## One Citation

Algorithms for Generalized Topic Modeling

- Computer Science, AAAI
- 2018

This work considers a broad generalization of the traditional topic modeling framework that no longer assumes words are drawn i.i.d., instead viewing a topic as a complex distribution over sequences of paragraphs; the goal is to learn a predictor that, given a new document, accurately predicts its topic mixture without learning the distributions explicitly.

## References

Showing 1–10 of 26 references.

A provable SVD-based algorithm for learning topics in dominant admixture corpus

- Computer Science, NIPS
- 2014

Under a more realistic assumption, a singular value decomposition (SVD) based algorithm with a crucial thresholding pre-processing step can provably recover the topics from a collection of documents drawn from dominant admixtures.

A Spectral Algorithm for Latent Dirichlet Allocation

- Computer Science, Algorithmica
- 2014

This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).

Combining labeled and unlabeled data with co-training

- Computer Science, COLT '98
- 1998

A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.

Co-Training and Expansion: Towards Bridging Theory and Practice

- Computer Science, NIPS
- 2004

A much weaker "expansion" assumption on the underlying data distribution is proposed, which is proved sufficient for iterative co-training to succeed given appropriately strong PAC-learning algorithms on each feature set, and which is to some extent necessary as well.
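The iterative co-training loop that this expansion analysis concerns can be sketched as follows, under simplifying assumptions: two synthetic 2-D views, a toy nearest-centroid classifier standing in for the PAC learners, and one confidently pseudo-labeled example exchanged per view per round. All names and data here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_views(n, label):
    """Two conditionally independent views that agree on the label."""
    center = 2.0 if label == 1 else -2.0
    return (rng.normal(center, 1.0, size=(n, 2)),
            rng.normal(center, 1.0, size=(n, 2)))

class CentroidClassifier:
    """Toy stand-in for a PAC learner: classify by the nearer class centroid."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self
    def confidence(self, X):
        # Signed margin: positive means closer to the positive centroid.
        return (np.linalg.norm(X - self.neg, axis=1)
                - np.linalg.norm(X - self.pos, axis=1))
    def predict(self, X):
        return (self.confidence(X) > 0).astype(int)

# Small labeled seed set, larger unlabeled pool.
L1p, L2p = make_views(3, 1); L1n, L2n = make_views(3, 0)
X1, X2 = np.vstack([L1p, L1n]), np.vstack([L2p, L2n])
y = np.array([1] * 3 + [0] * 3)
U1p, U2p = make_views(50, 1); U1n, U2n = make_views(50, 0)
U1, U2 = np.vstack([U1p, U1n]), np.vstack([U2p, U2n])
true_u = np.array([1] * 50 + [0] * 50)  # held out, used only for evaluation

for _ in range(5):  # co-training rounds
    c1 = CentroidClassifier().fit(X1, y)
    c2 = CentroidClassifier().fit(X2, y)
    # Each view's classifier pseudo-labels its most confident unlabeled
    # example; both get added to the shared labeled set.
    i = int(np.argmax(np.abs(c1.confidence(U1))))
    j = int(np.argmax(np.abs(c2.confidence(U2))))
    labels = {i: int(c1.predict(U1[i:i + 1])[0])}
    labels.setdefault(j, int(c2.predict(U2[j:j + 1])[0]))
    for idx in sorted(labels, reverse=True):  # delete high indices first
        X1 = np.vstack([X1, U1[idx]])
        X2 = np.vstack([X2, U2[idx]])
        y = np.append(y, labels[idx])
        U1 = np.delete(U1, idx, axis=0)
        U2 = np.delete(U2, idx, axis=0)
        true_u = np.delete(true_u, idx)

c1 = CentroidClassifier().fit(X1, y)
acc = float((c1.predict(U1) == true_u).mean())
```

On this well-separated toy data the pool is labeled almost perfectly; the paper's point is that far weaker "expansion" conditions than full view-independence already make such loops succeed.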

A Practical Algorithm for Topic Modeling with Provable Guarantees

- Computer Science, ICML
- 2013

This paper presents an algorithm for topic model inference that is both provable and practical, producing results comparable to the best MCMC implementations while running orders of magnitude faster.

Tensor decompositions for learning latent variable models

- Computer Science, Mathematics, ArXiv
- 2012

A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, and implies a robust and computationally tractable estimation approach for several popular latent variable models.
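The tensor power iteration analyzed in this reference repeats $v \leftarrow T(I, v, v)/\|T(I, v, v)\|$. A minimal numpy sketch on a synthetic orthogonally decomposable tensor — the weights and components below are arbitrary illustrative choices, not part of the cited analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric 3-way tensor T = sum_i lam_i * (a_i ⊗ a_i ⊗ a_i)
# with orthonormal components a_i, the setting where the power method
# provably recovers the components.
d = 4
lam = np.array([5.0, 2.0])
A_comp, _ = np.linalg.qr(rng.normal(size=(d, 2)))  # orthonormal columns
T = sum(l * np.einsum('i,j,k->ijk', a, a, a)
        for l, a in zip(lam, A_comp.T))

def tensor_power_iteration(T, n_iter=100, rng=rng):
    """Repeat v <- T(I, v, v) / ||T(I, v, v)|| from a random start."""
    v = rng.normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)  # contract T against v twice
        v /= np.linalg.norm(v)
    eigval = np.einsum('ijk,i,j,k->', T, v, v, v)  # recovered weight lam_i
    return eigval, v

eigval, v = tensor_power_iteration(T)
```

The iteration converges to one of the components `a_i`, with `eigval` equal to the corresponding weight; the paper's contribution is showing this remains true, robustly, when `T` is only an empirical moment estimate perturbed by noise.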

Probabilistic Latent Semantic Analysis

- Computer Science, UAI
- 1999

This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model, resulting in a more principled approach with a solid foundation in statistics.

Settling the Polynomial Learnability of Mixtures of Gaussians

- Computer Science, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
- 2010

This paper gives the first polynomial-time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture; the running time is exponential in k, and such a dependence is proved to be necessary.

Disentangling Gaussians

- Computer Science, Commun. ACM
- 2012

The conclusion is that the statistical and computational complexity of this general problem is polynomial in every respect except for the dependence on the number of Gaussians, which is necessarily exponential.