Latent Dirichlet Allocation

Abstract

We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
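The generative process the abstract describes can be sketched in code: draw per-document topic proportions from a Dirichlet, then draw each word by first sampling a topic and then sampling from that topic's word distribution. The sketch below is illustrative only; all sizes, hyperparameter values, and the function name are assumptions, not from the paper, and the per-topic word distributions (which the paper learns) are simply sampled here.

```python
import numpy as np

def generate_corpus(num_docs=100, doc_len=50, num_topics=3,
                    vocab_size=20, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process.

    Hyperparameters and sizes are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    # Per-topic word distributions; the paper estimates these from data,
    # here we sample them from a symmetric Dirichlet for illustration.
    topics = rng.dirichlet(beta * np.ones(vocab_size), size=num_topics)
    corpus = []
    for _ in range(num_docs):
        # Per-document topic proportions: the latent Dirichlet variable.
        theta = rng.dirichlet(alpha * np.ones(num_topics))
        words = []
        for _ in range(doc_len):
            z = rng.choice(num_topics, p=theta)      # topic for this word
            w = rng.choice(vocab_size, p=topics[z])  # word from that topic
            words.append(int(w))
        corpus.append(words)
    return corpus, topics
```

Because each document has its own theta, words in one document can come from several topics, which is what distinguishes this model from a mixture of unigrams, where a whole document is assigned a single topic.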



Semantic Scholar estimates that this publication has 16,100 citations based on the available data.


Cite this paper

@article{Blei2003LatentDA,
  title={Latent Dirichlet Allocation},
  author={David M. Blei and Andrew Y. Ng and Michael I. Jordan},
  journal={Journal of Machine Learning Research},
  volume={3},
  pages={993--1022},
  year={2003}
}