#### Filter Results:

#### Publication Year

1987

2017

#### Publication Type

#### Co-author

#### Publication Venue

#### Data Set Used

#### Key Phrases

Learn More

- David M. Blei, Andrew Y. Ng, Michael I. Jordan
- Journal of Machine Learning Research
- 2001

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying… (More)

a grant from Darpa in support of the CALO program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. Abstract We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components… (More)

- David M. Blei, Michael I. Jordan
- SIGIR
- 2003

We consider the problem of modeling annotated data---data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models which aim to describe such data, culminating in <i>correspondence latent Dirichlet allocation</i>, a latent… (More)

- Matthew D. Hoffman, David M. Blei, Francis R. Bach
- NIPS
- 2010

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the… (More)

We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows… (More)

- David M. Blei, Jon D. McAuliffe
- NIPS
- 2007

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we… (More)

- David M. Blei, Lawrence Carin, David B. Dunson
- IEEE Signal Processing Magazine
- 2010

Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems.
In… (More)

- Matthew D. Hoffman, David M. Blei, Chong Wang, John William Paisley
- Journal of Machine Learning Research
- 2013

We derive a stochastic optimization algorithm for mean field variational inference, which we call online variational inference. Our algorithm approximates the posterior distribution of a probabilistic model with hidden variables, and can handle large (or even streaming) data sets of observations. Let x = x 1:n be n observations, β be global hidden… (More)

- David M. Blei, John D. Lafferty
- ICML
- 2006

A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed… (More)

- Chong Wang, David M. Blei
- KDD
- 2011

Researchers have access to large online archives of scientific articles. As a consequence, finding relevant papers has become more difficult. Newly formed online communities of researchers sharing citations provides a new way to solve this problem. In this paper, we develop an algorithm to recommend scientific articles to users of an online community. Our… (More)