## 30,719 Citations

Matching Results of Latent Dirichlet Allocation for Text

- Computer Science
- 2012

This work presents a method to match and compare the LDA topics produced by different models using lightweight, easy-to-use similarity measures, keeping model inference simple and matching topics solely by their high-probability word lists.

Kernel Topic Models

- Computer Science
- AISTATS
- 2012

An approximate algorithm cast around a Laplace approximation in a transformed basis is presented, which allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents.

Topic Models Conditioned on Relations

- Computer Science
- ECML/PKDD
- 2010

A Dirichlet-multinomial nonparametric regression topic model that includes a Gaussian process prior on joint document and topic distributions that is a function of document relations is presented.

Sparse Additive Generative Models of Text

- Computer Science
- ICML
- 2011

This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.

Latent Gaussian Models for Topic Modeling

- Computer Science
- AISTATS
- 2014

A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a…

Undirected and Interpretable Continuous Topic Models of Documents

- Computer Science
- 2007

A new type of undirected graphical model suitable for topic modeling and dimensionality reduction for large text collections is presented, which represents words using discrete distributions akin to traditional ‘bag-of-words’ methods.

Supervised Topic Models

- Computer Science
- NIPS
- 2007

The supervised latent Dirichlet allocation (sLDA) model, a statistical model of labelled documents, is introduced, along with a maximum-likelihood procedure for parameter estimation that relies on variational approximations to handle intractable posterior expectations.

A Spectral Algorithm for Latent Dirichlet Allocation

- Computer Science
- Algorithmica
- 2014

This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).

Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression

- Computer Science
- UAI
- 2008

A Dirichlet-multinomial regression topic model is proposed that includes a log-linear prior on document-topic distributions, where the prior is a function of observed features of the document such as author, publication venue, references, and dates.

Dirichlet Mixture Allocation for Multiclass Document Collections Modeling

- Computer Science
- 2009 Ninth IEEE International Conference on Data Mining
- 2009

Experiments on the popular TDT2 Corpus demonstrate that DMA models a collection of documents more precisely than LDA when the documents are obtained from multiple classes.

## References

Showing 1-10 of 69 references

Modeling annotated data

- Computer Science
- SIGIR
- 2003

Three hierarchical probabilistic mixture models are presented that aim to describe annotated data with multiple types, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type.

Markov Chain Sampling Methods for Dirichlet Process Mixture Models

- Mathematics
- 2000

This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to…

Expectation-Propagation for the Generative Aspect Model

- Computer Science
- UAI
- 2002

This paper demonstrates that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model, and develops an alternative approach that leads to higher accuracy at comparable cost.

Finding scientific topics

- Computer Science
- Proceedings of the National Academy of Sciences of the United States of America
- 2004

A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model. The algorithm is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.

Latent semantic indexing: a probabilistic analysis

- Computer Science
- PODS '98
- 1998

It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.

Modelling Heterogeneity With and Without the Dirichlet Process

- Mathematics
- 2001

We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP…

Text Classification from Labeled and Unlabeled Documents using EM

- Computer Science
- Machine Learning
- 2004

This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.

Improving multi-class text classification with Naive Bayes

- Computer Science
- 2001

A theorem is given that explains the improvements found in multiclass classification with Naive Bayes using Error-Correcting Output Codes, and a statistics-based framework for text feature selection is developed.

A CONSTRUCTIVE DEFINITION OF DIRICHLET PRIORS

- Mathematics
- 1991

The parameter in a Bayesian nonparametric problem is the unknown distribution P of the observation X. A Bayesian uses a prior distribution for P, and after observing X, solves the…

The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity

- Computer Science
- NIPS
- 2000

We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a…