Latent Dirichlet Allocation

@article{Blei2003LatentDA,
  title={Latent Dirichlet Allocation},
  author={David M. Blei and Andrew Y. Ng and Michael I. Jordan},
  journal={J. Mach. Learn. Res.},
  year={2003},
  volume={3},
  pages={993--1022}
}
Matching Results of Latent Dirichlet Allocation for Text
TLDR
This work presents a method to match and compare the resulting LDA topics of different models with lightweight, easy-to-use similarity measures, keeping the model inference simple and matching topics solely by their high-probability word lists.
Kernel Topic Models
TLDR
An approximate algorithm cast around a Laplace approximation in a transformed basis is presented, which allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents.
Topic Models Conditioned on Relations
TLDR
A Dirichlet-multinomial nonparametric regression topic model is presented that places a Gaussian process prior on joint document and topic distributions, where the prior is a function of document relations.
Sparse Additive Generative Models of Text
TLDR
This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.
Latent Gaussian Models for Topic Modeling
A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a
Undirected and Interpretable Continuous Topic Models of Documents
TLDR
A new type of undirected graphical model is proposed that is suitable for topic modeling and dimensionality reduction for large text collections, and that represents words using discrete distributions akin to traditional ‘bag-of-words’ methods.
Supervised Topic Models
TLDR
The supervised latent Dirichlet allocation (sLDA) model, a statistical model of labelled documents, is introduced, along with a maximum-likelihood procedure for parameter estimation that relies on variational approximations to handle intractable posterior expectations.
A Spectral Algorithm for Latent Dirichlet Allocation
TLDR
This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA).
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
TLDR
A Dirichlet-multinomial regression topic model is proposed that places a log-linear prior on document-topic distributions, where the prior is a function of observed features of the document, such as author, publication venue, references, and dates.
Dirichlet Mixture Allocation for Multiclass Document Collections Modeling
  • Wei Bian, D. Tao
  • Computer Science
    2009 Ninth IEEE International Conference on Data Mining
  • 2009
TLDR
Experiments on the popular TDT2 Corpus demonstrate that DMA models a collection of documents more precisely than LDA when the documents are obtained from multiple classes.

References

SHOWING 1-10 OF 69 REFERENCES
Modeling annotated data
TLDR
Three hierarchical probabilistic mixture models that aim to describe annotated data with multiple types are presented, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type.
Markov Chain Sampling Methods for Dirichlet Process Mixture Models
Abstract: This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to
Expectation-Propagation for the Generative Aspect Model
TLDR
This paper demonstrates that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model, and develops an alternative approach that leads to higher accuracy at comparable cost.
Finding scientific topics
  • T. Griffiths, M. Steyvers
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
TLDR
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
Latent semantic indexing: a probabilistic analysis
TLDR
It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.
Modelling Heterogeneity With and Without the Dirichlet Process
We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP
Text Classification from Labeled and Unlabeled Documents using EM
TLDR
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.
Improving multi-class text classification with Naive Bayes
TLDR
A theorem is given that explains the improvements found in multiclass classification with Naive Bayes using Error-Correcting Output Codes, and a statistics-based framework for text feature selection is developed.
A CONSTRUCTIVE DEFINITION OF DIRICHLET PRIORS
Abstract: The parameter in a Bayesian nonparametric problem is the unknown distribution P of the observation X. A Bayesian uses a prior distribution for P, and after observing X, solves the
The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a