Finding scientific topics
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model. The algorithm is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
Infinite latent feature models and the Indian buffet process
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in …
The Author-Topic Model for Authors and Documents
The author-topic model is introduced, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and applications to computing similarity between authors and entropy of author output are demonstrated.
Topics in semantic representation.
This article analyzes the abstract computational problem underlying the extraction and use of gist, formulating this problem as a rational statistical inference that leads to a novel approach to semantic representation in which word meanings are represented in terms of a set of probabilistic topics.
Hierarchical Topic Models and the Nested Chinese Restaurant Process
A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.
Learning Systems of Concepts with an Infinite Relational Model
A nonparametric Bayesian model is presented that discovers systems of related concepts and applies the approach to four problems: clustering objects and features, learning ontologies, discovering kinship systems, and discovering structure in political data.
The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies
An application to information retrieval is presented in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction.
The Indian Buffet Process: An Introduction and Review
A detailed derivation of this distribution is given, and its use as a prior is discussed for an infinite latent feature model and for probabilistic models such as bipartite graphs in which the size of at least one class of nodes is unknown.
Probabilistic author-topic models for information discovery
The methodology is applied to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and a model with 300 topics is learned using a Markov chain Monte Carlo algorithm.