Finding scientific topics
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model; the algorithm is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
The Author-Topic Model for Authors and Documents
The author-topic model, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, is introduced, and applications to computing similarity between authors and entropy of author output are demonstrated.
Topics in semantic representation.
This article analyzes the abstract computational problem underlying the extraction and use of gist, formulating this problem as a rational statistical inference that leads to a novel approach to semantic representation in which word meanings are represented in terms of a set of probabilistic topics.
The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
A simple model for semantic growth is described, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node, which generates appropriate small-world statistics and power-law connectivity distributions.
A model for recognition memory: REM—retrieving effectively from memory
A new model of recognition memory is reported. This model is placed within, and introduces, a more elaborate theory that is being developed to predict the phenomena of explicit and implicit, and
Probabilistic Topic Models
Probabilistic author-topic models for information discovery
The methodology is applied to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and a model with 300 topics is learned using a Markov chain Monte Carlo algorithm.
Integrating Topics and Syntax
This work presents a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency.
Learning author-topic models from text corpora
The interpretation of the results discovered by the system is discussed, including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors.
Statistical topic models for multi-label document classification
The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.