Finding scientific topics
- T. Griffiths, M. Steyvers
- Computer Science · Proceedings of the National Academy of Sciences…
- 6 April 2004
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm for inference in this model is presented. The algorithm is used to analyze abstracts from PNAS, with Bayesian model selection used to establish the number of topics.
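As a rough illustration of MCMC inference in this kind of topic model, here is a minimal collapsed Gibbs sampler for Latent Dirichlet Allocation. It is a sketch, not the authors' code: the function name `lda_gibbs`, the integer word-id input format, and the default hyperparameters `alpha` and `beta` are all illustrative assumptions. Each token's topic is resampled from its full conditional, which is proportional to (topic-word count + β) / (topic total + Vβ) × (document-topic count + α).

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]       # tokens per (document, topic)
    n_kw = [[0] * V for _ in range(K)]   # tokens per (topic, word)
    n_k = [0] * K                        # tokens per topic
    z = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional P(z_i = t | z_-i, w).
                weights = [
                    (n_kw[t][w] + beta) / (n_k[t] + V * beta) * (n_dk[d][t] + alpha)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its new topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z
```

For example, `lda_gibbs([[0, 1, 0, 1], [2, 3, 3, 2]], num_topics=2, vocab_size=4)` runs the sampler on two toy documents with disjoint vocabularies. The per-document length term in the conditional is dropped because it does not depend on the candidate topic.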
The Author-Topic Model for Authors and Documents
- M. Rosen-Zvi, T. Griffiths, M. Steyvers, Padhraic Smyth
- Computer Science · Conference on Uncertainty in Artificial…
- 7 July 2004
The author-topic model, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, is introduced, and applications to computing the similarity between authors and the entropy of author output are demonstrated.
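Once each author is summarised by a topic distribution θ_a, both applications reduce to simple computations on probability vectors. The sketch below assumes that author similarity is measured by a symmetrised KL divergence and author output by Shannon entropy; these are plausible choices, not necessarily the paper's exact definitions, and the function names are hypothetical. The KL terms assume strictly positive distributions, which smoothed (Dirichlet-prior) estimates provide.

```python
import math


def entropy_bits(theta):
    """Shannon entropy (bits) of an author's topic distribution.

    Low entropy: the author writes about few topics; high: broad output.
    """
    return -sum(p * math.log2(p) for p in theta if p > 0)


def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrised KL divergence: smaller values mean more similar authors."""
    return 0.5 * (kl(p, q) + kl(q, p))
```

For instance, an author spread evenly over four topics has `entropy_bits([0.25] * 4)` equal to 2 bits, while a specialist concentrated on one topic scores lower.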
The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
A simple model for semantic growth is described, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node, which generates appropriate small-world statistics and power-law connectivity distributions.
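The growth process can be sketched as follows. This is a simplified, hypothetical version, not the published model: the node to be differentiated is chosen with probability proportional to its degree, and the new node then links to `m` distinct neighbours of that node, chosen uniformly here as a simplifying assumption. The seed network (a fully connected clique of `m + 1` nodes) and the function name are also assumptions.

```python
import random


def grow_semantic_network(n_nodes, m=3, seed=0):
    """Hypothetical sketch of growth by differentiating an existing node.

    Returns an undirected graph as {node: set(neighbours)}.
    Requires n_nodes >= m + 1.
    """
    rng = random.Random(seed)
    # Seed network: a fully connected clique of m + 1 nodes (assumption).
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    for new in range(m + 1, n_nodes):
        nodes = list(adj)
        degrees = [len(adj[v]) for v in nodes]
        # Choose the node to differentiate, favouring well-connected nodes.
        target = rng.choices(nodes, weights=degrees)[0]
        # Copy part of its connectivity pattern: link to m of its neighbours.
        # Every node has degree >= m, so this sample is always possible.
        chosen = rng.sample(sorted(adj[target]), m)
        adj[new] = set(chosen)
        for v in chosen:
            adj[v].add(new)
    return adj
```

Because well-connected nodes are both chosen more often and more likely to be a neighbour of the chosen node, hubs tend to keep gaining links. Plotting the degree histogram of, say, `grow_semantic_network(5000)` is a quick way to inspect the resulting distribution.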
A model for recognition memory: REM—retrieving effectively from memory
A new model of recognition memory is reported. This model is placed within, and introduces, a more elaborate theory that is being developed to predict the phenomena of explicit and implicit, and…
Topics in semantic representation
This article analyzes the abstract computational problem underlying the extraction and use of gist, formulating this problem as a rational statistical inference that leads to a novel approach to semantic representation in which word meanings are represented in terms of a set of probabilistic topics.
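With word meanings represented as topics, one can, for example, predict which words a cue word brings to mind by marginalising over topics: P(w2 | w1) = Σ_z P(w2 | z) P(z | w1), where P(z | w1) ∝ P(w1 | z) P(z). The sketch below assumes a known topic-word matrix `phi` (with `phi[k][w]` = P(w | z = k)) and, by default, a uniform prior over topics; the function name and these inputs are illustrative assumptions.

```python
def word_association(phi, cue, topic_prior=None):
    """Return P(w2 | w1 = cue) for every word w2 (hypothetical helper).

    phi[k][w] = P(w | z = k); topic_prior[k] = P(z = k), uniform if None.
    """
    K, V = len(phi), len(phi[0])
    prior = topic_prior if topic_prior is not None else [1.0 / K] * K
    # Posterior over topics given the cue: P(z | w1) ∝ P(w1 | z) P(z).
    joint = [phi[k][cue] * prior[k] for k in range(K)]
    total = sum(joint)
    post = [j / total for j in joint]
    # Marginalise over topics: P(w2 | w1) = sum_z P(w2 | z) P(z | w1).
    return [sum(phi[k][w] * post[k] for k in range(K)) for w in range(V)]
```

With two topics over four words, `phi = [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]`, the cue word 0 belongs only to the first topic, so `word_association(phi, 0)` returns `[0.5, 0.5, 0.0, 0.0]`. A cue that appears in several topics would spread its predictions across all of them.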
Probabilistic author-topic models for information discovery
- M. Steyvers, Padhraic Smyth, M. Rosen-Zvi, T. Griffiths
- Computer Science · Knowledge Discovery and Data Mining
- 22 August 2004
The methodology is applied to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and a model with 300 topics is learned using a Markov chain Monte Carlo algorithm.
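In the author-topic setting, the MCMC state assigns each word token both an author x (drawn from the document's author list) and a topic z. A collapsed Gibbs sampler then resamples the pair (x, z) jointly with weight (topic-word count + β) / (topic total + Vβ) × (author-topic count + α) / (author total + Kα). The sketch below is a toy, hypothetical implementation of that idea; its names, input format, and hyperparameter defaults are assumptions, and it is not the code used on the CiteSeer corpus.

```python
import random


def author_topic_gibbs(docs, doc_authors, num_authors, num_topics, vocab_size,
                       alpha=0.5, beta=0.01, iters=100, seed=0):
    """Hypothetical collapsed Gibbs sampler for the author-topic model.

    docs[d]: list of word ids; doc_authors[d]: list of author ids for doc d.
    Returns (author_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_at = [[0] * K for _ in range(num_authors)]  # tokens per (author, topic)
    n_a = [0] * num_authors                       # tokens per author
    n_kw = [[0] * V for _ in range(K)]            # tokens per (topic, word)
    n_k = [0] * K                                 # tokens per topic
    assign = []

    def update(a, k, w, delta):
        n_at[a][k] += delta
        n_a[a] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial (author, topic) pair for each token.
    for doc, authors in zip(docs, doc_authors):
        pairs = []
        for w in doc:
            a, k = rng.choice(authors), rng.randrange(K)
            pairs.append((a, k))
            update(a, k, w, +1)
        assign.append(pairs)

    for _ in range(iters):
        for d, (doc, authors) in enumerate(zip(docs, doc_authors)):
            candidates = [(a, k) for a in authors for k in range(K)]
            for i, w in enumerate(doc):
                a, k = assign[d][i]
                update(a, k, w, -1)  # remove current token from counts
                weights = [
                    (n_kw[k2][w] + beta) / (n_k[k2] + V * beta)
                    * (n_at[a2][k2] + alpha) / (n_a[a2] + K * alpha)
                    for a2, k2 in candidates
                ]
                a, k = rng.choices(candidates, weights=weights)[0]
                assign[d][i] = (a, k)
                update(a, k, w, +1)  # add token back under new pair
    return n_at, n_kw, assign
```

Unlike plain LDA, the author denominator (author total + Kα) varies across candidate authors, so it cannot be dropped from the weights.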
Probabilistic Topic Models
Integrating Topics and Syntax
This work presents a generative model that uses both short-range syntactic and long-range semantic dependencies between words, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency.
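One way to picture such a composite model is through its generative process. A hidden Markov chain moves between word classes. One designated class is "semantic" and emits words from the document's topic mixture, as in LDA, while the other classes emit words from their own class-specific distributions. The sketch below samples a document under that assumption. Treating class 0 as the semantic class, the `start` distribution, and all parameter names are hypothetical choices made for illustration.

```python
import random


def generate_document(length, theta, phi_topics, phi_classes, start, transition, seed=0):
    """Sample (words, classes) from a hypothetical composite HMM/topic model.

    theta: P(z | document) over topics.
    phi_topics[k]: word distribution for topic k.
    phi_classes: {c: word distribution} for syntactic classes c >= 1.
    start: initial class distribution; transition[c]: next-class distribution.
    Class 0 is assumed to be the semantic (topic) class.
    """
    rng = random.Random(seed)
    num_classes = len(transition)
    words, classes = [], []
    c = None
    for t in range(length):
        probs = start if t == 0 else transition[c]
        c = rng.choices(range(num_classes), weights=probs)[0]
        if c == 0:
            # Semantic class: draw a topic from the document mixture.
            z = rng.choices(range(len(theta)), weights=theta)[0]
            dist = phi_topics[z]
        else:
            # Syntactic class: use its own word distribution.
            dist = phi_classes[c]
        words.append(rng.choices(range(len(dist)), weights=dist)[0])
        classes.append(c)
    return words, classes
```

With deterministic toy parameters, such as a chain that alternates between class 0 and class 1, the output alternates between topic words and function words. That illustrates how content words and syntactic filler can be separated even though the model has no explicit notion of syntax.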
Inferring causal networks from observations and interventions
Learning author-topic models from text corpora
The interpretation of the results discovered by the system is discussed, including specific topic and author models, the ranking of authors by topic and of topics by author, the parsing of abstracts by topics and authors, and the detection of unusual papers by specific authors.
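Two of these uses can be sketched directly from fitted author-topic (θ) and topic-word (φ) estimates. The sketch ranks authors for a topic by θ_a[k], and it flags a paper as unusual for an author when its perplexity under that author's model, exp(−(1/N) Σ_i log Σ_k θ_a[k] φ_k[w_i]), is high. Both are assumptions about how such rankings and detections could work, not a claim about the system's exact procedure, and the function names are hypothetical. The perplexity computation assumes every word has nonzero probability, which smoothed estimates ensure.

```python
import math


def paper_perplexity(words, theta_a, phi):
    """Perplexity of a paper's words under one author's topic mixture.

    Higher values suggest the paper is atypical for that author.
    """
    log_lik = 0.0
    for w in words:
        p = sum(theta_a[k] * phi[k][w] for k in range(len(theta_a)))
        log_lik += math.log(p)
    return math.exp(-log_lik / len(words))


def rank_authors_for_topic(theta, k, top_n=5):
    """Author ids sorted by their weight on topic k (highest first)."""
    return sorted(range(len(theta)), key=lambda a: theta[a][k], reverse=True)[:top_n]
```

For an author with `theta_a = [0.9, 0.1]` over two topics, a paper built from first-topic words gets a much lower perplexity than one built from second-topic words. Sorting an author's papers by this score would surface the unusual ones.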