# Finding scientific topics

@article{Griffiths2004FindingST, title={Finding scientific topics}, author={Thomas L. Griffiths and Mark Steyvers}, journal={Proceedings of the National Academy of Sciences of the United States of America}, year={2004}, volume={101}, pages={5228 - 5235} }

A first step in identifying the content of a document is determining which topics that document addresses. [...] 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics…
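The abstract describes the generative process of Latent Dirichlet Allocation and an MCMC algorithm for inference. The following is a minimal pure-Python sketch, not the authors' implementation. It samples a toy corpus from the generative process, then runs a collapsed Gibbs sampler. With θ and φ integrated out, the sampler repeatedly resamples each token's topic from P(z_i = j | z_-i, w) ∝ (n_wj + β)/(n_j + Wβ) · (n_dj + α). In this expression, n_wj, n_j, and n_dj are the word-topic, topic, and document-topic counts excluding token i, and W is the vocabulary size. The two hand-set topics in `phi_true`, the corpus size, and the values `alpha=0.5` and `beta=0.1` are illustrative assumptions. Helper names such as `gibbs_lda` are hypothetical.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) draw via normalized Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, phi, alpha, rng):
    """LDA generative process: draw theta per document, then a topic and a word per token."""
    n_topics, vocab = len(phi), len(phi[0])
    docs = []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, n_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]
            doc.append(rng.choices(range(vocab), weights=phi[z])[0])
        docs.append(doc)
    return docs


def gibbs_lda(docs, n_topics, vocab, alpha, beta, iters, rng):
    """Collapsed Gibbs sampling over the per-token topic assignments z."""
    nwt = [[0] * n_topics for _ in range(vocab)]  # word-topic counts
    ndt = [[0] * n_topics for _ in docs]          # document-topic counts
    nt = [0] * n_topics                           # total tokens per topic
    z = []
    for d, doc in enumerate(docs):                # random initialization
        z.append([])
        for w in doc:
            t = rng.randrange(n_topics)
            z[d].append(t)
            nwt[w][t] += 1
            ndt[d][t] += 1
            nt[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                       # remove this token's current assignment
                nwt[w][t] -= 1
                ndt[d][t] -= 1
                nt[t] -= 1
                # Unnormalized conditional; the document-length denominator
                # is identical for every topic j, so it is omitted.
                weights = [
                    (nwt[w][j] + beta) / (nt[j] + vocab * beta) * (ndt[d][j] + alpha)
                    for j in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t                       # add the token back under its new topic
                nwt[w][t] += 1
                ndt[d][t] += 1
                nt[t] += 1
    return z, nwt, ndt


def estimate_phi(nwt, beta):
    """Per-topic word distributions estimated from the final counts."""
    vocab, n_topics = len(nwt), len(nwt[0])
    totals = [sum(nwt[w][j] for w in range(vocab)) for j in range(n_topics)]
    return [[(nwt[w][j] + beta) / (totals[j] + vocab * beta) for w in range(vocab)]
            for j in range(n_topics)]


rng = random.Random(0)
third = 1 / 3
phi_true = [[third, third, third, 0, 0, 0],   # hypothetical topic over words 0-2
            [0, 0, 0, third, third, third]]   # hypothetical topic over words 3-5
docs = generate_corpus(20, 30, phi_true, alpha=0.5, rng=rng)
z, nwt, ndt = gibbs_lda(docs, n_topics=2, vocab=6, alpha=0.5, beta=0.1, iters=50, rng=rng)
phi_hat = estimate_phi(nwt, beta=0.1)
```

On a toy corpus like this, each recovered row of `phi_hat` would usually be expected to concentrate on one block of three words. A single short chain does not guarantee this.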

## 5,622 Citations

Probabilistic author-topic models for information discovery

- Computer Science, KDD
- 2004

The methodology is applied to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and a model with 300 topics is learned using a Markov chain Monte Carlo algorithm.

A network approach to topic models

- Computer Science, Science Advances
- 2018

This paper finds topics through community detection in word-document networks, adapting existing community-detection methods by using a stochastic block model with nonparametric priors. It also shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between the two fields.

The Author-Topic Model for Authors and Documents

- Computer Science, UAI
- 2004

This paper introduces the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and demonstrates applications to computing the similarity between authors and the entropy of author output.
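In the author-topic model, each word in a document is produced by choosing one of the document's authors uniformly at random. A topic is then drawn from that author's distribution over topics, and the word is drawn from that topic. A minimal sketch of this generative step only follows, and inference is not shown. The function name, author names, topic distributions, and vocabulary are hypothetical.

```python
import random


def generate_author_topic_doc(authors, theta, phi, length, rng):
    """Generate one document under the author-topic generative process.

    For each token: pick an author uniformly from the document's author list,
    draw a topic from that author's topic distribution, then a word from the topic.
    """
    words, assignments = [], []
    for _ in range(length):
        x = rng.choice(authors)                                  # author, uniform
        z = rng.choices(range(len(phi)), weights=theta[x])[0]    # topic given author
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]   # word given topic
        words.append(w)
        assignments.append((x, z))
    return words, assignments


rng = random.Random(1)
theta = {"author_a": [1.0, 0.0], "author_b": [0.0, 1.0]}  # hypothetical authors
phi = [[0.5, 0.5, 0.0, 0.0],   # hypothetical topic 0 over words 0-1
       [0.0, 0.0, 0.5, 0.5]]   # hypothetical topic 1 over words 2-3
words, assignments = generate_author_topic_doc(["author_a", "author_b"], theta, phi, 40, rng)
```

Because each hypothetical author here uses a single topic, every word's author can be read off from which half of the vocabulary it falls in. Real author-topic distributions overlap.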

Sequential Latent Dirichlet Allocation: Discover Underlying Topic Structures within a Document

- Computer Science, 2010 IEEE International Conference on Data Mining
- 2010

By taking into account the sequential structure within a document, the SeqLDA model achieves higher fidelity than LDA in terms of perplexity (a standard measure of dictionary-based compressibility) and yields a nicer sequential topic structure than LDA.
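Perplexity is the exponentiated negative mean log-likelihood per held-out token. Lower is better, and a model that is uniform over a V-word vocabulary scores exactly V. This minimal sketch assumes point estimates are already available: `theta` (per-document topic proportions) and `phi` (per-topic word distributions). Estimating `theta` for unseen documents, and whatever procedure SeqLDA uses, are not shown. The function name is hypothetical.

```python
import math


def perplexity(docs, theta, phi):
    """exp(-(1/N) * sum log p(w | d)), where p(w | d) = sum_t theta[d][t] * phi[t][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][t] * phi[t][w] for t in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# A model that spreads probability uniformly over a 4-word vocabulary.
uniform_phi = [[0.25] * 4, [0.25] * 4]
docs = [[0, 1, 2], [3, 3]]
theta = [[0.5, 0.5], [0.9, 0.1]]
print(perplexity(docs, theta, uniform_phi))  # close to 4.0
```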

Interpreting document collections with topic models

- Computer Science
- 2014

This thesis looks at the problem of identifying incoherent topics. It proposes novel methods for efficiently identifying semantically related topics, which can be used for topic recommendation, and approaches that provide textual or image labels to assist topic interpretability.

Detecting research topics via the correlation between graphs and texts

- Computer Science, KDD '07
- 2007

This paper presents a unique approach that uses the correlation between the distribution of a term that represents a topic and the link distribution in the citation graph where the nodes are limited to the documents containing the term.

Learning author-topic models from text corpora

- Computer Science, TOIS
- 2010

The paper discusses how to interpret the results discovered by the system, including specific topic and author models, rankings of authors by topic and of topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors.

Structured Topic Models for Language

- Computer Science
- 2008

This thesis introduces new methods for statistically modelling text using topic models that combine latent topics with information about document structure, ranging from local sentence structure to inter-document relationships.

A correlated topic model of Science

- Computer Science
- 2007

The correlated topic model (CTM) is developed, in which the topic proportions exhibit correlation via the logistic normal distribution, and its use as an exploratory tool for large document collections is demonstrated.
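The logistic normal can be sketched as follows: draw a Gaussian vector η ~ N(μ, Σ), then map it to topic proportions with θ_k = exp(η_k) / Σ_j exp(η_j). The off-diagonal entries of Σ let topics co-occur or avoid each other, which the Dirichlet used in LDA cannot capture. This minimal pure-Python sketch samples η with a hand-written Cholesky factorization. The values of `mu` and `sigma`, and the function names, are hypothetical. It is not the CTM's inference procedure, which is not shown.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_logistic_normal(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma) and map it to the simplex with a softmax."""
    low = cholesky(sigma)
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [mu[i] + sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(len(mu))]
    m = max(eta)                        # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.8, -0.5],              # hypothetical covariance: topics 0 and 1
         [0.8, 1.0, -0.5],              # tend to co-occur, while topic 2 is
         [-0.5, -0.5, 1.0]]             # anti-correlated with both
rng = random.Random(0)
theta = sample_logistic_normal(mu, sigma, rng)
```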

Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation

- Computer Science
- 2014

Experimental results show that the proposed method to estimate representative words of each topic from an LDA result provides better information for interpreting a topic than LDA does.

## References

Showing 1-10 of 22 references.

Unsupervised Learning by Probabilistic Latent Semantic Analysis

- Computer Science, Machine Learning
- 2004

This paper proposes a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice and results in a more principled approach with a solid foundation in statistical inference.
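Tempered EM can be sketched as ordinary EM for pLSA with the E-step posteriors raised to an inverse temperature β ≤ 1 and then renormalized. Setting β = 1 recovers standard EM. This minimal sketch assumes the asymmetric parameterization P(w | d) = Σ_z P(z | d) P(w | z) and applies β to the whole joint term. The exact placement of the exponent in the paper, and how β is chosen or scheduled, are not reproduced. The count matrix, β value, and function names are hypothetical.

```python
import math
import random


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def log_likelihood(counts, p_z_d, p_w_z):
    """sum over (d, w) of n(d, w) * log sum_z P(z|d) P(w|z); the P(d) term is omitted."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                ll += n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))))
    return ll


def plsa_tempered_em(counts, n_topics, beta, iters, seed=0):
    """pLSA fitted by EM with inverse temperature beta in the E-step.

    beta = 1 is ordinary EM; beta < 1 flattens the posteriors P(z | d, w).
    Every document must contain at least one word.
    """
    rng = random.Random(seed)
    n_docs, vocab = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(vocab)]) for _ in range(n_topics)]
    for _ in range(iters):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * vocab for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(vocab):
                n = counts[d][w]
                if n == 0:
                    continue
                # Tempered E-step: raise the joint term to the power beta, then renormalize.
                post = normalize([(p_z_d[d][z] * p_w_z[z][w]) ** beta for z in range(n_topics)])
                for z in range(n_topics):   # accumulate expected counts for the M-step
                    acc_z_d[d][z] += n * post[z]
                    acc_w_z[z][w] += n * post[z]
        p_z_d = [normalize(row) for row in acc_z_d]   # M-step: P(z | d)
        p_w_z = [normalize(row) for row in acc_w_z]   # M-step: P(w | z)
    return p_z_d, p_w_z


counts = [[3, 2, 0, 0],    # hypothetical term counts: documents 0-1 mostly use
          [2, 4, 1, 0],    # words 0-1, documents 2-3 mostly use words 2-3
          [0, 0, 3, 2],
          [0, 1, 2, 4]]
p_z_d, p_w_z = plsa_tempered_em(counts, n_topics=2, beta=0.8, iters=50)
```

With `beta=1.0`, the training log-likelihood should never decrease from one iteration to the next, as for any EM algorithm. With `beta < 1`, that guarantee no longer holds, which is the point of the tempering.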

Monte Carlo Strategies in Scientific Computing

- Computer Science, Technometrics
- 2002

The strength of this book is in bringing together advanced Monte Carlo methods developed in many disciplines, including the Ising model, molecular structure simulation, bioinformatics, target tracking, hypothesis testing for astronomical observations, Bayesian inference of multilevel models, and missing-data problems.

Expectation-Propagation for the Generative Aspect Model

- Computer Science, UAI
- 2002

This paper demonstrates that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model, and develops an alternative approach that leads to higher accuracy at comparable cost.

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

- Physics, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 1984

The paper draws an analogy between images and statistical mechanics systems. The analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, giving a highly parallel "relaxation" algorithm for MAP estimation.
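The idea can be illustrated with a Gibbs sampler whose temperature is lowered over time (simulated annealing), so that samples concentrate near the MAP estimate. This minimal sketch makes several assumptions. The image is binary, with pixels in {-1, +1}. It uses an Ising-style prior with coupling `coupling` between 4-neighbors, plus a data term `data_weight` rewarding agreement with the noisy observation. The paper's image models are more general, for example adding line processes for edges. The cooling schedule T_k = t0 / log(2 + k) is a logarithmic schedule of the kind analyzed there. The parameter values, test image, and function name are hypothetical.

```python
import math
import random


def anneal_denoise(y, coupling, data_weight, sweeps, t0, seed=0):
    """Gibbs sampling with decreasing temperature for a binary (+1/-1) image.

    Target at temperature T: P(x | y) proportional to
    exp((coupling * sum_{i~j} x_i x_j + data_weight * sum_i x_i y_i) / T).
    """
    rng = random.Random(seed)
    rows, cols = len(y), len(y[0])
    x = [row[:] for row in y]               # start from the observed image
    for k in range(sweeps):
        temp = t0 / math.log(2 + k)         # logarithmic cooling schedule
        for i in range(rows):
            for j in range(cols):
                neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                nb_sum = sum(x[a][b] for a, b in neighbors if 0 <= a < rows and 0 <= b < cols)
                field = coupling * nb_sum + data_weight * y[i][j]
                # P(x_ij = +1 | rest) = 1 / (1 + exp(-2 * field / T)); clip to avoid overflow.
                arg = max(-700.0, min(700.0, -2.0 * field / temp))
                x[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(arg)) else -1
    return x


clean = [[-1] * 4 + [1] * 4 for _ in range(8)]   # left half dark, right half bright
noisy = [row[:] for row in clean]
for i, j in [(1, 1), (3, 6), (5, 2), (6, 5)]:     # flip a few hypothetical pixels
    noisy[i][j] = -noisy[i][j]
restored = anneal_denoise(noisy, coupling=1.0, data_weight=1.5, sweeps=60, t0=2.0)
```

Counting pixels where `restored` differs from `clean` is a simple way to check whether a given choice of coupling, data weight, and schedule actually helps on an example.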

Monte Carlo Methods in Statistical Physics

- Computer Science
- 1999

This book provides an introduction to Monte Carlo simulations in classical statistical physics and is aimed both at students beginning work in the field and at more experienced researchers who wish…

Markov Chain Monte Carlo in Practice

- Computer Science
- 1997

Table-of-contents excerpt: The Markov Chain Monte Carlo Implementation; Results; Summary and Discussion. Medical Monitoring: Introduction; Modelling Medical Monitoring; Computing Posterior Distributions; Forecasting; Model Criticism; Illustrative Application; Discussion. MCMC for Nonlinear Hierarchical Models.

Foundations of statistical natural language processing

- Computer Science, SGMD
- 2002

This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.

Fundamental theorem of natural selection under gene-culture transmission.

- Biology, Proceedings of the National Academy of Sciences of the United States of America
- 1991

It is shown that cultural transmission has several important implications for the evolution of population fitness, most notably that there is a time lag in the response to selection such that the future evolution depends on the past selection history of the population.

In Advances in Neural Information Processing Systems

- Biology, NIPS 1990
- 1990

Bill Baird, Publications: [1] B. Baird. Bifurcation analysis of oscillating neural network model of pattern recognition in the rabbit olfactory bulb. In D. [...] [3] B. Baird. Bifurcation analysis…

1997 IEEE Workshop on Automatic Speech Recognition and Understanding : proceedings

- Computer Science
- 1997

This workshop focuses on the recent progress and new ground-breaking paradigms of automatic speech recognition and understanding, with robust modeling as the main theme.