The Author-Topic Model for Authors and Documents

Abstract

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with its authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets, so we use Gibbs sampling to estimate the topic and author distributions. We compare its performance with that of two other generative models for documents, both special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model and demonstrate applications to computing similarity between authors and the entropy of author output.
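The generative process described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the sizes, the Dirichlet concentration values, and all function names are hypothetical, and the uniform choice among a document's authors is an assumption (the abstract says only that the document is a mixture of its authors' distributions). The `entropy` helper shows one plausible reading of "entropy of author output", computed on an author's topic distribution.

```python
import math
import random

def sample_dirichlet(rng, concentration, size):
    """Draw one symmetric Dirichlet sample by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(rng, probs):
    """Draw an index from a discrete (multinomial, n=1) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(rng, doc_authors, theta, phi, n_words):
    """Return a list of (author, topic, word) triples for one document."""
    tokens = []
    for _ in range(n_words):
        x = rng.choice(doc_authors)      # assumption: authors chosen uniformly
        z = sample_index(rng, theta[x])  # topic from that author's distribution
        w = sample_index(rng, phi[z])    # word from that topic's distribution
        tokens.append((x, z, w))
    return tokens

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical toy sizes, chosen only for illustration.
N_AUTHORS, N_TOPICS, VOCAB_SIZE = 4, 3, 10
rng = random.Random(0)

# theta[a]: author a's distribution over topics.
theta = [sample_dirichlet(rng, 0.5, N_TOPICS) for _ in range(N_AUTHORS)]
# phi[t]: topic t's distribution over word ids.
phi = [sample_dirichlet(rng, 0.5, VOCAB_SIZE) for _ in range(N_TOPICS)]

# A 20-word document co-written by authors 0 and 2.
doc = generate_document(rng, [0, 2], theta, phi, 20)
author_entropies = [entropy(t) for t in theta]
```

In this sketch, an author whose topic distribution is spread evenly has entropy near `math.log(N_TOPICS)`, while an author concentrated on a single topic has entropy near zero. Fitting the model to real text would require inferring `theta` and `phi` (for example by Gibbs sampling, as the abstract states), which this sketch does not attempt.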

Semantic Scholar estimates that this publication has 1,169 citations based on the available data.

Cite this paper

@inproceedings{RosenZvi2004TheAM,
  title={The Author-Topic Model for Authors and Documents},
  author={Michal Rosen-Zvi and Thomas L. Griffiths and Mark Steyvers and Padhraic Smyth},
  booktitle={UAI},
  year={2004}
}