Joint latent topic models for text and citations


In this work, we address the problem of jointly modeling text and citations within the topic modeling framework. We present two models, Pairwise-Link-LDA and Link-PLSA-LDA. Pairwise-Link-LDA combines the ideas of LDA [4] and Mixed Membership Stochastic Block Models [1] and can model arbitrary link structure. However, it is computationally expensive, since it models the presence or absence of a citation (link) between every pair of documents. The second model addresses this cost by assuming that the link structure is a bipartite graph. As its name indicates, Link-PLSA-LDA combines the LDA and PLSA models into a single graphical model. Our experiments on a subset of Citeseer data show that both models predict unseen data better than the baseline model of Erosheva and Lafferty [8], because they capture the notion of topical similarity between the contents of the cited and citing documents. Experiments on two different data sets show that Link-PLSA-LDA performs best on the citation prediction task while remaining highly scalable. We also present visualizations generated by each of the models.
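To make the cost of pairwise link modeling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mixed-membership, block-model-style generative step in which each ordered document pair draws one topic per document and then a Bernoulli link indicator. The names `sample_pairwise_links`, `theta`, and `eta` are hypothetical and chosen only for illustration.

```python
import random


def sample_pairwise_links(theta, eta, seed=0):
    """Sample a link indicator for every ordered pair of distinct documents.

    theta: list of per-document topic mixtures (each sums to 1). Assumed input.
    eta:   K x K matrix of link probabilities between topic pairs. Assumed input.
    """
    rng = random.Random(seed)
    n_docs = len(theta)
    topics = list(range(len(eta)))
    links = {}
    for citing in range(n_docs):
        for cited in range(n_docs):
            if citing == cited:
                continue  # no self-citations in this sketch
            # Draw one topic for each side of the candidate link.
            z_citing = rng.choices(topics, weights=theta[citing])[0]
            z_cited = rng.choices(topics, weights=theta[cited])[0]
            # Link present with probability given by the topic pair.
            links[(citing, cited)] = rng.random() < eta[z_citing][z_cited]
    return links


theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
eta = [[0.8, 0.05], [0.05, 0.8]]
links = sample_pairwise_links(theta, eta)
print(len(links))  # one indicator per ordered pair of distinct documents
```

With N documents the loop visits N(N-1) ordered pairs regardless of how many citations actually exist, which illustrates why modeling every pair becomes expensive on large collections.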

DOI: 10.1145/1401890.1401957

Cite this paper

@inproceedings{Nallapati2008JointLT,
  title={Joint latent topic models for text and citations},
  author={Ramesh Nallapati and Amr Ahmed and Eric P. Xing and William W. Cohen},
  booktitle={KDD},
  year={2008}
}