A Study of Academic Collaborations in Computational Linguistics using a Latent Mixture of Authors Model

Abstract

Academic collaboration has often been at the forefront of scientific progress, whether amongst prominent established researchers or between students and advisors. We suggest a theory of the different types of academic collaboration, and use topic models to computationally identify these in Computational Linguistics literature. A set of author-specific topics is learnt over the ACL corpus, which ranges from 1965 to 2009. The models are trained on a per-year basis: only papers published up to a given year are used to learn that year’s author topics. To determine the collaborative properties of a paper, we use as a metric a function of the cosine similarity between the paper’s term vector and each author’s topic signature in the year preceding the paper’s publication. We apply this metric to examine questions on the nature of collaborations in Computational Linguistics research, finding that significant variations exist in the way people collaborate within different subfields.
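The similarity step described in the abstract can be illustrated with a minimal sketch. It computes only the plain cosine similarity between a paper's term vector and each author's topic signature; the abstract does not specify which function of that score the paper uses as its final metric, so none is applied here. The function names, author labels, and toy weights below are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def author_similarities(paper_terms, author_signatures):
    """Score a paper against each author's topic signature.

    author_signatures is assumed to hold signatures learnt from papers
    published before the paper's own year (hypothetical data layout).
    """
    return {
        author: cosine_similarity(paper_terms, signature)
        for author, signature in author_signatures.items()
    }


# Toy data, invented for illustration only.
paper = Counter("parsing grammar parsing treebank".split())
signatures = {
    "author_a": {"parsing": 0.6, "grammar": 0.3, "treebank": 0.1},
    "author_b": {"translation": 0.7, "alignment": 0.3},
}
scores = author_similarities(paper, signatures)
```

In this toy case the paper's vocabulary overlaps only with `author_a`, so that author receives a high score and `author_b` a score of zero. Comparing such per-author scores across a paper's co-authors is one way to reason about whose prior work a collaboration most resembles.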

Cite this paper

@inproceedings{Johri2011ASO,
  title={A Study of Academic Collaborations in Computational Linguistics using a Latent Mixture of Authors Model},
  author={Nikhil Johri and Daniel Ramage and Daniel A. McFarland and Daniel Jurafsky},
  booktitle={LaTeCH@ACL},
  year={2011}
}