Advanced forecasting of career choices for college students based on campus big data
Many collective human activities have been shown to exhibit universal patterns. However, the possibility of regularities underlying researcher migration in computer science (CS) has barely been explored at a global scale. To a large extent, this is because official and commercial records are restricted, incompatible between countries, and, in particular, not registered across researchers. We overcome these limitations by building our own transnational, large-scale dataset inferred from publicly available information on the Web. Essentially, we use Label Propagation (LP) to infer missing geo-tags of author-paper pairs retrieved from online bibliographies. On this dataset, we then find statistical regularities that explain how researchers in CS move from one place to another. However, although vanilla LP is simple and has been remarkably successful, its run time can suffer from unexploited symmetries of the underlying graph. Consequently, we introduce compressed LP (CLP), which exploits these symmetries to reduce the dimensions of the matrix that LP inverts to obtain optimal labeling scores. We prove that CLP yields the same labeling scores as LP, while often being significantly faster and using less memory.
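To make the matrix inversion in vanilla LP concrete, the following is a minimal sketch of one common closed-form variant, the harmonic solution on a weighted graph. The abstract does not state which LP formulation is used, so the function name `label_propagation`, its parameters, and the choice of the graph Laplacian as the inverted matrix are assumptions made for illustration only; the sketch also does not implement CLP.

```python
import numpy as np


def label_propagation(W, labels, n_classes):
    """Illustrative closed-form label propagation (harmonic solution).

    Assumptions, not taken from the source:
      W         -- symmetric (n, n) non-negative edge-weight matrix
      labels    -- length-n integer sequence, -1 marks an unlabeled node
      n_classes -- number of distinct labels
    Every unlabeled node is assumed to be connected, possibly indirectly,
    to at least one labeled node, so the linear system is non-singular.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)

    labeled = np.flatnonzero(labels >= 0)
    unlabeled = np.flatnonzero(labels < 0)

    # One-hot scores for the nodes whose labels are known.
    Y_l = np.eye(n_classes)[labels[labeled]]

    # Graph Laplacian L = D - W, with D the diagonal degree matrix.
    L = np.diag(W.sum(axis=1)) - W

    # Block of the Laplacian over unlabeled nodes: this is the matrix
    # whose size determines the cost of the solve.
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]

    # Solve L_uu F_u = W_ul Y_l rather than forming an explicit inverse.
    F_u = np.linalg.solve(L_uu, W_ul @ Y_l)

    scores = np.zeros((len(labels), n_classes))
    scores[labeled] = Y_l
    scores[unlabeled] = F_u
    return scores


# Path graph 0 - 1 - 2 - 3 with the two end nodes labeled 0 and 1.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = label_propagation(W, [0, -1, -1, 1], n_classes=2)
print(scores)
```

In this sketch the linear system has one row per unlabeled node, so its cost grows with the number of unlabeled nodes. Nodes that occupy structurally equivalent positions in the graph receive identical scores under this formulation, which is the kind of redundancy a compressed variant could plausibly exploit to shrink the system.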