Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

Abstract

Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, to understand such discussions, it is necessary to identify the coherent topics discussed in the tweets. Several automatic topic coherence metrics have been designed to assess the coherence of topics in classical document corpora, but it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topic coherence and to determine how closely each of the metrics aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.
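The abstract does not give the exact formulation of the proposed metric, so the following is only a generic sketch of how a Pointwise Mutual Information-based coherence score is commonly computed: the average pairwise PMI of a topic's top words, with probabilities estimated from document-level co-occurrence in a background corpus. The function name `pmi_coherence`, the toy background tweets, and the choice to score never-co-occurring pairs as 0 are all assumptions made for illustration, not details taken from the paper.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, background_docs):
    """Average pairwise PMI of a topic's top words.

    Probabilities are estimated as the fraction of background documents
    (e.g. tweets) that contain a word or a pair of words.
    Assumption: a pair that never co-occurs contributes 0 instead of -inf.
    """
    docs = [set(doc) for doc in background_docs]
    n_docs = len(docs)

    def prob(*words):
        # Fraction of background documents containing all given words.
        return sum(1 for d in docs if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p1 == 0 or p2 == 0 or p12 == 0:
            scores.append(0.0)  # assumption: no evidence, neutral score
        else:
            scores.append(math.log(p12 / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy background corpus of tokenized tweets (made-up data).
background = [
    ["obama", "election", "vote"],
    ["obama", "election"],
    ["football", "goal"],
    ["football", "goal", "match"],
]

coherent_score = pmi_coherence(["obama", "election"], background)
mixed_score = pmi_coherence(["obama", "football"], background)
print(coherent_score > mixed_score)
```

In this sketch, a topic whose words tend to appear in the same background tweets scores higher than one that mixes unrelated words, which is the intuition behind using a separate Twitter background dataset to judge coherence.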

DOI: 10.1007/978-3-319-30671-1_36

Cite this paper

@inproceedings{Fang2016TopicsIT,
  title={Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data},
  author={Anjie Fang and Craig MacDonald and Iadh Ounis and Philip Habel},
  booktitle={ECIR},
  year={2016}
}