Empirical study of topic modeling in Twitter


Social networks such as Facebook, LinkedIn, and Twitter have become a crucial source of information for a wide spectrum of users. On Twitter, popular information that the community deems important propagates through the network. Studying the characteristics of message content is important for a number of tasks, such as breaking-news detection, personalized message recommendation, friend recommendation, and sentiment analysis. While many researchers wish to use standard text mining tools to understand messages on Twitter, the restricted length of those messages prevents these tools from being used to their full potential. We address the problem of using standard topic models in micro-blogging environments by studying how the models can be trained on the dataset. We propose several schemes to train a standard topic model and compare their quality and effectiveness through a set of carefully designed experiments from both qualitative and quantitative perspectives. We show that by training a topic model on aggregated messages we can obtain a higher-quality model, which results in significantly better performance on two real-world classification problems. We also discuss how the state-of-the-art Author-Topic model fails to model hierarchical relationships between entities in social media.
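The idea of training on aggregated messages can be illustrated with a minimal sketch. It assumes tweets arrive as hypothetical (author, text) pairs and merges each author's messages into one longer pseudo-document that a standard topic model such as LDA could then be trained on. The function names and the whitespace tokenizer are illustrative assumptions, not the paper's code. The abstract mentions several training schemes, and this shows only author-level aggregation as one possibility.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Minimal lowercase whitespace tokenizer (hypothetical); a real pipeline
    # would also strip URLs, @mentions, and stop words.
    return text.lower().split()


def aggregate_by_author(tweets: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merge each author's tweets into one pseudo-document (a token list)."""
    docs: Dict[str, List[str]] = defaultdict(list)
    for author, text in tweets:
        # Append this tweet's tokens to its author's running document.
        docs[author].extend(tokenize(text))
    return dict(docs)


tweets = [
    ("alice", "New phone launch today"),
    ("bob", "Election results are in"),
    ("alice", "Phone battery review"),
]
corpus = aggregate_by_author(tweets)
print(corpus["alice"])
# ['new', 'phone', 'launch', 'today', 'phone', 'battery', 'review']
```

Each resulting token list could then be handed to any LDA implementation as a single document. The intuition behind aggregating is that a longer pseudo-document gives the model more word co-occurrence evidence than a single short message.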

DOI: 10.1145/1964858.1964870


12 Figures and Tables



Semantic Scholar estimates that this publication has 466 citations based on the available data.


Cite this paper

@inproceedings{Hong2010EmpiricalSO,
  title={Empirical study of topic modeling in Twitter},
  author={Liangjie Hong and Brian D. Davison},
  booktitle={SOMA@KDD},
  year={2010}
}