Efficient Modeling of User-Entity Preference in Big Social Networks


Data generated by social media are frequently used to build machine learning models that profile human behavior and sentiment. Twitter is a readily available source of population data that any organization can collect and use, which creates a need for accurate machine learning models that learn from this user-generated content. In this paper, we explore the task of classifying a user's preference towards a specific entity. In particular, we study how the accuracy of classification models changes as an increasing number of tweets (status posts) per user is provided to the models. New users and tweets are constantly being created, which warrants techniques that reduce the amount of data needed by machine learning algorithms. We find diminishing returns in model performance as the number of tweets per user increases, and we identify a threshold beyond which adding more tweets per user does not yield statistically significantly better performance. Using this threshold instead of the maximum number of tweets per user reduces data collection time by 80% and dataset size by 75%.
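
The threshold search described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the statistical test, so a two-sided permutation test on mean scores is assumed here, and the function names, the significance level `alpha=0.05`, and the scores in `example_scores` are all hypothetical placeholders rather than results from the paper.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference in means of a and b.

    Assumed stand-in for whatever significance test the paper uses.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def find_threshold(scores_by_count, alpha=0.05):
    """Return the smallest tweets-per-user count whose scores are not
    significantly different from the scores at the maximum count."""
    counts = sorted(scores_by_count)
    reference = scores_by_count[counts[-1]]
    for count in counts:
        if permutation_p_value(scores_by_count[count], reference) >= alpha:
            return count
    return counts[-1]


# Hypothetical placeholder scores (e.g. one value per evaluation run),
# keyed by tweets per user. These are NOT figures from the paper.
example_scores = {
    50: [0.60, 0.61, 0.59, 0.62, 0.60, 0.61, 0.60, 0.59],
    100: [0.70, 0.71, 0.69, 0.72, 0.70, 0.71, 0.70, 0.69],
    200: [0.71, 0.72, 0.69, 0.71, 0.71, 0.70, 0.70, 0.70],
}

if __name__ == "__main__":
    print(find_threshold(example_scores))
```

With these placeholder scores, the 50-tweet runs differ significantly from the 200-tweet runs while the 100-tweet runs do not, so the sketch selects 100 as the threshold. Collecting data only up to such a threshold is what would produce the kind of savings in collection time and dataset size that the abstract reports.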

DOI: 10.1109/ICTAI.2015.141

Cite this paper

@article{Richter2015EfficientMO,
  title={Efficient Modeling of User-Entity Preference in Big Social Networks},
  author={Aaron N. Richter and Michael Crawford and Brian Heredia and Taghi M. Khoshgoftaar},
  journal={2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI)},
  year={2015},
  pages={982-988}
}