Topic classification and clustering on Indonesian complaint tweets for Bandung government using supervised and unsupervised learning

Abstract

Because the residents of Bandung are active social media users, the Bandung government provides a Twitter channel through which citizens can report complaints. To make complaint monitoring easier, the topics of complaint tweets (written in Indonesian) need to be detected automatically so that the government can manage the reported complaints. This paper proposes a system that detects the topics of Indonesian complaint tweets automatically using both supervised and unsupervised learning. The supervised approach classifies the topic of each complaint tweet, whereas the unsupervised approach clusters complaint tweets by the similarity of the detailed information they contain. Both approaches are needed: the first to classify the topics of a tweet, the second to capture the detailed information within each detected topic. Topics are classified using single-label and multi-label classification. The supervised approach is evaluated using accuracy, precision, recall, and F1 score. Three supervised machine learning algorithms are evaluated: Sequential Minimal Optimization (SMO), Naïve Bayes Multinomial, and Random Forests. The best algorithm for single-label topic classification is SMO, with an average accuracy of 95%, whereas the best algorithm for multi-label topic classification is Random Forests, with 97.92% accuracy, 98.74% precision, 98.36% recall, and 98.44% F1 score. For the unsupervised approach, the Clustering Index Value is used to evaluate the detected topic clusters. Two unsupervised learning algorithms are evaluated: Exemplar Based Topic Detection and Document Pivot Technique using TF-IDF. Exemplar Based Topic Detection performs best at detecting detailed topic clusters, with a Clustering Index Value of 0.9653.
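The abstract reports accuracy, precision, recall, and F1 score for multi-label topic classification but does not say how these are computed when a tweet can carry several topics. As a minimal sketch, assuming the common example-based definitions (each metric is computed per tweet from the overlap between true and predicted label sets, then averaged), the calculation might look like the following. The function name `multilabel_scores` and the topic labels are hypothetical and are not taken from the paper.

```python
def multilabel_scores(true_sets, pred_sets):
    """Example-based multi-label metrics, averaged over tweets.

    Assumption: this is one common convention, not necessarily
    the one used by the paper's authors.
    """
    n = len(true_sets)
    acc = prec = rec = f1 = 0.0
    for true, pred in zip(true_sets, pred_sets):
        inter = len(true & pred)   # labels predicted correctly
        union = len(true | pred)   # labels that are true or predicted
        # Two empty label sets are treated as a perfect match.
        acc += inter / union if union else 1.0
        prec += inter / len(pred) if pred else 0.0
        rec += inter / len(true) if true else 0.0
        total = len(true) + len(pred)
        f1 += 2 * inter / total if total else 1.0
    return {
        "accuracy": acc / n,
        "precision": prec / n,
        "recall": rec / n,
        "f1": f1 / n,
    }


# Hypothetical topic labels for two complaint tweets.
true_labels = [{"road", "traffic"}, {"waste"}]
pred_labels = [{"road"}, {"waste"}]
scores = multilabel_scores(true_labels, pred_labels)
print(scores)
```

In this toy input the first tweet has one of its two topics predicted, so its recall is 0.5 while its precision is 1.0; the second tweet is predicted exactly. Averaging per tweet is what lets precision and recall differ in this way.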

Cite this paper

@article{Pratama2017TopicCA, title={Topic classification and clustering on Indonesian complaint tweets for bandung government using supervised and unsupervised learning}, author={Timothy Pratama and Ayu Purwarianti}, journal={2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA)}, year={2017}, pages={1-6} }