TrollSpot: Detecting misbehavior in commenting platforms

@article{Li2017TrollSpotDM,
  title={TrollSpot: Detecting misbehavior in commenting platforms},
  author={Tai-Ching Li and Joobin Gharibshah and Evangelos E. Papalexakis and Michalis Faloutsos},
  journal={Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017},
  year={2017}
}
Commenting platforms, such as Disqus, have emerged as major online communication platforms with millions of users and posts. […] Our work provides two key novelties: (a) we provide a fine-grained classification of malicious behaviors, and (b) we use a comprehensive set of 73 features that span four dimensions of information.
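
The page only summarizes the method, so the snippet below is a minimal sketch of the general idea: represent each commenter with features drawn from several dimensions of information and train a classifier to assign fine-grained behavior labels. The feature names, role labels, and the random-forest choice are illustrative assumptions, not TrollSpot's actual 73 features, classes, or classifier.

# Minimal sketch of a feature-based, fine-grained misbehavior classifier.
# Feature names and role labels are illustrative placeholders, not
# TrollSpot's actual 73 features or class definitions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-user features spanning several dimensions of information
# (activity volume, timing, text style, community reaction).
FEATURES = [
    "comments_per_day",       # activity
    "median_interarrival_s",  # temporal
    "avg_comment_length",     # linguistic
    "profanity_ratio",        # linguistic
    "downvote_ratio",         # community reaction
    "reply_fanout",           # interaction
]

# Toy data standing in for labeled users; labels are fine-grained roles
# (0 = benign, 1 = spammer, 2 = troll, 3 = bot-like), purely illustrative.
X = rng.random((400, len(FEATURES)))
y = rng.integers(0, 4, size=400)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracy on toy data:", scores.mean())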

LinkMan: hyperlink-driven misbehavior detection in online security forums
TLDR
This work presents LinkMan, a systematic suite of capabilities, to detect and analyze hyperlink-driven misbehavior in online forums, and shows that the approach works very well in terms of retrieving and classifying hyperlinks compared to previous solutions.
RThread: A thread-centric analysis of security forums
TLDR
This work proposes RThread, a comprehensive unsupervised clustering approach with a powerful visualization component, provided as a publicly-accessible web-based tool, and shows how the approach can spot surprising behaviors, including a cluster whose threads are used for Search Engine Optimization.
InferIP: Extracting actionable information from security discussion forums
TLDR
A method to automate the identification of malicious IPs, with the design goal of being independent of external sources, is developed; it exhibits high classification accuracy, and the precision of identifying malicious IPs in posts is greater than 88% in all three sites.
Mining actionable information from security forums: the case of malicious IP addresses
TLDR
A method to automate the identification of malicious IP addresses, with the design goal of being independent of external sources, is developed; it exhibits high classification accuracy, and the precision of identifying malicious IPs in posts is greater than 88% in all three forums.
Aggressive language in an online hacking forum
TLDR
It is observed that conversations in online forums tend to be more constructive and informative than those in Wikipedia page edit comments, which are geared more towards adversarial interactions, and that this may explain the lower levels of abuse found in the forum data than in Wikipedia comments.
From Royals to Vegans: Characterizing Question Trolling on a Community Question Answering Website
TLDR
This paper identifies a set of over 400,000 troll questions on Yahoo Answers intended to inflame, upset, and draw attention from others in the community, and reveals unique characteristics of troll questions compared to "regular" questions with regard to their metadata, text, and askers.
From Security to Community Detection in Social Networking Platforms
TLDR
Graph-based techniques for data analysis, such as graph clustering and edge sampling, are presented, while prediction methods for structured and unstructured data are applied to a variety of fields such as financial systems, security forums, and social networks.
Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination
TLDR
Analysis of issues related to hate, violence and discriminatory bias in a dataset containing more than 7,000 videos and 17 million comments shows that right-wing channels tend to contain a higher proportion of words from "negative" semantic fields.
CAMsterdam at SemEval-2019 Task 6: Neural and graph-based feature extraction for the identification of offensive tweets
TLDR
The proposed model learns to extract textual features using a multi-layer recurrent network and then performs text classification using gradient-boosted decision trees (GBDT); a self-attention architecture enables the model to focus on the most relevant areas in the text (a minimal sketch of this two-stage pipeline follows this list).
...
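
The CAMsterdam entry above describes a two-stage pipeline: a recurrent encoder extracts text features, then gradient-boosted decision trees classify them. The sketch below illustrates that general idea on toy data; the RecurrentEncoder class, dimensions, token ids, and labels are hypothetical stand-ins, not the authors' model or dataset.

# Minimal sketch: recurrent feature extraction followed by GBDT classification.
# All data and dimensions are toy placeholders.
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

class RecurrentEncoder(nn.Module):
    """Embeds token ids and returns the final LSTM hidden state as a feature vector."""
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, token_ids):
        emb = self.embed(token_ids)            # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(emb)           # h_n: (layers, batch, hidden_dim)
        return h_n[-1]                         # (batch, hidden_dim)

# Toy "tweets": random token ids with binary offensive/not-offensive labels.
torch.manual_seed(0)
tokens = torch.randint(0, 1000, (200, 30))
labels = torch.randint(0, 2, (200,))

encoder = RecurrentEncoder()
with torch.no_grad():                          # untrained encoder, illustration only
    feats = encoder(tokens).numpy()

gbdt = GradientBoostingClassifier()
gbdt.fit(feats[:150], labels[:150].numpy())
print("toy holdout accuracy:", gbdt.score(feats[150:], labels[150:].numpy()))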

References

SHOWING 1-10 OF 23 REFERENCES
A new approach to bot detection: Striking the balance between precision and recall
TLDR
A model which increases recall in detecting bots, allowing a researcher to remove more bots, is proposed, and the detection algorithm is shown to remove more bots from a dataset than current approaches.
Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg?
TLDR
This paper conducts a set of large-scale measurements with a collection of over 500,000 accounts and proposes a classification system that uses the combination of features extracted from an unknown user to determine the likelihood of being a human, bot, or cyborg on Twitter.
Mining User Comment Activity for Detecting Forum Spammers in YouTube
TLDR
A method to automatically detect comment spammers in YouTube (the largest and most popular video-sharing website) forums is presented, based on mining the comment activity log of a user and extracting patterns indicating spam behavior.
InferIP: Extracting actionable information from security discussion forums
TLDR
A method to automate the identification of malicious IPs, with the design goal of being independent of external sources, is developed; it exhibits high classification accuracy, and the precision of identifying malicious IPs in posts is greater than 88% in all three sites.
Antisocial Behavior in Online Discussion Communities
TLDR
This paper characterizes antisocial behavior in three large online discussion communities by analyzing users who were banned from these communities, finding that such users tend to concentrate their efforts in a small number of threads, are more likely to post irrelevantly, and are more successful at garnering responses from other users.
If walls could talk: Patterns and anomalies in Facebook wallposts
TLDR
This work models Facebook user behavior: it analyzes users' wall activities, focusing on identifying common patterns and surprising phenomena, and proposes PowerWall, a lesser-known heavy-tailed distribution, to fit the data.
RSC: Mining and Modeling Temporal Activity in Social Media
TLDR
This paper analyzes time-stamped data from social media services and finds that the distribution of posting inter-arrival times (IATs) is characterized by four patterns: positive correlation between consecutive IATs, heavy tails, periodic spikes and bimodal distribution (a minimal IAT computation sketch follows the reference list).
Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions
TLDR
A predictive model of trolling behavior reveals that mood and discussion context together can explain trolling behavior better than an individual's history of trolling, and suggests that ordinary people can, under the right circumstances, behave like trolls.
Prediction of cyberbullying incidents in a media-based social network
TLDR
This paper investigates the prediction of cyberbullying incidents in Instagram, a popular media-based social network, and extracts several important features from the initial posting data for automated cyberbullying prediction, including the profanity and linguistic content of the text caption, image content, as well as social graph parameters and temporal content behavior.
Relaxed online SVMs for spam filtering
TLDR
It is shown that online SVMs indeed give state-of-the-art classification performance on online spam filtering on large benchmark data sets, and that nearly equivalent performance may be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost.
...
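
The RSC reference above characterizes posting inter-arrival times (IATs). As a minimal sketch of the basic computation, assuming synthetic timestamps rather than real social-media data, one can derive IATs from posting times and inspect consecutive-IAT correlation and tail heaviness:

# Minimal IAT sketch: compute inter-arrival times from posting timestamps and
# summarize consecutive-IAT correlation and tail heaviness. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic posting timestamps (seconds): a piecewise activity rate makes
# consecutive gaps correlated and heavy-tailed, standing in for real data.
rate = np.repeat(rng.lognormal(mean=0.0, sigma=1.0, size=50), 100)
gaps = rate * rng.lognormal(mean=4.0, sigma=1.0, size=5000)
timestamps = np.cumsum(gaps)

iat = np.diff(timestamps)                     # inter-arrival times between posts
corr = np.corrcoef(iat[:-1], iat[1:])[0, 1]   # correlation of consecutive IATs

print(f"median IAT: {np.median(iat):.1f}s, 99th percentile: {np.percentile(iat, 99):.1f}s")
print(f"consecutive-IAT correlation: {corr:.3f}")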