Battling the Internet water army: Detection of hidden paid posters

  • Cheng Chen, Kui Wu, Venkatesh Srinivasan, Xudong Zhang
  • Published 18 November 2011
  • Computer Science
  • 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013)
We initiate a systematic study to help distinguish a special group of online users, called hidden paid posters, or termed “Internet water army” in China, from the legitimate ones. On the Internet, the paid posters represent a new type of online job opportunity. They get paid for posting comments or articles on different online communities and websites for hidden purposes, e.g., to influence the opinion of other people towards certain social events or business markets. While being an…


Detecting Internet Hidden Paid Posters Based on Group and Individual Characteristics

This paper constructs a classifier based on both individual and group characteristics to detect paid posters and finds that group characteristics are also very important in detecting them compared with individual characteristics.

Uncovering and Characterizing Internet Water Army in Online Forums

A novel divide-and-conquer algorithm for detecting the Internet water army in online forums is proposed, based on the fact that Internet water army members always appear in groups, echo each other, and work in collusion; the accuracy of the algorithm is reported to be high.

SNSaPP: Unbiased Social Media Analysis Against Paid Posters

SNSaPP aims to provide unbiased ranking and data analysis when a burst event happens and many paid posters are involved, and lets users monitor the true rankings and the evolution of events, which helps people make the right decisions.

Whisper campaigns: market risks through online rumours on the Chinese Internet

The rapid growth of the Chinese Internet has brought about the emergence of many new channels of communication between businesses (B2B), between businesses and consumers (B2C and C2B) and between

Detecting the Internet Water Army via comprehensive behavioral features using large-scale E-commerce reviews

This work designs a comprehensive set of features to compare paid posters against normal users on different dimensions and builds an ensemble detection model of seven different algorithms which outperforms previous studies.

The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans

This work provides a typology of the Web’s false information ecosystem, comprising various types of false information, actors, and their motives, and pays particular attention to political false information as it can have dire consequences to the community.

Retrieve the Hidden Leaves in the Forest: Prevent Voting Spamming in Zhihu

This work takes Zhihu, a popular Chinese Q&A website, as a case study, proposes a time-diversity-based voting scheme to reduce the impact of voting spamming, and illustrates that the proposed opinion-tolerant system can maintain a good balance in the appearance of different opinions.

An Effective Identification Technology for Online News Comment Spammers in Internet Media

The results show that the technology proposed in this paper involves a lower data cost but a better identification effect than some traditional technologies based on the supervised classifier.

The Web of False Information

A typology of the Web’s false-information ecosystem, composed of various types of false information, actors, and their motives, is provided; it pays particular attention to political false information, as it can have dire consequences for the community, and previous work shows that this type of false information propagates faster and further than other types of false information.

Finding the hidden hands: a case study of detecting organized posters and promoters in SINA weibo

Extensive experimental results demonstrate that the method based on individual and group characteristics using SVM model (IGCSVM) is effective in detecting organized posters and better than existing methods.



Spotting fake reviewer groups in consumer reviews

This paper studies spam detection in the collaborative setting, i.e., discovering fake reviewer groups, using several behavioral models derived from the collusion phenomenon among fake reviewers and relation models based on the relationships among groups, individual reviewers, and the products they reviewed.

Identifying video spammers in online social networks

This paper builds a large test collection of YouTube users, and applies machine learning to provide a heuristic for classifying an arbitrary video as either legitimate or spam, and shows that this approach succeeds at detecting much of the spam while only falsely classifying a small percentage of the legitimate videos as spam.

Serf and turf: crowdturfing for fun and profit

Through measurements, a significant effort to study and understand "crowdturfing" systems in today's Internet is described, finding surprising evidence showing that not only do malicious crowd-sourcing systems exist, but they are rapidly growing in both user base and total revenue.

Detection of Harassment on Web 2.0

This paper uses a supervised learning approach for harassment detection that employs content features, sentiment features, and contextual features of documents and achieves significant improvements over several baselines, including Term Frequency-Inverse Document Frequency (TF-IDF) approaches.

Detecting Comment Spam through Content Analysis

This paper tries to automatically detect comment spam through content analysis, using some previously-undescribed features and shows that the combined heuristics can correctly identify comment spam with high precision and recall.

A Quantitative Study of Forum Spamming Using Context-based Analysis

This study examines spam blogs and spam comments in both legitimate and honey forums, and proposes context-based analyses, consisting of redirection and cloaking analysis, to detect spam automatically and to overcome shortcomings of content-based analyses.

What is Twitter, a social network or a news media?

This work is the first quantitative study on the entire Twittersphere and information diffusion on it and finds a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.

Quantifying the trustworthiness of social media content

A two-step unsupervised, feature-driven approach is proposed for quantifying the value of shared health content with respect to its trustworthiness and results indicate that this approach is effective and can be adapted to disparate social media applications with ease.

Detecting and characterizing social spam campaigns

An initial study to detect and quantitatively analyze the coordinated spam campaigns on online social networks in the wild finds that more than 70% of all malicious wall posts are advertising phishing sites.

Prevalence and mitigation of forum spamming

To mitigate the problem of forum spam, light-weight features based on spammers' IP, commenting activity and the anatomy of their posts are developed and it is found that an SVM classifier trained on these features can achieve a 99.81% precision and 92.82% recall in identifying forum spam.