Corpus ID: 37611323

Random forests for the detection of click fraud in online mobile advertising

Daniel P. Berrar
Click fraud is a serious threat to the pay-per-click advertising market. Here, we analyzed the click patterns associated with 3081 publishers of online mobile advertisements. The status of these publishers was known to be either fraudulent, under observation, or honest. The goal was to develop a model to predict the status of a publisher based on its individual click profile. In our study, the best model was a committee of random forests with imbalanced bootstrap sampling. The average precision…
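The abstract is cut off where it reports average precision. As a rough illustration of that metric only, here is a minimal sketch using the common ranking-based definition (the mean of the precision values at each rank where a positive item appears). The function name, the 0/1 label encoding, and the choice of "fraudulent" as the positive class are assumptions for illustration and are not taken from the paper.

```python
def average_precision(labels, scores):
    """Ranking-based average precision.

    labels: 1 for the positive class (assumed here: fraudulent), 0 otherwise.
    scores: predicted scores; a higher score means more suspicious.
    Ties in scores are kept in their input order (no special tie handling).
    """
    # Rank publishers from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` items.
            precision_sum += hits / rank

    # By convention in this sketch, return 0.0 when there are no positives.
    return precision_sum / hits if hits else 0.0
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the positives sit at ranks 1 and 3, so the value is (1/1 + 2/3) / 2.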


Gradient boosting learning for fraudulent publisher detection in online advertising
Purpose: Analysis of the publisher's behavior plays a vital role in identifying fraudulent publishers in the pay-per-click model of online advertising. However, the vast amount of raw user click data…
A Class Imbalance Learning Approach to Fraud Detection in Online Advertising
By diverting funds away from legitimate partners, click fraud represents a serious drain on advertising budgets and can seriously harm the viability of the internet advertising market. As such, fraud…
Learning from automatically labeled data: case study on click fraud prediction
  • D. Berrar
  • Computer Science
  • Knowledge and Information Systems
  • 2015
This work proposes a new approach to generate click profiles for publishers of online advertisements with an average precision of only 36.2%, and suggests that supervised learning from automatically labeled data should be complemented by an interpretation of conflicting predictions between the new classifier and the ground model.
Multimodal and Contrastive Learning for Click Fraud Detection
A Multimodal and Contrastive learning network for Click Fraud detection (MCCF) is proposed that jointly uses wide and deep features, behavior sequences, and a heterogeneous network to distill click representations, which are integrated by contrastive learning.


Click Fraud Resistant Methods for Learning Click-Through Rates
It is demonstrated that a particular class of learning algorithms, called click-based algorithms, is resistant to click fraud in some sense, and it is shown that other common learning algorithms are vulnerable to fraudulent attacks.
Using Random Forest to Learn Imbalanced Data
Two ways to deal with the imbalanced data classification problem using random forest are proposed: one is based on cost-sensitive learning, and the other is based on a sampling technique.
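To make the sampling idea concrete, here is a minimal sketch of one possible sampling scheme: a class-balanced bootstrap, where every class contributes the same number of examples (drawn with replacement) to the sample used for growing one tree. This is an assumption about what such a technique might look like, not the procedure from the listed paper; the function name `balanced_bootstrap` and the label strings are hypothetical.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return indices of a class-balanced bootstrap sample.

    Each class contributes as many draws (with replacement) as the
    smallest class has examples, so the majority class cannot dominate.
    """
    # Group example indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Size of the smallest class sets the per-class sample size.
    n_min = min(len(indices) for indices in by_class.values())

    sample = []
    for indices in by_class.values():
        # Draw n_min indices with replacement from this class.
        sample.extend(rng.choices(indices, k=n_min))
    return sample


# Usage: 90 honest and 10 fraudulent publishers give a 10 + 10 sample.
labels = ["honest"] * 90 + ["fraud"] * 10
sample = balanced_bootstrap(labels, random.Random(0))
```

In a forest, each tree would be trained on a fresh sample of this kind. The cost-sensitive alternative would instead keep the ordinary bootstrap and assign a higher misclassification weight to the minority class.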