Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach

Abstract

Image spam is a recent trend in email spam. Image spam messages employ a variety of image processing techniques to add random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, semi-supervised learning is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection. This makes it too costly for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves a detection rate of 91.66% with a false positive rate below 2.96%, while significantly reducing the labeling cost.
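To illustrate the general idea of combining a few labeled examples with many unlabeled ones, the following is a minimal sketch of plain semi-supervised EM, not the paper's RDEM algorithm: it omits both the regularization and the discriminant step. It assumes a single scalar feature per image and one Gaussian per class; the function names, the feature values, and the label convention (1 = spam, 0 = ham) are hypothetical choices made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_iter=50, min_var=1e-3):
    """Fit a two-class Gaussian model from labeled and unlabeled 1-D features.

    labeled:   list of (x, y) pairs with y in {0, 1}; needs at least one
               example of each class (assumption of this sketch).
    unlabeled: list of x values.
    Returns (priors, means, variances), each a list indexed by class.
    """
    xs = [x for x, _ in labeled] + list(unlabeled)
    n_lab = len(labeled)
    # Labeled points keep fixed one-hot responsibilities throughout.
    resp = [[1.0 if y == k else 0.0 for k in (0, 1)] for _, y in labeled]
    # Unlabeled points start with zero weight, so the first M-step
    # uses the labeled data only.
    resp += [[0.0, 0.0] for _ in unlabeled]

    priors, means, variances = [], [], []
    for _ in range(n_iter):
        # M-step: weighted maximum-likelihood estimates per class.
        total = sum(r[0] + r[1] for r in resp)
        priors, means, variances = [], [], []
        for k in (0, 1):
            w = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / w
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / w
            priors.append(w / total)
            means.append(mean)
            variances.append(max(var, min_var))  # floor avoids zero variance
        # E-step: update class posteriors for the unlabeled points only.
        for i in range(n_lab, len(xs)):
            joint = [priors[k] * gaussian_pdf(xs[i], means[k], variances[k])
                     for k in (0, 1)]
            z = joint[0] + joint[1]
            resp[i] = [j / z for j in joint] if z > 0 else [0.5, 0.5]
    return priors, means, variances


def predict(x, model):
    """Return the class (0 or 1) with the larger posterior score."""
    priors, means, variances = model
    scores = [priors[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
    return 1 if scores[1] > scores[0] else 0


# Hypothetical toy data: six labeled points, eight unlabeled points.
labeled = [(0.0, 0), (0.5, 0), (-0.5, 0), (5.0, 1), (5.5, 1), (4.5, 1)]
unlabeled = [-1.0, -0.2, 0.3, 1.0, 4.0, 4.8, 5.2, 6.0]
model = semi_supervised_em(labeled, unlabeled)
print(predict(0.2, model), predict(4.9, model))
```

In this sketch the unlabeled points influence the class means and variances only through their estimated posteriors, which is how a small labeled set can be stretched by a larger unlabeled one. Real image-spam features would be multi-dimensional, and the toy data here is not taken from the paper.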

DOI: 10.1007/978-3-642-03348-3_17

Cite this paper

@inproceedings{Gao2009SemiSI, title={Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach}, author={Yan Gao and Ming Yang and Alok N. Choudhary}, booktitle={ADMA}, year={2009} }