A hybrid approach for spam filtering using local concentration based K-Means clustering

Abstract

Electronic mail (email) has become essential for Internet users. Many studies indicate that the number of Internet users is increasing day by day, and as the online population grows, the volume of email traffic grows with it. Of this total volume, 80% consists of unwanted email. Such unwanted messages are known as spam, also referred to as unsolicited bulk email (UBE), and are sent in bulk to large numbers of recipients. The growing volume of spam aggravates a common problem: keeping an email inbox manageable. Spam is a major issue for the Internet community because it wastes resources and also pollutes our environment. Spam filtering is therefore an essential task for preventing these adverse effects. Researchers have proposed many spam filtering techniques and algorithms, which focus on individual parameters of the malicious content. Spammers, however, have also become more sophisticated and now attack the weak points of a filtering system. In this work we divide the filtering process into four stages. In the first stage, a string tokenizer generates terms from the incoming message. These tokens are passed to the second stage, where Information Gain (IG) is applied as the term selection strategy. The selected terms are then passed to the third stage, which uses a Local Concentration based Artificial Immune System for feature selection. In the fourth stage, the newly constructed feature vectors are passed to the K-Means clustering algorithm for classification. In support of this work we conducted several experiments and give a comparative analysis against various existing methods on different parameters.
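The four stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the number of local windows, how detector sets are built, how K-Means is initialised, or how clusters are mapped to class labels, so all of those choices, every function name, and the toy messages below are assumptions made only for illustration.

```python
import math
import re
from collections import Counter

def tokenize(message):
    """Stage 1: split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def information_gain(term, docs, labels):
    """IG of the binary feature 'term occurs in the document' w.r.t. the label."""
    gain = entropy(list(Counter(labels).values()))
    for present in (True, False):
        part = [y for d, y in zip(docs, labels) if (term in d) == present]
        if part:
            gain -= len(part) / len(labels) * entropy(list(Counter(part).values()))
    return gain

def select_terms(docs, labels, top_n):
    """Stage 2: keep the top_n terms ranked by information gain."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: -information_gain(t, docs, labels))[:top_n]

def build_detectors(terms, docs, labels):
    """Assumption: a term becomes a spam detector if it occurs in more spam
    documents than legitimate ones, otherwise a legitimate-mail detector."""
    spam, ham = set(), set()
    for t in terms:
        in_spam = sum(1 for d, y in zip(docs, labels) if y == 1 and t in d)
        in_ham = sum(1 for d, y in zip(docs, labels) if y == 0 and t in d)
        (spam if in_spam > in_ham else ham).add(t)
    return spam, ham

def local_concentrations(tokens, spam_det, ham_det, n_windows=2):
    """Stage 3: per-window fraction of tokens matching each detector set."""
    size = max(1, math.ceil(len(tokens) / n_windows))
    features = []
    for i in range(n_windows):
        window = tokens[i * size:(i + 1) * size]
        n = len(window) or 1  # avoid division by zero on empty windows
        features.append(sum(t in spam_det for t in window) / n)
        features.append(sum(t in ham_det for t in window) / n)
    return features

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda j: dist2(point, centroids[j]))

def kmeans(points, k=2, iters=20):
    """Stage 4: Lloyd's algorithm with farthest-first initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]

def train(messages, labels, top_n=7, n_windows=2):
    docs = [tokenize(m) for m in messages]
    terms = select_terms(docs, labels, top_n)
    spam_det, ham_det = build_detectors(terms, docs, labels)
    vectors = [local_concentrations(d, spam_det, ham_det, n_windows) for d in docs]
    centroids, assign = kmeans(vectors, k=2)
    # Assumption: each cluster takes the majority label of its training members.
    cluster_label = {}
    for j in range(2):
        votes = [y for y, a in zip(labels, assign) if a == j]
        cluster_label[j] = Counter(votes).most_common(1)[0][0] if votes else 0
    return spam_det, ham_det, centroids, cluster_label, n_windows

def classify(message, model):
    spam_det, ham_det, centroids, cluster_label, n_windows = model
    v = local_concentrations(tokenize(message), spam_det, ham_det, n_windows)
    return cluster_label[nearest(v, centroids)]

# Toy training data (hypothetical): label 1 = spam, 0 = legitimate.
messages = [
    "win free money now claim free prize",
    "free prize win cash now",
    "meeting agenda for project review tomorrow",
    "project review notes and meeting schedule",
]
labels = [1, 1, 0, 0]
model = train(messages, labels)
```

On this toy data, `classify("claim your free cash prize now", model)` returns 1 (spam) and `classify("project meeting moved to tomorrow", model)` returns 0. A realistic evaluation would need a labelled corpus and tuned values for `top_n` and `n_windows`, neither of which is specified in the abstract.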

Cite this paper

@article{Jain2014AHA, title={A hybrid approach for spam filtering using local concentration based K-Means clustering}, author={Kunal Jain and Sanjay Agrawal}, journal={2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence)}, year={2014}, pages={194-199} }