Email Spam Filtering: A Systematic Review

@article{Cormack2006EmailSF,
  title={Email Spam Filtering: A Systematic Review},
  author={Gordon V. Cormack},
  journal={Found. Trends Inf. Retr.},
  year={2006},
  volume={1},
  pages={335--455}
}
  • G. Cormack
  • Published 23 June 2008
  • Computer Science
  • Found. Trends Inf. Retr.
Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email…
Enhancing the Naive Bayes Spam Filter Through Intelligent Text Modification Detection
  • W. Peng, Linda Huang, Julia Jia, Emma E. Ingram
  • Computer Science
    2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)
  • 2018
TLDR
A novel algorithm for enhancing the accuracy of the Naive Bayes Spam Filter is implemented, indicating that Bayesian Poisoning, a controversial topic, is a real phenomenon that spammers use.
A REVIEW PAPER ON IMAGE SPAM FILTERING
TLDR
This paper reviews what image spam is, how spam filters work, methods for identifying spam, and filtering techniques for image-based spam.
Occam’s razor-based spam filter
TLDR
A novel approach to spam filtering based on the minimum description length principle is presented and the results indicate that the proposed filter is fast to construct, incrementally updateable and clearly outperforms the state-of-the-art spam filters.
Competing with Spams More Fiercely: An Empirical Study on the Effectiveness of Anti-Spam Legislation
Spam mail still accounts for more than 80% of the total email traffic. More recently, it has played a role as a potential propagator of vicious attacks such as viruses, phishing, and malware.
TREC 2006 Spam Track Overview
TLDR
TREC’s Spam Track uses a standard testing framework that presents a set of chronologically ordered email messages to a spam filter for classification. Four different forms of user feedback are modeled, intended to model a user reading email from time to time and perhaps not diligently reporting the filter's errors.
Comparision of String Matching Algorithms on Spam Email Detection
  • C. Varol, H. Abdulhadi
  • Computer Science
    2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT)
  • 2018
TLDR
This work compares the efficiency of six well-known string matching algorithms, namely Longest Common Subsequence (LCS), Levenshtein Distance (LD), Jaro, Jaro-Winkler, Bi-gram, and TFIDF, on two datasets, the Enron corpus and the CSDMC2010 spam dataset, and observes that the Bi-gram algorithm performs best in spam detection on both.
Personal Email Spam Filtering with Minimal User Interaction
TLDR
This work describes new approaches to solve the problem of building a personal spam filter that requires minimal user feedback, and shows that learning filters with no user input can substantially improve the results of open-source and industry-leading commercial filters that employ no user-specific training.
Probabilistic anti-spam filtering with dimensionality reduction
TLDR
This paper compares the performance of the most popular term-selection techniques with some variations of the original naive Bayes anti-spam filter.
Spam filtering: how the dimensionality reduction affects the accuracy of Naive Bayes classifiers
TLDR
This paper studies the performance of many term-selection techniques with several different models of Naive Bayes spam filters, and investigates the benefits of using the Matthews correlation coefficient as a measure of performance.

References

SHOWING 1-10 OF 179 REFERENCES
Content based SMS spam filtering
TLDR
This paper analyzes to what extent Bayesian filtering techniques used to block email spam can be applied to the problem of detecting and stopping mobile spam, and demonstrates that Bayesian filters can be effectively transferred from email to SMS spam.
TREC 2006 Spam Track Overview
TLDR
TREC’s Spam Track uses a standard testing framework that presents a set of chronologically ordered email messages to a spam filter for classification. Four different forms of user feedback are modeled, intended to model a user reading email from time to time and perhaps not diligently reporting the filter's errors.
On Attacking Statistical Spam Filters
TLDR
This work examines the general attack methods spammers use, along with challenges faced by developers and spammers, and demonstrates an attack that, while easy to implement, attempts to more strongly work against the statistical nature behind filters.
Filtering Email Spam in the Presence of Noisy User Feedback
TLDR
It is shown that noisy feedback may harm or even break state-of-the-art spam filters, including recent TREC winners, and several approaches are proposed and evaluated to make such filters robust to label noise.
Relaxed online SVMs for spam filtering
TLDR
It is shown that online SVMs indeed give state-of-the-art classification performance on online spam filtering on large benchmark data sets, and that nearly equivalent performance may be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost.
Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach
TLDR
This work thoroughly investigates the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks, and compares it to an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures.
Using latent semantic indexing to filter spam
TLDR
This study evaluates the effectiveness of a classifier incorporating Latent Semantic Indexing to filter spam email on a corpus used in previous studies and shows that incorporating LSI into an anti-spam filter is viable, particularly in implementations where misclassified legitimate messages are not arbitrarily deleted.
Is Britney Spears Spam?
TLDR
The aim is to redefine spam and the role of the spam filter in the context of Social Networking Services (SNS) and develop a research prototype that categorizes senders into broader categories than spam/not spam using features unique to SNS.
A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists
TLDR
An extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization that attempts to identify automatically unsolicited commercial messages that flood mailboxes, concludes that memory-based anti-spam filtering for mailing lists is practically feasible, especially when combined with additional safety nets.
Spam Filtering with Naive Bayes - Which Naive Bayes?
TLDR
An experimental procedure that emulates the incremental training of personalized spam filters is adopted, and ROC curves that allow us to compare the different versions of NB over the entire tradeoff between true positives and true negatives are plotted.