"In vivo" spam filtering: A challenge problem for data mining
@article{Fawcett2004InVS, title={"In vivo" spam filtering: A challenge problem for data mining}, author={Tom Fawcett}, journal={ArXiv}, year={2004}, volume={cs.AI/0405007} }
Spam, also known as Unsolicited Commercial Email (UCE), is the bane of email communication. Many data mining researchers have addressed the problem of detecting spam, generally by treating it as a static text classification problem. True in vivo spam filtering has characteristics that make it a rich and challenging domain for data mining. Indeed, real-world datasets with these characteristics are typically difficult to acquire and to share. This paper demonstrates some of these characteristics…
61 Citations
Spam Mail Filtering Through Data Mining Approach -A Comparative Performance Analysis
- Computer Science
- 2013
In this paper, a spam dataset is analyzed using the CLEMENTINE data mining tool for email spam classification, and the best classifier for email spam is identified based on the training and testing accuracy of various models and on performance measures.
A survey of learning-based techniques of email spam filtering
- Computer Science
- Artificial Intelligence Review
- 2009
An overview of the state of the art of machine learning applications for spam filtering, and of ways to evaluate and compare different filtering methods.
R-SALSA: A spam filtering technique for social networking sites
- Computer Science
- 2016 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS)
- 2016
An unsupervised approach, the Reliability based Stochastic Approach for Link-Structure Analysis (R-SALSA) algorithm, is proposed in this paper for classifying a message as spam or benign, and it is found to perform better than the previously proposed unsupervised author-reporter model.
Application of Learning Algorithms to Image Spam Evolution
- Computer Science
- 2013
This chapter identifies eight features for distinguishing computer-generated image spam from ham (non-spam) and classifies the images using J48 decision trees, with and without reduced error pruning.
A Fuzzy approach for Spam Mail Detection integrated with wordnet hypernyms key term extraction
- Computer Science
- 2012
This paper proposes an efficient yet simple fuzzy-based method, applied to a refined set of key terms extracted from the email using WordNet and the hypernym concept, to filter spam mail.
The Impact of Feature Selection on Signature-Driven Spam Detection
- Computer Science
- CEAS
- 2004
This work proposes a technique for increasing signature robustness, targeting the I-Match algorithm but applicable to other single-signature detection schemes, and shows that distributional word clustering is effective in increasing signature robustness.
Survey on Internet Spam : Classification and Analysis
- Computer Science
- 2013
The impact of various kinds of spam in social networks, email, images, content, and links is discussed, and the techniques applied to prevent spam in these areas are listed, along with the considerations for constructing spam algorithms.
Online supervised spam filter evaluation
- Computer Science
- TOIS
- 2007
Eleven variants of six widely used open-source spam filters are tested on a chronological sequence of 49086 e-mail messages received by an individual from August 2003 through March 2004, indicating that content-based filters can eliminate 98% of spam while incurring 0.1% legitimate email loss.
Machine Learning for E-mail Spam Filtering: Review, Techniques and Trends
- Computer Science
- ArXiv
- 2016
A comprehensive review of the most effective content-based e-mail spam filtering techniques, focusing primarily on Machine Learning-based spam filters and their variants, and exploring the promising offshoots of latest developments.
Trusting spam reporters: A reporter-based reputation system for email filtering
- Computer Science
- TOIS
- 2008
This work proposes a reactive spam-filtering system based on reporter reputation for use in conjunction with existing spam-filtering techniques, and reports on the utility of a reputation system for spam filtering that makes use of the feedback of trustworthy users.
References
A Case-Based Approach to Spam Filtering that Can Track Concept Drift
- Computer Science
- 2003
A case-based approach to spam filtering allows for the sharing of cases and thus a sharing of the effort of labeling email as spam.
An evaluation of Naive Bayesian anti-spam filtering
- Computer Science
- ArXiv
- 2000
It is concluded that additional safety nets are needed for the Naive Bayesian anti-spam filter to be viable in practice.
An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages
- Computer Science
- SIGIR '00
- 2000
This work introduces appropriate cost-sensitive measures, and investigates at the same time the effect of attribute-set size, training-corpus size, lemmatization, and stop lists, issues that have not been explored in previous experiments.
A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists
- Computer Science
- Information Retrieval
- 2004
An extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization that attempts to automatically identify unsolicited commercial messages that flood mailboxes, concludes that memory-based anti-spam filtering for mailing lists is practically feasible, especially when combined with additional safety nets.
SVM-based Filtering of E-mail Spam with Content-specific Misclassification Costs
- Computer Science
- 2001
We address the problem of separating legitimate emails from unsolicited ones in the context of a large-scale operation, where the diversity of user accounts is very high, while misclassification costs…
Evaluating cost-sensitive Unsolicited Bulk Email categorization
- Computer Science
- SAC '02
- 2002
This paper discusses cost-sensitive Text Categorization methods for UBE filtering, and evaluates them with the Receiver Operating Characteristic Convex Hull method, which best suits classification problems in which target conditions are not known.
A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering
- Computer Science, Mathematics
- EACL
- 2003
It is found that the multinomial model is less biased towards one class and achieves slightly higher accuracy than the multi-variate Bernoulli model.
A Bayesian Approach to Filtering Junk E-Mail
- Computer Science
- AAAI 1998
- 1998
This work examines methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream, and shows the efficacy of such filters in a real world usage scenario, arguing that this technology is mature enough for deployment.
Naive-Bayes vs. Rule-Learning in Classification of Email
- Computer Science
- 1999
Three experiments in automatic mail foldering and spam filtering are presented, showing that Bayesian learning with bag-valued features outperforms the RIPPER rule-learning algorithm in classification accuracy.
Fraud detection
- Computer Science
- 2002
This paper discusses general characteristics of fraud detection problems that make them difficult, as well as system integration issues for automatic fraud detection systems.