Effect of Header-based Features on Accuracy of Classifiers for Spam Email Classification

Authors: Priti Kulkarni, Jatinderkumar R., and Haridas Acharya
Journal: International Journal of Advanced Computer Science and Applications
Emails are an integral part of communication in today’s world, but spam emails are a hindrance, leading to reduced efficiency, security threats, and wasted bandwidth. Hence, they need to be filtered at the first filtering station, so that employees are spared the drudgery of handling them. Most earlier approaches focus on building content-based filters using the body of an email message. Use of selected header features to filter spam is a better strategy, which was… 
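To make the idea of header-based features concrete, the following is a minimal sketch of how a few features might be pulled from an email's headers using Python's standard `email` package. The function name `header_features`, the sample message, and the particular features chosen (number of `Received` hops, presence of `Message-ID`, a `Reply-To` address that differs from `From`, subject length and capitalization) are illustrative assumptions; the abstract above does not list the features the authors actually selected.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample message, used only for illustration.
SAMPLE = (
    "From: Alice <alice@example.com>\n"
    "Reply-To: bob@other.example\n"
    "Subject: WIN A PRIZE NOW\n"
    "Received: from a by b\n"
    "Received: from c by d\n"
    "\n"
    "Body text\n"
)


def header_features(raw_message: str) -> dict:
    """Extract a few illustrative header-only features (assumed, not the paper's set)."""
    msg = message_from_string(raw_message)

    # Each relay normally adds one Received header.
    received = msg.get_all("Received") or []

    # Compare bare addresses, ignoring display names and case.
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1].lower()

    subject = msg.get("Subject", "")

    return {
        "num_received": len(received),
        "has_message_id": msg.get("Message-ID") is not None,
        "reply_to_differs": bool(reply_addr) and reply_addr != from_addr,
        "subject_length": len(subject),
        "subject_all_caps": subject.isupper(),
    }
```

Calling `header_features(SAMPLE)` returns a dictionary that could be turned into one row of a feature matrix for whichever classifier is being evaluated; the message body is never inspected.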
3 Citations

Anomaly Detection in Emails using Machine Learning and Header Information
Experimental results demonstrate that email header information alone is enough to reliably detect spam and phishing emails, and that real-world email filtering applications will benefit from using only header information in terms of resource utilization and efficiency.
Decision Tree Model for Email Classification
  • Ivana Čavor
  • Computer Science
    2021 25th International Conference on Information Technology (IT)
  • 2021
A new approach to feature selection and an Iterative Dichotomiser 3 (ID3) algorithm designed to generate the decision tree for email classification are presented, and the experimental results indicate that the proposed model achieves very high accuracy.


Identifying spam e-mail based-on statistical header features and sender behavior
A set of powerful and useful email header features, derived from header session messages in publicly available datasets, is presented, and many machine learning-based classifiers are applied to show the power of the extracted header features in filtering spam and ham messages by evaluating and comparing classifier performance.
Identifying Potentially Useful Email Header Features for Email Spam Filtering
Experimental studies based on publicly available datasets show that the random forest (RF) classifier has the best performance with an average accuracy, precision, recall, F-Measure, ROC area of 98.5%, respectively.
Spam/ham e-mail classification using machine learning methods based on bag of words technique
The effect of different N-grams on classification performance and the success of different machine learning techniques in classifying spam e-mail are analyzed using the accuracy metric.
A Novel Technique of Email Classification for Spam Detection
This paper proposes a new approach to classifying spam emails using a support vector machine (SVM) and finds that the SVM outperformed the other classifiers.
Highly discriminative statistical features for email classification
An exhaustive comparison of several feature selection and extraction methods for email classification on different benchmarking corpora shows evidence that the technique of biased discriminant analysis in particular offers better discriminative features for classification, gives stable classification results regardless of the number of features chosen, and yields features that robustly retain their discriminative value over time and across data setups.
An evaluation of statistical spam filtering techniques
Experiments show that classifiers using features from message header alone can achieve comparable or better performance than filters utilizing body features only, which implies that message headers can be reliable and powerfully discriminative feature sources for spam filtering.
A Spam Discrimination Based on Mail Header Feature and SVM
Experimental results indicate that the proposed model, which uses an SVM to classify mail according to mail header features, can effectively improve the accuracy of spam identification.
An ensemble approach applied to classify spam e-mails
A Comparative Study for Email Classification
It is shown that a simple J48 classifier, which builds a binary tree, can be efficient for datasets that can be classified by a binary tree.