A Machine Learning Based Web Spam Filtering Approach

Abstract

Web spam pollutes search engine results and reduces the usefulness of search engines. It can be classified by the method used to raise a web page's ranking, that is, by how it subverts the algorithms that search engines use to rank results. The main types are content spam, link spam, and cloaking spam. There has been little or no work on automatically classifying web spam by type. This paper makes two contributions: (i) we propose a Dual-Margin Multi-Class Hypersphere Support Vector Machine (DMMH-SVM) classifier for automatically classifying web spam by type, and (ii) we introduce novel cloaking-based spam features that help our classifier achieve high precision and recall, thereby reducing the false positive rate. The effectiveness of the proposed model is justified analytically. Our experimental results show that DMMH-SVM, combined with the novel cloaking features, outperforms existing algorithms.

DOI: 10.1109/AINA.2016.177

Cite this paper

@article{Kumar2016AML, title={A Machine Learning Based Web Spam Filtering Approach}, author={Santosh Kumar and Xiaoying Gao and Ian Welch and Masood Mansoori}, journal={2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA)}, year={2016}, pages={973-980} }