Corpus ID: 55018612

Classification of Malicious Web Pages through a J48 Decision Tree, a Naïve Bayes, a RBF Network and a Random Forest Classifier for WebSpam Detection

  title={Classification of Malicious Web Pages through a J48 Decision Tree, a Na{\"i}ve Bayes, a RBF Network and a Random Forest Classifier for WebSpam Detection},
  author={Muhammad Iqbal and Malik Muneeb Abid and Usman Waheed and Syed Hasnain Alam Kazmi},
Web spam is a harmful practice in which spammers produce fake search engine results to improve the ranking of their Web pages. It appears across the World Wide Web (WWW) in different forms and lacks a consistent definition. Search engines are struggling to eliminate spam pages with machine learning (ML) detectors. For the most part, search engines measure the quality of websites using different factors (signals) such as the number of visitors, body text, anchor text, back link…
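Naïve Bayes is one of the four classifiers named in the title. As a rough illustration of how such a detector can score pages from numeric signals, here is a minimal Gaussian Naïve Bayes sketch. It assumes two hypothetical per-page features (fraction of anchor text, average word length) and made-up training rows; it is not the authors' feature set, data, or toolkit.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature means/variances from numeric rows."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        columns = list(zip(*xs))
        means = [sum(col) / len(xs) for col in columns]
        # A tiny epsilon keeps the variance positive if a feature is constant.
        variances = [
            sum((v - m) ** 2 for v in col) / len(xs) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[y] = (len(xs) / len(rows), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (features assumed independent)."""
    best, best_score = None, -math.inf
    for y, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical signals per page: [fraction of anchor text, average word length].
rows = [[0.10, 5.0], [0.15, 4.8], [0.12, 5.2], [0.70, 7.5], [0.80, 8.0], [0.75, 7.8]]
labels = ["normal"] * 3 + ["spam"] * 3
model = fit_gaussian_nb(rows, labels)
print(predict(model, [0.78, 7.9]))  # spam
```

Working in log space avoids underflow when many small per-feature likelihoods are multiplied together.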
Towards Evaluating Web Spam Threats and Countermeasures
The results indicate that online real-time tools are highly recommended solutions against web spam threats.


A Survey on Web Spam Detection Methods: Taxonomy
This paper classifies web spam techniques and the related detection methods, and shows that some of these techniques work well and can find spam pages more accurately than others.
Survey on web spam detection: principles and algorithms
This paper presents a systematic review of web spam detection techniques with the focus on algorithms and underlying principles, and categorizes all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data.
Know your neighbors: web spam detection using the web topology
Presents a spam detection system that combines link-based and content-based features and uses the topology of the Web graph by exploiting link dependencies among Web pages, finding that linked hosts tend to belong to the same class.
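The observation that linked hosts tend to share a class suggests feeding neighbour information back into the classifier. The sketch below shows one simple way to do that, assuming hypothetical base spam probabilities per host: average the neighbours' scores and blend them with each host's own score. The blending `weight` and the undirected treatment of links are illustrative assumptions, not the paper's exact procedure.

```python
def neighbor_average(scores, links):
    """Average the base spam scores of each host's linked neighbours.

    scores: dict host -> base-classifier spam probability (hypothetical values).
    links:  dict host -> set of hosts it links to; treated as undirected here.
    """
    neighbours = {h: set() for h in scores}
    for src, dsts in links.items():
        for dst in dsts:
            if src in neighbours and dst in neighbours and src != dst:
                neighbours[src].add(dst)
                neighbours[dst].add(src)
    # Hosts without neighbours fall back to their own score.
    return {
        h: (sum(scores[n] for n in ns) / len(ns) if ns else scores[h])
        for h, ns in neighbours.items()
    }


def smoothed_scores(scores, links, weight=0.5):
    """Blend each host's own score with its neighbours' average."""
    avg = neighbor_average(scores, links)
    return {h: (1 - weight) * scores[h] + weight * avg[h] for h in scores}


scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}
links = {"a": {"b", "c"}, "b": {"a"}, "d": set()}
smoothed = smoothed_scores(scores, links)
print(smoothed)
```

Host `c` starts below 0.5 but is pulled above it because its only neighbour, `a`, looks spammy; the isolated host `d` keeps its own score.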
SAAD, a content based Web Spam Analyzer and Detector
Analyzes different kinds of Web spam pages and identifies new elements that characterize them, in order to define heuristics that can partially detect them, and proposes SAAD (Spam Analyzer And Detector), which is based on the proposed heuristics and their use in a C4.5 classifier improved by means of Bagging and Boosting techniques.
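Bagging, one of the two ensemble techniques mentioned for SAAD, trains several copies of a base classifier on bootstrap resamples of the training set and takes a majority vote. The sketch below illustrates only the bagging idea; it swaps in a one-level decision stump for C4.5 to stay short, and the data, `n_models` and `seed` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Choose the (feature, threshold, low label, high label) split with fewest errors."""
    classes = sorted(set(labels))
    best_err, best = None, None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for lo in classes:
                for hi in classes:
                    err = sum((hi if r[f] > t else lo) != y for r, y in zip(rows, labels))
                    if best_err is None or err < best_err:
                        best_err, best = err, (f, t, lo, hi)
    return best


def stump_predict(stump, x):
    f, t, lo, hi = stump
    return hi if x[f] > t else lo


def fit_bagging(rows, labels, n_models=15, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote over the ensemble."""
    return Counter(stump_predict(m, x) for m in models).most_common(1)[0][0]


# Hypothetical two-feature pages: feature 0 separates the classes, feature 1 is noise.
rows = [[1, 3], [2, 9], [3, 1], [4, 7], [7, 2], [8, 8], [9, 5], [10, 4]]
labels = ["normal"] * 4 + ["spam"] * 4
models = fit_bagging(rows, labels)
print(bagging_predict(models, [10.5, 5]), bagging_predict(models, [0.5, 5]))
```

Because each stump sees a slightly different resample, their thresholds differ; voting smooths out that variance, which is the main reason bagging helps unstable learners such as decision trees.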
Combating Web Spam with TrustRank
This paper proposes techniques to semi-automatically separate reputable, good pages from spam, and shows that they can effectively filter out spam from a significant fraction of the web, based on a good seed set of fewer than 200 sites.
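TrustRank propagates trust outward from a small, human-vetted seed set of good pages, in the style of a PageRank biased toward those seeds. The following is a minimal sketch under simplifying assumptions: a toy link graph, a damping factor of 0.85, a fixed 20 iterations, and dangling pages that simply leak their trust rather than redistributing it.

```python
def trustrank(out_links, good_seeds, alpha=0.85, iterations=20):
    """Propagate trust from a hand-picked good seed set (seed-biased PageRank).

    out_links:  dict page -> list of pages it links to.
    good_seeds: pages judged reputable by a human reviewer.
    Pages with no out-links leak their trust in this simplified sketch.
    """
    pages = set(out_links)
    for dsts in out_links.values():
        pages.update(dsts)
    seeds = {p for p in good_seeds if p in pages}
    if not seeds:
        raise ValueError("need at least one known good seed page")
    # Static distribution d: uniform over the good seeds, zero elsewhere.
    d = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    t = dict(d)
    for _ in range(iterations):
        # Keep (1 - alpha) of the seed distribution, then let every page split
        # alpha of its current trust evenly across its out-links.
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for src, dsts in out_links.items():
            if dsts:
                share = alpha * t[src] / len(dsts)
                for dst in dsts:
                    nxt[dst] += share
        t = nxt
    return t


# Toy graph: "s1" and "s2" form a small spam cluster that no good page links to.
links = {"g": ["a"], "a": ["g", "b"], "b": [], "s1": ["s2", "a"], "s2": ["s1"]}
trust = trustrank(links, {"g"})
print(sorted(trust.items(), key=lambda kv: -kv[1]))
```

Pages reachable from the seed accumulate trust, while the spam cluster stays at zero even though it links to a good page: linking out to reputable pages does not earn trust, only being linked from them does.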
Looking into the past to better classify web spam
Uses content features from historical versions of web pages to improve spam classification, and shows that this approach improves spam classification F-measure performance by 30% compared to a baseline classifier that considers only current page content.
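For reference, the F-measure is conventionally the harmonic mean of precision $P$ and recall $R$, computed from the true positives ($TP$), false positives ($FP$) and false negatives ($FN$) for the spam class. The summary does not state which F variant or averaging the paper reports; the balanced $F_1$ below is an assumption.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```

Because it is a harmonic mean, $F_1$ stays low unless both precision and recall are reasonably high, which makes it a common choice when spam is a minority class.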
Detecting spam web pages through content analysis
Some previously undescribed techniques for automatically detecting spam pages are considered, and the effectiveness of these techniques is examined both in isolation and when aggregated using classification algorithms.
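Content-analysis heuristics of this kind are typically simple statistics computed from a page's text. The sketch below computes three illustrative examples: word count, average word length, and a compression ratio (keyword-stuffed pages repeat terms and compress unusually well). These particular features are assumptions chosen for illustration, not a list taken from the summary above; in practice their values would be fed to a classifier rather than thresholded by hand.

```python
import zlib


def content_features(text):
    """Compute a few simple content statistics for one page's visible text."""
    words = text.split()
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw)
    return {
        "num_words": len(words),
        "avg_word_length": (sum(len(w) for w in words) / len(words)) if words else 0.0,
        # Uncompressed size divided by compressed size; repetitive text scores high.
        "compression_ratio": (len(raw) / len(compressed)) if raw else 0.0,
    }


stuffed = "cheap pills " * 200
prose = (
    "Search engines rank pages using many signals, including the text on "
    "the page and the links that point to it."
)
f_stuffed = content_features(stuffed)
f_prose = content_features(prose)
print(f_stuffed)
print(f_prose)
```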
Web Spam Detection: link-based and content-based techniques
The Web is both an excellent medium for sharing information and an attractive platform for delivering products and services. This platform is, to some extent, mediated by search engines in…
Identifying link farm spam pages
Presents algorithms for detecting link farms automatically: they first generate a seed set based on the common link set between the incoming and outgoing links of Web pages, then expand it, providing a modified web graph to use in ranking page importance.
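The seed-and-expand idea can be sketched as follows. This is a simplified illustration, assuming links are compared page-to-page (rather than, say, by domain) and using hypothetical thresholds `t_io` (minimum overlap between a page's in-link and out-link sets) and `t_pp` (minimum number of already-flagged pages a page must point to before it is added).

```python
def link_farm_candidates(out_links, t_io=2, t_pp=2):
    """Seed-and-expand sketch for link-farm detection.

    out_links: dict page -> set of pages it links to.
    t_io, t_pp: hypothetical thresholds chosen for illustration.
    """
    pages = set(out_links)
    for dsts in out_links.values():
        pages |= set(dsts)
    incoming = {p: set() for p in pages}
    for src, dsts in out_links.items():
        for dst in dsts:
            incoming[dst].add(src)
    # Seed: pages whose in-link and out-link sets share at least t_io members.
    bad = {p for p in pages if len(incoming[p] & set(out_links.get(p, ()))) >= t_io}
    # Expansion: repeatedly add pages that point to at least t_pp flagged pages.
    changed = True
    while changed:
        changed = False
        for p in pages - bad:  # iterate over a snapshot, so adding to `bad` is safe
            if len(set(out_links.get(p, ())) & bad) >= t_pp:
                bad.add(p)
                changed = True
    return bad


graph = {
    "f1": {"f2", "f3"}, "f2": {"f1", "f3"}, "f3": {"f1", "f2"},  # mutually linked farm
    "x": {"f1", "f2", "n1"},                                     # page boosting the farm
    "n1": {"n2"}, "n2": {"n1"},                                  # ordinary reciprocal pair
}
print(sorted(link_farm_candidates(graph)))
```

The flagged set could then be used to drop or down-weight those links before computing a ranking score, which is the role of the modified web graph described above.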
Detection of review spam: A survey
Systematically analyzes and categorizes models that detect review spam, and finds that studies fall into three groups: those focusing on methods to detect spam reviews, individual spammers, and group spam.