Web spam identification through content and hyperlinks


We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits both the structure of the Web graph and page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.
DOI: 10.1145/1451983.1451994


@inproceedings{Abernethy2008WebSI, title={Web spam identification through content and hyperlinks}, author={Jacob D. Abernethy and Olivier Chapelle and Carlos Castillo}, booktitle={AIRWeb}, year={2008} }