A fast and robust method for web page template detection and removal

@inproceedings{Vieira2006AFA,
  title={A fast and robust method for web page template detection and removal},
  author={Karane Vieira and Altigran Soares da Silva and Nick Pinto and Edleno Silva de Moura and Jo{\~a}o M. B. Cavalcanti and Juliana Freire},
  booktitle={CIKM},
  year={2006}
}
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods such as clustering and classification, but they also negatively impact the performance and resource usage of tools that process web pages. In this paper we present a new method that efficiently and accurately removes templates found in collections of web pages. Our method works in two steps. First, the costly process of…

[Chart: Citations per Year, 2007 to 2018]

Semantic Scholar estimates that this publication has 84 citations based on the available data.
