Crawling the Infinite Web: Five Levels Are Enough

@inproceedings{BaezaYates2004CrawlingTI,
  title={Crawling the Infinite Web: Five Levels Are Enough},
  author={Ricardo A. Baeza-Yates and Carlos Castillo},
  booktitle={WAW},
  year={2004}
}
A large number of publicly available Web pages are generated dynamically upon request and contain links to other dynamically generated pages. This usually produces Web sites that can create arbitrarily many pages. This poses a problem for search engine managers: they need a rule to configure the search engine’s crawler so that it stops downloading pages from each Web site at some depth. But how deep must the crawler go? In this article, several probabilistic models for browsing…

Semantic Scholar estimates that this publication has 53 citations based on the available data.