IRLbot: scaling to 6 billion pages and beyond


This article shares our experience in designing a Web crawler that can download billions of pages using a single-server implementation and models its performance. We first show that current crawling algorithms cannot effectively cope with the sheer volume of URLs generated in large crawls, highly branching spam, legitimate multimillion-page blog sites, and…
DOI: 10.1145/1541822.1541823


122 Citations

Semantic Scholar estimates that this publication has 122 citations based on the available data.

Cite this paper

@article{Lee2008IRLbotST, title={IRLbot: scaling to 6 billion pages and beyond}, author={Hsin-Tsang Lee and Derek Leonard and Xiaoming Wang and Dmitri Loguinov}, journal={TWEB}, year={2008}, volume={3}, pages={8:1-8:34} }