Distributed High-performance Web Crawlers: A Survey of the State of the Art

@inproceedings{Boswell2003DistributedHW,
  title={Distributed High-performance Web Crawlers : A Survey of the State of the Art},
  author={Dustin Boswell},
  year={2003}
}
Web Crawlers (also called Web Spiders or Robots) are programs used to download documents from the internet. Simple crawlers can be used by individuals to copy an entire web site to their hard drive for local viewing. For such small-scale tasks, numerous utilities like wget exist; in fact, an entire web crawler can be written in 20 lines of Python code. Indeed, the task is inherently simple: the general algorithm is shown in figure 1. However, if one needs a large portion of the web (e.g., Google…
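
The abstract's claim that a basic crawler fits in roughly 20 lines of Python can be illustrated with a minimal sketch of the generic crawl loop: keep a frontier of URLs, download each page, extract its links, and enqueue any links not seen before. This is not the paper's figure 1, and the names (crawl, seed URL, max_pages) are illustrative assumptions, not from the paper.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
import re

def crawl(seed_url, max_pages=100):
    frontier = deque([seed_url])   # URLs waiting to be downloaded
    seen = {seed_url}              # URLs already enqueued (avoids re-fetching)
    while frontier and len(seen) <= max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue               # skip pages that fail to download
        yield url, html
        # Extract href targets and resolve them against the current page URL.
        for link in re.findall(r'href=["\'](.*?)["\']', html):
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)

# Example usage (hypothetical seed URL):
for page_url, page_html in crawl("https://example.com", max_pages=10):
    print(page_url, len(page_html))
```

A toy loop like this makes the survey's point: the hard part of crawling is not the algorithm but scaling it to a large fraction of the web, which requires distributed fetching, politeness, and duplicate detection.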
