Web crawler

Known as: Webcrawler, Crawl site, RBSE 
A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering). Web…
Wikipedia
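The "systematic browsing" in the definition above can be illustrated with a minimal sketch, assuming a breadth-first visiting order and a toy in-memory link graph in place of real HTTP fetching. The names `crawl`, `fetch_links`, `TOY_WEB`, and `toy_fetch_links` are hypothetical and chosen for illustration only; production crawlers also handle politeness, robots.txt, URL normalization, and failures, none of which is shown here.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first from `seed`; return them in visit order.

    `fetch_links` is a caller-supplied function (an assumption of this
    sketch) that maps a page identifier to the links found on that page.
    """
    frontier = deque([seed])  # pages discovered but not yet visited
    seen = {seed}             # every page ever added to the frontier
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical toy "web": each key is a page, each value its outgoing links.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def toy_fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


if __name__ == "__main__":
    print(crawl("a", toy_fetch_links))
```

The `seen` set is what keeps the traversal from looping on cycles such as the link from "c" back to "a", and `max_pages` bounds the crawl.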

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
2009
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Ironically the very size of this…
Highly Cited
2008
Searching for Web service access points is no longer attached to service registries as Web search engines have become a new major…
Highly Cited
2007
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays…
Highly Cited
2003
We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The…
Highly Cited
2002
Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages…
Highly Cited
2001
The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct…
Highly Cited
2001
This paper outlines the design of a web crawler implemented for IBM Almaden's WebFountain project and d…
Highly Cited
2000
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index…
Highly Cited
1999
This paper describes Mercator, a scalable, extensible Web crawler written entirely in Java. Scalable Web crawlers are an…
Highly Cited
1998
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in…