Crawling the Hidden Web

@inproceedings{Raghavan2001CrawlingTH,
  title={Crawling the Hidden Web},
  author={Sriram Raghavan and Hector Garcia-Molina},
  booktitle={WWW Posters},
  year={2001}
}
A great deal of information on the web cannot be accessed by conventional crawlers. This portion of the web is usually called the hidden web. Dealing with it requires solving two tasks: crawling the client-side hidden web and crawling the server-side hidden web. In this paper we present an architecture and a set of related techniques for accessing information in the client-side hidden web, dealing with aspects such as JavaScript technology…
Highly Influential
This paper has highly influenced 74 other papers.
Highly Cited
This paper has 834 citations.

Citations

[Figure: citations per year, 2001 to 2016]
Semantic Scholar estimates that this publication has 835 citations based on the available data.
