Crawling the content hidden behind web forms

Authors: Manuel Álvarez, Juan Raposo, Alberto Pan, Fidel Cacheda, Fernando Bellas, Victor Carneiro
Today's crawler engines cannot reach most of the information contained in the Web. A great amount of valuable information is "hidden" behind the query forms of online databases, or is dynamically generated by technologies such as JavaScript. This portion of the Web is usually known as the Deep Web or the Hidden Web. We have built DeepBot, a prototype hidden-web crawler able to access such content. DeepBot receives as input a set of domain definitions, each one describing a specific data…
