Crawling Deep Web Using a New Set Covering Algorithm

Yan Wang, Jianguo Lu, Jessica Szu-Chia Chen
Crawling the deep web often requires selecting a set of queries that covers most of the documents in a data source at low cost. This can be modeled as a set covering problem, which has been studied extensively. Conventional set covering algorithms, however, do not work well for deep web crawling because of several special features of this application domain. Typically, most set covering algorithms assume a uniform distribution of the elements…
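To make the set covering formulation concrete, here is a minimal sketch of the conventional greedy heuristic for weighted set cover, applied to query selection. It is not the new algorithm the paper proposes, only the kind of baseline the abstract refers to. The function name `greedy_query_cover`, the dictionary inputs, and the choice of cost (the number of documents a query returns) are all assumptions made for illustration.

```python
def greedy_query_cover(query_docs, query_cost, target_fraction=1.0):
    """Greedily pick queries until the target fraction of documents is covered.

    query_docs: dict mapping a query term to the set of document ids it matches
                (hypothetical input; in practice this comes from a sample).
    query_cost: dict mapping a query term to its cost (assumed here to be
                the number of documents the query returns).
    """
    universe = set().union(*query_docs.values())
    needed = target_fraction * len(universe)
    covered = set()
    chosen = []
    remaining = dict(query_docs)

    while len(covered) < needed and remaining:
        best, best_ratio = None, float("inf")
        for query, docs in remaining.items():
            new_docs = len(docs - covered)
            if new_docs == 0:
                continue  # this query adds nothing new
            # Cost paid per newly covered document; lower is better.
            ratio = query_cost[query] / new_docs
            if ratio < best_ratio:
                best, best_ratio = query, ratio
        if best is None:
            break  # no remaining query covers a new document
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen, covered


# Toy data source: four candidate queries over six documents.
sample_docs = {
    "web": {1, 2, 3, 4},
    "crawl": {3, 4, 5},
    "deep": {5, 6},
    "set": {1, 6},
}
sample_cost = {query: len(docs) for query, docs in sample_docs.items()}
chosen, covered = greedy_query_cover(sample_docs, sample_cost)
```

Each round picks the query with the lowest cost per newly covered document, so a query that mostly returns already-retrieved documents is penalized. Ties are broken by dictionary order, which is an arbitrary choice in this sketch.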


