What's there and what's not?: focused crawling for missing documents in digital libraries

@inproceedings{Zhuang2005WhatsTA,
  title={What's there and what's not?: focused crawling for missing documents in digital libraries},
  author={Ziming Zhuang and Rohit Wagle and C. Lee Giles},
  booktitle={Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)},
  year={2005},
  pages={301--310}
}
Some large-scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors' self-submissions. While these approaches have so far built reasonably sized libraries, the resulting collections may contain only a portion of the documents from specific publishing venues. We propose to use alternative online resources and techniques that maximally exploit other resources to build the complete document collection…
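The abstract frames the problem as finding which documents from a given venue a library is missing. A minimal sketch of that "what's there and what's not" check follows, assuming the venue's table of contents is already available as a list of titles and that titles are compared after simple normalization. The function names `normalize_title` and `find_missing` and the sample titles are hypothetical illustrations, not the paper's method; the paper's actual contribution (crawling alternative online resources to locate the missing documents) is not shown here.

```python
import re
import unicodedata
from typing import Iterable, List


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    # Drop combining marks left over from NFKD decomposition (e.g. accents).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a lowercase letter, digit, or space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def find_missing(venue_titles: Iterable[str], library_titles: Iterable[str]) -> List[str]:
    """Return venue titles whose normalized form is absent from the library."""
    held = {normalize_title(t) for t in library_titles}
    return [t for t in venue_titles if normalize_title(t) not in held]


# Hypothetical venue table of contents and library holdings.
venue = [
    "What's there and what's not?: focused crawling for missing documents in digital libraries",
    "Paper B",
    "Paper C",
]
library = [
    "WHAT'S THERE AND WHAT'S NOT? Focused crawling for missing documents in digital libraries",
    "paper c",
]

print(find_missing(venue, library))  # → ['Paper B']
```

Exact matching on normalized titles is a deliberate simplification; a real system would likely need fuzzier matching to tolerate title variants across sources.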
56 Citations
Author Homepage Discovery in CiteSeerX
On the Use of Web Search to Improve Scientific Collections
Focused crawling of tagged web resources using ontology
Effects of Start URLs in Focused Web Crawling