Effective page refresh policies for Web crawlers

  • Junghoo Cho, H. Garcia-Molina
  • Published 2003
  • Computer Science
  • ACM Trans. Database Syst.
  • In this article, we study how we can maintain local copies of remote data sources "fresh," when the source data is updated autonomously and independently. In particular, we study the problem of Web crawlers that maintain local copies of remote Web pages for Web search engines. In this context, remote data sources (Websites) do not notify the copies (Web crawlers) of new changes, so we need to periodically poll the sources to maintain the copies up-to-date. Since polling the sources takes…
    Clustering-based incremental web crawling
    • 51
    Efficiently Detecting Webpage Updates Using Samples
    • 10
    Parallel crawler architecture and web page change detection
    • 33
    Coherence-Oriented Crawling and Navigation Using Patterns for Web Archives
    • 11
    A Hybrid Revisit Policy For Web Search
    • 3

