An adaptive model for optimizing performance of an incremental web crawler

@inproceedings{Edwards2001AnAM,
  title={An adaptive model for optimizing performance of an incremental web crawler},
  author={J. Edwards and K. McCurley and J. Tomlin},
  booktitle={WWW '01},
  year={2001}
}
  • J. Edwards, K. McCurley, J. Tomlin
  • Published in WWW '01 2001
  • Computer Science
  • This paper outlines the design of a web crawler implemented for IBM Almaden's WebFountain project and describes an optimization model for controlling the crawl strategy. This crawler is scalable and incremental. The model makes no assumptions about the statistical behaviour of web page changes, but rather uses an adaptive approach to maintain data on actual change rates which are in turn used as inputs for the optimization. Computational results with simulated but realistic data show that there…


    Web Crawling
    • 312
    Scheduling algorithms for Web crawling
    • 69
    Estimating frequency of change
    • 364
    High-performance web crawling
    • 127
    Effective page refresh policies for Web crawlers
    • 255
    Efficient URL caching for world wide web crawling
    • 84
    • Highly Influenced
    Effective web crawling
    • 183
    Clustering-based incremental web crawling
    • 51

    References

    Publications referenced by this paper.
    How dynamic is the Web?
    • 353
    • Highly Influential
    Synchronizing a database to improve freshness
    • 351
    • Highly Influential
    Towards a Better Understanding of Web Resources and Server Responses for Improved Caching
    • 71
    • Highly Influential
    Keeping up with the changing Web
    • 171
    • Highly Influential