Corpus ID: 420

Reconstructing Websites for the Lazy Webmaster

  • F. McCown, J. Smith, Michael L. Nelson, and J. Bollen
  • Published 2005
  • Computer Science
  • ArXiv
  • Backup or preservation of a website is often not considered until after a catastrophic event has occurred. In the face of complete website loss, “lazy” webmasters or concerned third parties may be able to recover some of the website from the Internet Archive. Other pages may also be salvaged from commercial search engine caches. We introduce the concept of “lazy preservation”: digital preservation performed as a result of the normal operations of the Web infrastructure (search engines and…
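The recovery path the abstract describes — pulling lost pages back out of the Web infrastructure — can be illustrated with the Internet Archive's Wayback Machine availability API, a real public endpoint (`https://archive.org/wayback/available`). The helper function names below are illustrative, not from the paper:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"

def availability_query(url, timestamp=None):
    """Build a Wayback availability API query URL for the given page."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD[hhmmss]: prefer snapshots near this time
    return WAYBACK_API + "?" + urlencode(params)

def closest_snapshot(api_response):
    """Extract the closest archived snapshot URL from a parsed API response,
    or None if the page was never archived."""
    snap = api_response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"]
    return None

def fetch_closest(url):
    """Network lookup: fetch and parse the availability response (not run here)."""
    with urlopen(availability_query(url)) as resp:
        return closest_snapshot(json.load(resp))

# Offline example: a response of the shape the API documents.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20050101000000/http://example.com/",
            "timestamp": "20050101000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample))
```

A full reconstruction in the paper's sense would repeat this lookup for every known URL of the lost site and fall back to search engine caches when the Archive has no copy.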

