Corpus ID: 2043639

Archiving Deferred Representations Using a Two-Tiered Crawling Approach

  • J. F. Brunelle, Michele C. Weigle, Michael L. Nelson
  • Published 2015
  • Computer Science
  • ArXiv
  • Web resources are increasingly interactive, which makes them increasingly difficult to archive. The difficulty stems from client-side technologies (e.g., JavaScript) that change the client-side state of a representation after it has initially loaded. We refer to these representations as deferred representations. We can better archive deferred representations using tools like headless browsing clients. We use 10,000 seed Uniform Resource Identifiers (URIs) to…
    14 Citations
    Archival Crawlers and JavaScript: Discover More Stuff but Crawl More Slowly
    • 8
    Scripts in a frame: A framework for archiving deferred representations
    • 3
    Client-Side Reconstruction of Composite Mementos Using ServiceWorker
    • 10
    A Framework for Aggregating Private and Public Web Archives
    • Mat Kelly
    • Computer Science
    • Bull. IEEE Tech. Comm. Digit. Libr.
    • 2015
    • 6


    The impact of JavaScript on archivability
    • 26
    Memento: Time Travel for the Web
    • 123
    Crawling AJAX by Inferring User Interface State Changes
    • 197
    Crawling Ajax-Based Web Applications through Dynamic Analysis of User Interface State Changes
    • 256
    Incremental Crawling with Heritrix
    • 46
    On the Change in Archivability of Websites Over Time
    • 23
    SHARC: Framework for Quality-Conscious Web Archiving
    • 35
    Data quality in web archiving
    • 49
    ‘Wayback’ for Accessing Web Archives
    • Brad Tofel
    • 2007
    • 42