Transparent Format Migration of Preserved Web Content

Authors: David Stuart Holmes Rosenthal, Thomas A. Lipkis, Thomas Robertson, Seth Morabito
Journal: D-Lib Magazine
The LOCKSS digital preservation system collects content by crawling the web and preserves it in the format supplied by the publisher. Eventually, browsers will no longer understand that format. A process called format migration converts it to a newer format that the browsers do understand. The LOCKSS program has designed and tested an initial implementation of format migration for Web content that is transparent to readers, building on the content negotiation capabilities of HTTP. 
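The abstract says only that the migration builds on HTTP content negotiation. As a minimal sketch of how that could work, and not the LOCKSS implementation itself, the Python below assumes the server reads the request's `Accept` header, serves the preserved format when the browser accepts it, and otherwise picks a format it has a converter for. The function names (`parse_accept`, `quality`, `choose_format`) and the converter list are hypothetical and introduced for illustration.

```python
from typing import List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an HTTP Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(ranges: List[Tuple[str, float]], mime: str) -> float:
    """Return the q-value of the most specific media range matching mime."""
    major = mime.split("/")[0]
    for candidate in (mime, major + "/*", "*/*"):
        for media_range, q in ranges:
            if media_range == candidate:
                return q
    return 0.0


def choose_format(accept_header: str, stored_mime: str,
                  converters: List[Tuple[str, str]]) -> str:
    """Pick the MIME type to serve for content preserved as stored_mime.

    converters is a hypothetical list of (source, target) MIME pairs
    for which a migration routine is assumed to exist.
    """
    ranges = parse_accept(accept_header)
    # No Accept header, or the browser still understands the preserved format.
    if not ranges or quality(ranges, stored_mime) > 0:
        return stored_mime
    targets = [dst for src, dst in converters
               if src == stored_mime and quality(ranges, dst) > 0]
    if targets:
        # Migrate on access to the format the browser prefers most.
        return max(targets, key=lambda t: quality(ranges, t))
    return stored_mime  # no usable converter: fall back to the original


converters = [("image/gif", "image/png"), ("image/gif", "image/jpeg")]
print(choose_format("image/png, image/jpeg;q=0.8", "image/gif", converters))
```

In this sketch the preserved copy is never altered; conversion happens only when a request shows that the original format is no longer acceptable, which is one way to read "transparent to readers".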

Dynamic Web File Format Transformations with Grace

Grace, an HTTP proxy server that transparently converts browser-incompatible and obsolete web content into content that a browser can display without plug-in software, is introduced.

Digital archives as versatile platforms for sharing and interlinking research artefacts

How digital archives can play a key role as a versatile data-sharing platform, exposing and interlinking all the artefacts of a given research process, is discussed.

Using Scalable and Secure Web Technologies to Design a Global Digital Format Registry Prototype: Architecture, Implementation, and Testing

The architecture, design, and testing of a Global Digital Format Registry (GDFR) based on scalable, extensible, and secure web technologies are presented. Because the registry is built on platform-independent technologies, it can readily incorporate advances achieved through various methodologies and adapt to emerging technologies.

Lazy Preservation: Reconstructing Websites from the Web Infrastructure

In this book, the Web Infrastructure is characterized by its preservation capacity and behavior, and a new type of crawler is introduced: the web-repository crawler.

Migrating Web Archives from HTML4 to HTML5: A Block-Based Approach and Its Evaluation

This work proposes a migration tool from HTML4 to HTML5 and uses an evaluation framework for Web page segmentation that helps define and compute relevant metrics to measure the quality of the migration process.

HTML 5 ETDs (ETD 2010)

A software prototype to convert plain ETDs (Electronic Theses and Dissertations) into HTML5 in a semi-automatic way, which will make it easier to read and preserve multimedia and hypermedia ETDs.

Difficulties of Timestamping Archived Web Pages

It is shown that state-of-the-art services for creating trusted timestamps in blockchain-based networks do not adequately allow for timestamping of web pages, and several requirements to be fulfilled in order to produce repeatable hash values for archived web pages are introduced.

A survey of digital preservation strategies

An overview of the current main preservation strategies is given and two procedures for making information accessible and manipulable over time are designed, which further discuss and compare those strategies.

Distributed Digital Preservation: Private LOCKSS Networks as Business, Social, and Technical Frameworks

The Library of Congress' National Digital Information Infrastructure and Preservation Program (NDIIPP) has helped underwrite the development of highly targeted collaborative preservation networks

Strategies For Preservation Of Web-Based Content: Intellectual Property Barriers To Building The Sudan Web Archive

This study aims to advocate a balance between the information seeking needs and the protection of IPRs for establishment of the Sudan Web Archive by investigating the management of Web-based content as part of the national library holdings.



A Web-Based Paradigm for File Migration

The architecture of this web-based paradigm for file migration, which allows the bulk conversion of electronic documents to PDF in a manner that minimizes certain aspects of the migration cost, is described.

Hypertext Transfer Protocol - HTTP/1.0

The Hypertext Transfer Protocol is an application-level protocol for distributed, collaborative, hypermedia information systems, which can be used for many tasks beyond its use for hypertext through extension of its request methods, error codes and headers.

Mediating Among Diverse Data Formats

This thesis has developed a data model and system of mediator agents that support the widespread use of diverse data formats much more effectively than current approaches do, and describes and evaluates the design and implementation of this data model, known as the Typed Object Model (or TOM), and the system of mediation agents that supports it.

Digital Preservation and Permanent Access: The UVC for Images

This paper focuses on the development and practical use of one of the permanent access tools: the Universal Virtual Computer (UVC) for images, developed in collaboration with IBM as part of a new Preservation Subsystem for the e-Depot.

Ensuring the Longevity of Digital Documents

The digital medium is replacing paper in a dramatic record-keeping revolution, but such documents may be lost unless the authors act now, and must invest careful thought and significant effort to preserve these documents for the future.

Preserving peer replicas by rate-limited sampled voting

The LOCKSS project presents a design for and simulations of a novel protocol for voting in systems of this kind that incorporates rate limitation and intrusion detection to ensure that even some very powerful adversaries attacking over many years have only a small probability of causing irrecoverable damage before being detected.

The Scientific American

This revised picture resolves two disturbing mysteries: why many heart attacks strike without warning and why preventative therapies sometimes fail. It also highlights the need for better prevention, detection and treatment.

An Approach to the Preservation of Digital Records

Format-Specific Digital Object Validation
