With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings as well as to support routing of requests in the Memento aggregator. To save time, the Memento aggregator should only poll the archives that are likely to have a copy of the requested URI. Using the crawler… (More)
We propose an approach to index raster images of dictionary pages which in turn would require very little manual effort to enable direct access to the appropriate pages of the dictionary for lookup. Accessibility is further improved by feedback and crowdsourcing that enables highlighting of the specific location on the page where the lookup word is found,… (More)
We examine how well various HTTP methods are supported by public web services. We sample 40,870 live URIs from the DMOZ collection (a curated directory of World Wide Web URIs) and found that about 55% URIs claim support (in the Allow header) for GET and POST methods, but less than 2% of the URIs claim support for one or more of PUT, PATCH, or DELETE methods.
HTTP MAILBOX-ASYNCHRONOUS RESTFUL COMMUNICATION Traditionally, general web services used only the GET and POST methods of HTTP while several other HTTP methods like PUT, PATCH, and DELETE were rarely utilized. Additionally, the Web was mainly navigated by humans using web browsers and clicking on hyperlinks or submitting HTML forms. Clicking on a link is… (More)
The Memento protocol makes it easy to build a uniform lookup service to aggregate the holdings of web archives. However, there is a lack of tools to utilize this capability in archiving applications and research projects. We created MemGator, an open source, easy to use, portable, concurrent, cross-platform, and self-documented Memento aggregator CLI and… (More)
To facilitate permanence and collaboration in web archives, we built InterPlanetary Wayback to disseminate the contents of WARC files into the IPFS network. IPFS is a peer-to-peer content-addressable file system that inherently allows deduplication and facilitates opt-in replication. We split the header and payload of WARC response records before… (More)