The evolution of web archiving

@article{Costa2016TheEO,
  title={The evolution of web archiving},
  author={Miguel Costa and Daniel Gomes and M{\'a}rio J. Silva},
  journal={International Journal on Digital Libraries},
  year={2016},
  volume={18},
  pages={191-205}
}
Web archives preserve information published on the web or digitized from printed publications. [...] Our results show that in recent years there has been significant growth in the number of initiatives, the number of countries hosting these initiatives, the volume of data, and the number of contents preserved.
Web Archive
This article deals with the function of general web archives within the emerging organization of fast-growing digital knowledge resources. It opens with a brief overview of reasons why general web [...]
Global trends in library web-archives
TLDR
The study findings demonstrate that web archives are selected to supplement libraries' digital collections on hot topics, such as COVID-19, or to meet the demands of specific user groups.
Launching a Web Archives Program at a Public University
Many organizations and institutions rely heavily on a web presence to disseminate information and to manage programs and policies. This tendency leaves library and archive professionals with a [...]
If these crawls could talk: Studying and documenting web archives provenance
TLDR
The decision space of web archives and its role in shaping what is and is not captured in the web archiving process are examined, and a framework for documenting key dimensions of a collection is proposed. The framework addresses the situated nature of the organizational context, the technical specificities, and the unique characteristics of the web materials that are the focus of a collection.
Digital preservation policies and technologies in web archiving (Políticas e tecnologias de preservação digital no arquivamento da web)
The objective of this paper was to analyze digital preservation from the web archiving approach, addressing the technologies involved in the archiving process, as well as policies for the selection, [...]
Accessing Web Archives: Integrating an Archive-It Collection into EBSCO Discovery Service
TLDR
Working together, the team of archivists and technical services librarians incorporated the web archive collections into the Libraries’ EBSCO Discovery Service (EDS) discovery layer and indexed the content on a single, user-friendly platform.
Concepts and tools for the effective and efficient use of web archives
TLDR
This work presents a retrospective analysis of crawl metadata on the size, age, and growth of a web dataset, and proposes a programming framework for efficiently processing archival collections.
Full-Text and URL Search Over Web Archives
  • Miguel Costa
  • Computer Science
  • ArXiv
  • 2021
TLDR
While web search engines enable searching over the most recent web snapshot, web archives allow searching over multiple snapshots from the past. Web archives therefore have to deal with a temporal dimension, which creates new challenges and opportunities.
What's cached is prologue: Reviewing recent web archives research towards supporting scholarly use
Web archives are essential to support historical scholarship in the online age. Research on web archives spans many disciplines, often requiring domain-specific expertise. The wide-ranging nature of [...]
Big Data Science Over the Past Web
TLDR
This chapter presents several examples of big data tools, machine learning frameworks, and deep learning algorithms that significantly increase the scalability and performance of several computational tasks, especially over text, image, and audio, and gives an overview of their application to support longitudinal studies over web archive collections.

References

Showing 1-10 of 58 references
A Survey on Web Archiving Initiatives
TLDR
The results showed that the number of web archiving initiatives grew significantly after 2003, that the initiatives are concentrated in developed countries, and that the resources assigned to them are scarce.
The Importance of Web Archives for Humanities
The web is the primary means of communication in developed societies. It contains descriptions of recent events generated through distinct perspectives. Thus, the web is a valuable resource for [...]
Web Archiving in the United States - A 2017 Survey
From October 2 to November 20, 2017, a working group of individuals representing multiple NDSA member institutions and interest groups conducted a survey of organizations in the United States [...]
Functionalities of Web Archives
  • J. Niu
  • Computer Science
  • D-Lib Magazine
  • 2012
TLDR
A functionality checklist was designed based on use cases created by the International Internet Preservation Consortium (IIPC) and on the findings of two related user studies, and a comprehensive literature review of web archiving methods was conducted.
Evaluating Web Archive Search Systems
TLDR
An evaluation methodology for web archive search systems is proposed, based on a list of requirements compiled from previous characterizations of web archives and their users. The work also shows how combining temporal features with the regular topical features improves search effectiveness on web archives.
Sprint methods for web archive research
TLDR
This paper presents "sprint methods" for performing research using an archived collection of the Dutch news aggregator website Nu.nl, and for developing and adapting a search system and interface to this data.
Characterizing Search Behavior in Web Archives
TLDR
This work presents the first characterization of the search behavior of web archive users, finding strong evidence that users prefer the oldest documents over the newest, but mostly search without any temporal restriction.
Memento: Time Travel for the Web
TLDR
The Memento solution is a framework in which archived resources can be reached seamlessly via the URI of their original resource: protocol-based time travel for the Web.
What's new on the web?: the evolution of the web from a search engine perspective
TLDR
The authors' findings indicate a rapid turnover rate of web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them, which is likely to remain consistent over time.
How much of the web is archived?
TLDR
The Memento Project's archive access additions to HTTP have enabled the development of new web archive access user interfaces. Approximating the Web by sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes, and measuring the number of archived copies available in various public web archives, indicates that 35%-90% of URIs have at least one archived copy.