How much of the web is archived?

@article{Ainsworth2011HowMO,
  title={How much of the web is archived?},
  author={Scott Ainsworth and Ahmed Alsum and Hany SalahEldeen and Michele C. Weigle and Michael L. Nelson},
  journal={ArXiv},
  year={2011},
  volume={abs/1212.6177}
}
  • Scott Ainsworth, Ahmed Alsum, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson
  • Published 2011
  • Computer Science
  • ArXiv
  • The Memento Project's archive access additions to HTTP have enabled development of new web archive access user interfaces. After experiencing this web time travel, the inevitable question that comes to mind is "How much of the Web is archived?" This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring the number of archive copies available in various public web archives. The results indicate that 35%-90% of URIs have…
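The measurement described in the abstract (count the archived copies of each sampled URI, then report the share of URIs with at least one copy) can be sketched as follows. This is a minimal illustration, not the authors' tooling, which this excerpt does not show. It assumes each URI's archive copies are listed in a Memento TimeMap serialized in link format, as defined in RFC 7089. The helper names `count_mementos` and `archived_fraction` and every URI in the sample data are hypothetical.

```python
import re

# One link-format entry: <target> followed by its ;key="value" parameters.
LINK_ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
REL_PARAM = re.compile(r'\brel="([^"]*)"')


def count_mementos(timemap_text):
    """Count entries whose rel value includes the 'memento' relation type."""
    count = 0
    for _target, params in LINK_ENTRY.findall(timemap_text):
        rel = REL_PARAM.search(params)
        # rel may hold several space-separated types, e.g. "first memento".
        if rel and "memento" in rel.group(1).split():
            count += 1
    return count


def archived_fraction(memento_counts):
    """Fraction of sampled URIs that have at least one archived copy."""
    if not memento_counts:
        return 0.0
    archived = sum(1 for n in memento_counts.values() if n > 0)
    return archived / len(memento_counts)


# Hypothetical TimeMap for illustration only; URIs and datetimes are made up.
SAMPLE_TIMEMAP = (
    '<http://example.com/>; rel="original", '
    '<http://archive.example.org/timegate/http://example.com/>; rel="timegate", '
    '<http://archive.example.org/20100121000214/http://example.com/>; '
    'rel="first memento"; datetime="Thu, 21 Jan 2010 00:02:14 GMT", '
    '<http://archive.example.org/20110305120000/http://example.com/>; '
    'rel="last memento"; datetime="Sat, 05 Mar 2011 12:00:00 GMT"'
)

# A two-URI "sample": one archived URI and one with an empty TimeMap.
sample_counts = {
    "http://example.com/": count_mementos(SAMPLE_TIMEMAP),
    "http://example.net/never-archived": count_mementos(""),
}
coverage = archived_fraction(sample_counts)
```

In a real study the TimeMaps would be fetched from archives or an aggregator for every sampled URI. The sketch keeps that step out so that the counting logic stays self-contained.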

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 77 CITATIONS

    Only One Out of Five Archived Web Pages Existed as Presented

    CITES BACKGROUND, METHODS & RESULTS

    Who and what links to the Internet Archive

    CITES BACKGROUND

    The impact of JavaScript on archivability

    Rewriting History: Changing the Archived Web from the Present

    CITATION STATISTICS

    • 2 Highly Influenced Citations

    • Averaged 6 Citations per year from 2018 through 2020

    References

    Publications referenced by this paper.

    Random sampling from a search engine's index

    HIGHLY INFLUENTIAL

    The indexable web is more than 11.5 billion pages

    HIGHLY INFLUENTIAL