Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives

@article{Souza2015SemanticUA,
  title={Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives},
  author={Tarcisio Souza and Elena Demidova and Thomas Risse and Helge Holzmann and Gerhard Gossen and Julian Szymanski},
  journal={ArXiv},
  year={2015},
  volume={abs/1702.00619}
}
  • Tarcisio Souza, Elena Demidova, Thomas Risse, Helge Holzmann, Gerhard Gossen, Julian Szymanski
  • Published in International KEYSTONE… (2015)
  • Computer Science
  • ArXiv
  • Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provided through their URLs, which are typically stored in dedicated index files. The URLs of the…
