Identification and Archiving of the Czech Web Outside the National Domain

  • Ivan Vlcek Masaryk
  • Published 2008
The goal of this work was to design and implement a system for identifying and archiving web information sources as part of the Heritrix archiving system. The system should identify these sources automatically, as effectively and precisely as possible, and archive them for use by the WebArchiv project of the National Library of the Czech Republic. The code…