Genealogical trees on the web: a search engine user perspective

@inproceedings{BaezaYates2008GenealogicalTO,
  title={Genealogical trees on the web: a search engine user perspective},
  author={Ricardo A. Baeza-Yates and {\'A}lvaro R. Pereira and Nivio Ziviani},
  booktitle={WWW},
  year={2008}
}
This paper presents an extensive study of the evolution of textual content on the Web, showing that some new pages are created from scratch while others are created from already existing content. We show that a significant fraction of the Web is a byproduct of the latter case. We introduce the concept of a Web genealogical tree, in which every page in a Web snapshot is classified into a component. We study these components in detail, characterizing the copies and identifying the relation…
This paper has 27 citations.