Modelling information persistence on the web

Daniel Gomes and Mário J. Silva
Models of web data persistency are essential tools for the design of efficient information extraction systems that repeatedly collect and process the data. This study models the persistence of web data through the measurement of URL and content persistence across several snapshots of a national community web, collected for 3 years. We found that the lifetimes of URLs and contents are modelled by logarithmic functions. We gathered statistics on the structure of the web, identified reasons for URL death…