Measuring the web crawler ethics

@inproceedings{Giles2010MeasuringTW,
  title={Measuring the web crawler ethics},
  author={C. Lee Giles and Y. Sun and Isaac G. Councill},
  booktitle={WWW '10},
  year={2010}
}
  • C. Lee Giles, Y. Sun, Isaac G. Councill
  • Published in WWW '10, 2010
  • Computer Science
  • Web crawlers are highly automated and seldom regulated manually. The diversity of crawler activities often leads to ethical problems such as spam and service attacks. In this research, quantitative models are proposed to measure web crawler ethics based on crawler behavior on web servers. We investigate and define rules to measure crawler ethics, referring to the extent to which web crawlers respect the regulations set forth in robots.txt configuration files. We propose a vector space model…
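
A minimal sketch of the robots.txt-compliance idea from the abstract, assuming Python's standard urllib.robotparser; the robots.txt rules, bot name, and log entries below are hypothetical, and the paper's full vector space model is not reproduced here. One simple ethicality-style score is the fraction of a crawler's logged requests that robots.txt permits:

import urllib.robotparser

# Hypothetical robots.txt rules for the crawled site.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]

def compliance_score(requests):
    """Fraction of logged requests permitted by robots.txt.
    `requests` is a list of (user_agent, url) pairs."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(ROBOTS_LINES)
    allowed = sum(1 for agent, url in requests if parser.can_fetch(agent, url))
    return allowed / len(requests) if requests else 1.0

# Hypothetical access-log entries for a single crawler.
log = [
    ("ExampleBot", "http://example.com/index.html"),
    ("ExampleBot", "http://example.com/private/data.html"),  # disallowed path
    ("ExampleBot", "http://example.com/about.html"),
]
print(f"compliance: {compliance_score(log):.2f}")  # prints compliance: 0.67

A fuller treatment along the lines of the abstract could combine several such behavioral features (e.g. crawl-delay adherence or request rate) as dimensions of a per-crawler vector; the single allow/deny rule above is only the simplest such dimension.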
    25 Citations

    • Identification and characterization of crawlers through analysis of web logs (4 citations)
    • A novel defense mechanism against web crawlers intrusion (5 citations)
    • Mining Web Logs to Identify Search Engine Behaviour at Websites
    • Differences in Time Delay between Search Engine Crawlers at Web Sites (1 citation)
    • Understanding Website Behavior based on User Agent (8 citations)
    • Application of ARIMA(1,1,0) Model for Predicting Time Delay of Search Engine Crawlers (2 citations)
    • A Forecasting Model for the Pages Crawled by Search Engine Crawlers at a Web Site (3 citations, highly influenced)
