Determining Bias to Search Engines from Robots.txt

Nizar Ghoula, Khaled Khelif, Rose Dieng. In IEEE/WIC/ACM International Conference on Web Intelligence (WI'07).
Search engines rely largely on robots (i.e., crawlers or spiders) to collect information from the Web. Such crawling activity can be regulated on the server side by deploying the Robots Exclusion Protocol in a file called robots.txt. Ethical robots follow the rules specified in robots.txt, and websites can explicitly specify an access preference for each robot by name. Such biases may lead to a "rich get richer" situation, in which a few popular search engines ultimately dominate the Web.
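As an illustration of the per-robot bias the abstract describes, the sketch below uses Python's standard `urllib.robotparser` to evaluate a hypothetical robots.txt that names one crawler explicitly; the file contents and the bot names are assumptions for illustration, not taken from any real site or from the paper.

```python
import urllib.robotparser

# Hypothetical robots.txt that favors one named robot over all others:
# Googlebot gets unrestricted access, every other user agent is barred.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The named robot may fetch any page; an unnamed robot falls back to the
# wildcard rule and is excluded entirely.
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))     # True
print(rp.can_fetch("SomeOtherBot", "http://example.com/page.html"))  # False
```

An empty `Disallow:` line means "nothing is disallowed" under the Robots Exclusion Protocol, which is how the favored robot is granted full access here.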

