Determining Bias to Search Engines from Robots.txt

@inproceedings{Ghoula2007DeterminingBT,
  title={Determining Bias to Search Engines from Robots.txt},
  author={Nizar Ghoula and Khaled Khelif and Rose Dieng},
  booktitle={IEEE/WIC/ACM International Conference on Web Intelligence (WI'07)},
  year={2007},
  pages={149--155}
}
Search engines largely rely on robots (i.e., crawlers or spiders) to collect information from the Web. Such crawling activities can be regulated from the server side by deploying the Robots Exclusion Protocol in a file called robots.txt. Ethical robots will follow the rules specified in robots.txt. Websites can explicitly specify an access preference for each robot by name. Such biases may lead to a "rich get richer" situation, in which a few popular search engines ultimately dominate the Web…
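To illustrate the kind of per-robot preference the abstract describes, here is a hypothetical robots.txt (not one taken from the paper) that names one robot explicitly and bars everyone else, checked with Python's standard `urllib.robotparser`:

```python
import urllib.robotparser

# Hypothetical robots.txt that biases access toward one named robot:
# Googlebot may crawl everything (empty Disallow), while the wildcard
# rule shuts out all other robots entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The favored robot is allowed in; any other ethical robot must stay out.
print(parser.can_fetch("Googlebot", "/private/page.html"))     # True
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))  # False
```

An ethical crawler performs exactly this check before fetching a page; a bias measurement like the paper's can be built by running such checks for many robot names across many sites.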
