Robots exclusion standard
Known as: Robot Exclusion Protocol, Robots exclusion protocol, Robots exclusion file
The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots, indicating which areas of the site should not be processed or scanned. (Wikipedia)
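In practice, robots.txt is a plain-text file served at the site root. A minimal sketch of how a compliant crawler consults it, using Python's standard-library urllib.robotparser (the rules and URLs below are illustrative assumptions, not from any paper listed here):

```python
# Parse an illustrative robots.txt and check which URLs a crawler may fetch.
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ for all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler calls can_fetch() before requesting each URL.
print(rp.can_fetch("*", "http://example.com/private/data.html"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))         # True
```

Note that compliance is voluntary: the protocol only informs robots of the site's wishes, and several of the studies below measure how (and whether) real crawlers honor it.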
Related topics (24 relations)
.htaccess, Apache Nutch, Automated Content Access Protocol, Distributed web crawling
Broader (1): World Wide Web
Papers overview
Semantic Scholar uses AI to extract papers important to this topic.
Highly Cited (2017)
RCrawler: An R package for parallel web crawling and scraping
S. Khalil, M. Fakir
SoftwareX, 2017. Corpus ID: 65067981
2012
Hotel Information Exposure in Cyberspace: The Case of Hong Kong
Rosanna Leung, R. Law
Information and Communication Technologies in…, 2012. Corpus ID: 59621899
Search engines are an everyday tool for Internet surfing. They are also a critical factor that affects e-business performance…
2009
Copyright and Copy-Reliant Technology
Matthew J. Sag
2009. Corpus ID: 152501420
This article studies the rise of copy-reliant technologies - technologies such as Internet search engines and plagiarism…
Review (2008)
A larger scale study of robots.txt
Santanu Kolay
The Web Conference, 2008. Corpus ID: 14580910
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots…
2007
The North Carolina State Government Website Archives: A case study of an American government Web archiving project
Kristin E. Martin, Kelly Eubank
New Rev. Hypermedia Multim., 2007. Corpus ID: 45313184
The North Carolina State Archives and State Library of North Carolina collaborated to develop the North Carolina State Government…
Highly Cited (2006)
Academic Data Collection in Electronic Environments: Defining Acceptable Use of Internet Resources
G. Allen, D. Burk, G. Davis
MIS Q., 2006. Corpus ID: 32236321
Academic researchers access commercial web sites to collect research data. This research practice is likely to increase. Is this…
2006
ANALYSIS OF THE USAGE STATISTICS OF ROBOTS EXCLUSION STANDARD
Smitha Ajay, Jaliya Ekanayake
2006. Corpus ID: 13936388
Robots Exclusion standard [4] is a de-facto standard that is used to inform the crawlers, spiders or web robots about the…
2005
Static Analysis of Programs Using Omega Algebra with Tests
Claude Bolduc, Josée Desharnais
RelMiCS, 2005. Corpus ID: 7438892
Recently, Kozen has proposed a framework based on Kleene algebra with tests for verifying that a program satisfies a security…
Highly Cited (2004)
Discovery of Web Robot Sessions Based on their Navigational Patterns
P. Tan, Vipin Kumar
Data mining and knowledge discovery, 2004. Corpus ID: 23725102
Web robots are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and…
1999
CoBWeb - a crawler for the Brazilian Web
A. D. Silva, Eveline Veloso, P. B. Golgher, B. Ribeiro-Neto, Alberto H. F. Laender, N. Ziviani
6th International Symposium on String Processing…, 1999. Corpus ID: 6065538
One of the key components of current Web search engines is the document collector. The paper describes CoBWeb, an automatic…