robots.txt


History and Standardization of robots.txt
– Proposed by Martijn Koster in February 1994
– Proposed on the www-talk mailing list
– Reportedly prompted by a badly behaved web crawler written by Charles Stross
– Became a de facto standard for web crawlers
– Proposed by Google as an official IETF standard in July 2019 and published as RFC 9309 in September 2022
– Robots.txt file is placed in the root of the website hierarchy
– Contains instructions for web robots in a specific format
– Instructs robots which web pages they can and cannot access
– Important for web crawlers from search engines like Google
– Each subdomain and protocol/port needs its own robots.txt file
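
The placement and access rules listed above can be sketched with Python's standard library. This is a minimal illustration rather than a reference implementation: the `robots_url` helper, the `example.com` hosts, and the `GoodBot` and `BadBot` agent names are assumptions made up for the example. Note that `urllib.robotparser` applies the first matching rule in a group, which may differ from how a particular search engine resolves conflicting rules.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Scheme, host, and port are kept as-is, because each combination
    needs its own robots.txt file at the root of the hierarchy.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# An illustrative robots.txt: everyone stays out of /private/,
# and the made-up crawler "BadBot" is excluded from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly, no network access

print(robots_url("https://blog.example.com:8443/posts/1?x=2"))
print(parser.can_fetch("GoodBot", "https://example.com/index.html"))
print(parser.can_fetch("GoodBot", "https://example.com/private/data.html"))
print(parser.can_fetch("BadBot", "https://example.com/index.html"))
```

A well-behaved crawler would run a check like `can_fetch` before each request; nothing in the file itself enforces the rules.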

Security and Limitations of robots.txt
– Robots.txt compliance is voluntary and advisory
– Malicious web robots may ignore robots.txt instructions
– Security through obscurity is discouraged by standards bodies
– NIST recommends against relying on secrecy for system security
– Robots.txt should not be solely relied upon for security purposes

Alternatives to robots.txt
– Robots can pass a special user-agent to the web server
– Server can be configured to return failure or alternative content
– Some sites have humans.txt files for human-readable information
– Some sites redirect humans.txt to an About page
– Google previously had a joke file instructing the Terminator not to harm the founders
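
The server-side alternative in the list above (returning a failure based on the user-agent a robot sends) might look like the following WSGI sketch. The blocked substring `badbot`, the response bodies, and the function name `app` are assumptions for illustration; real deployments usually configure this in the web server itself, and a robot can evade it by sending a different user-agent.

```python
# Hypothetical list of user-agent substrings the site wants to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("badbot",)


def app(environ, start_response):
    """Minimal WSGI application that refuses requests from listed robots."""
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(token in user_agent for token in BLOCKED_AGENT_SUBSTRINGS):
        # Return a failure status instead of the page content.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]
```

To try it locally, the function can be passed to `wsgiref.simple_server.make_server("", 8000, app)` from the standard library.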

Nonstandard extensions of robots.txt
– Crawl-delay directive allows throttling of bot visits
– Interpretation of crawl-delay value depends on the crawler
– Yandex uses crawl-delay as the number of seconds to wait between visits
– Bing defines crawl-delay as the size of a time window for accessing a site
– Google ignores crawl-delay but provides a search console interface for controlling bot visits
– Some crawlers support the Sitemap directive in robots.txt
– Allows multiple Sitemaps in the same robots.txt file
– Sitemaps are specified with full URLs
– Google provides a search console interface for managing Sitemaps
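
As a rough sketch of how a crawler might read the two extensions described above, Python's `urllib.robotparser` exposes `crawl_delay()` and, from Python 3.8, `site_maps()`. The file contents, the `example.com` URLs, and the `ExampleBot` agent name are made up for the example, and how the delay value is then used remains up to each crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt using the Crawl-delay and Sitemap extensions.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The raw number from the file; its meaning depends on the crawler.
delay = parser.crawl_delay("ExampleBot")

# Every Sitemap line in the file, as full URLs (requires Python 3.8+).
sitemaps = parser.site_maps()

print(delay)
print(sitemaps)
```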

Meta tags, headers, and related concepts
– Robots exclusion directives can be applied through meta tags and HTTP headers.
– Robots meta tags cannot be used for non-HTML files.
– X-Robots-Tag can be added to non-HTML files using .htaccess and httpd.conf files.
– A noindex meta tag can be used to exclude a page from indexing.
– A noindex HTTP response header can also be used to exclude a page from indexing.
– ads.txt: a standard for listing authorized ad sellers.
– security.txt: a file for reporting security vulnerabilities.
– Automated Content Access Protocol: a failed proposal to extend robots.txt.
– BotSeer: an inactive search engine for robots.txt files.
– Distributed web crawling: a technique for distributing web crawling tasks.
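
The noindex directives mentioned in the list above can arrive either in an `X-Robots-Tag` response header or in a robots meta tag. The sketch below shows one way an indexer could check both. The names `RobotsMetaParser` and `is_noindex` are hypothetical, and the code makes simplifying assumptions: headers are passed as a plain dict with the exact key `X-Robots-Tag`, and per-crawler prefixes inside the header value are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindex(headers, html=""):
    """Return True if the header or the HTML asks robots not to index."""
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    if "noindex" in header_directives:
        return True
    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return "noindex" in meta_parser.directives
```

For a non-HTML file such as a PDF, only the header check applies, which is why the header form exists.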

robots.txt (Wikipedia)

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.

This relies on voluntary compliance. Not all robots comply with the standard; indeed, email harvesters, spambots, malware and robots that scan for security vulnerabilities may very well start with the portions of the website they have been asked (by the Robots Exclusion Protocol) to stay out of.

The "robots.txt" file can be used in conjunction with sitemaps, another robot inclusion standard for websites.
