Free Tool — No Signup Required

robots.txt Checker & Validator

The only free robots.txt checker that shows what you've blocked for ChatGPT, Perplexity, and Claude — alongside full directive parsing, crawlability analysis, and plain-English SEO recommendations. No signup required.

Supports example.com, https://example.com, or https://www.example.com

No account needed · Checks GPTBot, ClaudeBot & PerplexityBot · 10+ validation checks · Full directive parsing
Background

What is a robots.txt file?

A simple but critical file that tells search engines and bots how to crawl your website.

A robots.txt file is a plain-text file placed at the root of your website (e.g. https://example.com/robots.txt) that follows the Robots Exclusion Protocol — a decades-old web standard.

It allows site owners to control which pages search engines like Google, Bing, and others can crawl. You can allow everything, block specific sections (like admin pages or duplicate content), or restrict access to certain bots entirely.

It's one of the first files most crawlers look at when they visit your site — making it a foundational piece of technical SEO configuration.

Example robots.txt

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /blog/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Key terms explained

User-agent — which bot this rule applies to (* = all bots)
Disallow — paths the bot should NOT crawl
Allow — paths the bot IS allowed to crawl (a more specific Allow overrides a broader Disallow)
Sitemap — URL of your XML sitemap for faster discovery
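These rules can also be tested programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser, applied to the example file above ("MyBot" is a hypothetical crawler name for illustration):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, split into lines for the parser.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /blog/

User-agent: GPTBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A generic crawler falls under the * group: blog allowed, admin blocked.
print(parser.can_fetch("MyBot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("MyBot", "https://example.com/admin/users"))  # False

# GPTBot matches its own group and is blocked from everything.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))   # False
```

This mirrors how real crawlers resolve rules: a bot uses the most specific User-agent group that matches it, falling back to * otherwise.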
SEO Impact

Why robots.txt Matters for Technical SEO

🕷️

Controls Crawl Budget

Search engines allocate a limited number of crawl requests per site. By blocking pages that don't need to be indexed (like admin dashboards, duplicate pages, or internal search results), you help search engines focus on what actually matters.

🗺️

Guides Bot Discovery

The Sitemap directive in robots.txt points crawlers directly to your XML sitemap, accelerating discovery of your most important pages. It's a small addition that can meaningfully improve crawl efficiency.

🤖

Controls AI Crawlers

AI companies increasingly crawl the web to train their models. robots.txt lets you opt specific AI bots out of accessing your content — useful if you have concerns about your content being used for AI training without your consent. For fine-grained AI access control, also check your llms.txt file — it lets you signal intent to AI systems beyond what robots.txt covers.
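For example, a robots.txt that opts out the major AI crawlers while leaving search engines untouched could look like this (these are the user-agent strings the respective companies publish):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Because no rules are given under User-agent: *, normal search crawlers like Googlebot remain unaffected.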

⚠️

The most dangerous robots.txt mistake

Disallow: / under User-agent: * blocks ALL search engines from crawling your entire website. This single configuration error can cause your site to disappear from Google almost entirely. It's surprisingly common — especially after site migrations, CMS updates, or when developers forget to remove a staging-environment block before launch. Our checker flags this immediately.

robots.txt controls crawling — not indexing

An important nuance: blocking a page in robots.txt prevents crawlers from visiting it, but doesn't guarantee that page won't appear in search results. If the page is linked from other sites, Google may still list it with a "No information available" snippet. To fully prevent a page from appearing in search results, use the noindex meta tag — and don't block that page in robots.txt, or Google won't be able to read the noindex instruction.
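The noindex instruction lives in the page itself, not in robots.txt — and the page must stay crawlable so Google can read it:

```
<!-- In the <head> of the page you want excluded from search results -->
<meta name="robots" content="noindex">
```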

Common Pitfalls

Common robots.txt Mistakes and How to Fix Them

Most robots.txt problems are easy to fix once you know what to look for.

critical

Blocks all bots from everything

Problematic code

User-agent: *
Disallow: /

How to fix

Change to "Disallow:" (empty) to allow all crawling, or specify only the paths you want to block.
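The corrected file, allowing all crawling while still declaring a sitemap, can be as small as:

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```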

high

Accidentally blocking a key page

Problematic code

Disallow: /important-blog-post/

How to fix

Audit your Disallow rules regularly — especially after site restructures. Use this checker to spot paths you may have missed.

medium

Missing Sitemap declaration

Problematic code

(No Sitemap directive)

How to fix

Add "Sitemap: https://yourdomain.com/sitemap.xml" to help crawlers find all your pages faster.

medium

Malformed lines without colons

Problematic code

Disallow
/admin

How to fix

Every directive must follow "Directive: value" format, e.g. "Disallow: /admin/". Lines without a colon are ignored by most crawlers.

medium

Directives without a User-agent header

Problematic code

(No User-agent group)

How to fix

Every rule group must begin with a User-agent line. Orphan Disallow or Allow rules may be ignored entirely.

medium

Overly broad wildcard blocking query strings

Problematic code

Disallow: /*?

How to fix

Wildcard patterns like "/*?" block every URL containing a query string, which can inadvertently block important paginated or filtered pages. Target only the specific parameters that create duplicate content, and keep valuable parameterized URLs crawlable.
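A safer pattern blocks only the parameters that genuinely produce duplicate content — the parameter names here are hypothetical examples:

```
User-agent: *
Disallow: /*?sessionid=
Disallow: /*?sort=
Allow: /products?page=
```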

About This Tool

How This robots.txt Checker Works

What we check — and what each result means.

🌐

File existence

We fetch /robots.txt from your domain's root and confirm it returns HTTP 200.

🔒

Public accessibility

We verify the file is publicly accessible without authentication (no 401/403 responses).

📄

Content-Type

We check that the file is served as text/plain, the expected MIME type for robots.txt.

👤

User-agent presence

We detect whether User-agent rules are defined, including a catch-all wildcard (*).

🚦

Disallow/Allow rules

We parse and count all Disallow and Allow directives across all bot groups.

🗺️

Sitemap declaration

We check for a Sitemap directive and list any declared sitemap URLs.

🔍

Important paths

We flag if commonly important paths (/blog, /products, /) are being blocked unintentionally.

⚠️

Disallow all detection

We catch the critical mistake of blocking all bots from all pages (Disallow: /).

🌀

Broad wildcard rules

We flag overly broad wildcard patterns like /* or /? that may block too aggressively.

🔧

Malformed lines

We identify lines that don't follow valid robots.txt syntax (missing colons, unknown directives).

🤖

Orphan directives

We detect Disallow/Allow rules that appear before any User-agent declaration.

👁️

File preview

We display the first 1,500 characters of your actual robots.txt so you can inspect it directly.
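Several of the structural checks above can be approximated in a few lines. The following is a simplified sketch (not this tool's actual implementation) that runs a handful of those checks against raw robots.txt text:

```python
def check_robots(text: str) -> dict:
    """Run a few structural checks on raw robots.txt content."""
    issues = []
    group_agents = []         # user-agents of the group currently being parsed
    seen_rule_in_group = False
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append(f"malformed line (no colon): {line!r}")
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()

        if field == "user-agent":
            if seen_rule_in_group:
                group_agents = []       # a User-agent after rules starts a new group
                seen_rule_in_group = False
            group_agents.append(value)
        elif field in ("disallow", "allow"):
            seen_rule_in_group = True
            if not group_agents:
                issues.append(f"orphan directive before any User-agent: {line!r}")
            # the critical mistake: everything blocked for every bot
            if field == "disallow" and value == "/" and "*" in group_agents:
                issues.append("Disallow: / under User-agent: * blocks all crawling")
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        issues.append("no Sitemap directive declared")
    return {"issues": issues, "has_sitemap": has_sitemap}


# The most dangerous misconfiguration is flagged immediately:
report = check_robots("User-agent: *\nDisallow: /\n")
print(report["issues"])
```

A production checker would also fetch the file over HTTP and verify status code and Content-Type; this sketch covers only the text-level checks.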

Understanding Results

Pass

This check meets best practice. No action needed.

Warning

Recommended improvement. The file still works, but this is worth addressing.

Fail

Critical issue that should be fixed. This may harm your crawlability or SEO.

FAQ

Frequently Asked Questions About robots.txt

What is a robots.txt file?
A robots.txt file is a plain-text file placed at the root of a website (e.g., https://example.com/robots.txt) that tells search engine crawlers and other bots which pages or sections of the site they are allowed or not allowed to access. It follows the Robots Exclusion Protocol, a widely adopted web standard. Think of it as a set of directions you leave for bots before they enter your site.
Does robots.txt affect SEO?
Yes, robots.txt has a direct impact on what search engines can crawl — and therefore, what they can potentially index and rank. If you accidentally block critical pages with robots.txt, those pages will not appear in search results, regardless of how good their content is. However, robots.txt controls crawling, not indexing: a page can still appear in search results if it's linked from elsewhere, even if robots.txt blocks it from being crawled. For full control over indexing, use the noindex meta tag.
What is the difference between robots.txt and noindex?
robots.txt tells crawlers not to visit a page. The noindex meta tag tells crawlers they can visit the page but should not include it in search results. These serve different purposes: use robots.txt to save crawl budget and prevent access to private pages; use noindex to let crawlers access content but keep it out of search results. Combining the two on the same page is counterproductive — crawlers cannot read the noindex tag if they're blocked from fetching the page.
What does Disallow: / mean in robots.txt?
"Disallow: /" means that all paths on the website are blocked. When combined with "User-agent: *" (which targets all bots), this effectively prevents every search engine and crawler from accessing any page on your site. This is one of the most common and damaging robots.txt mistakes — it completely removes your site from search engine indexes. Only use "Disallow: /" for specific bots you intentionally want to block, never under "User-agent: *" unless you want your site to disappear from search.
What is User-agent in robots.txt?
The User-agent directive specifies which bot or crawler the following rules apply to. "User-agent: *" targets all bots, while "User-agent: Googlebot" applies rules only to Google's crawler. You can have multiple User-agent groups in a single robots.txt file, each with their own set of Disallow and Allow rules. Rules are applied per-group, so different bots can receive different instructions.
What is the Sitemap directive in robots.txt?
The Sitemap directive in robots.txt tells crawlers where to find your XML sitemap — for example: "Sitemap: https://example.com/sitemap.xml". This is one of the most effective ways to help search engines discover all your important pages quickly. Including a Sitemap line in your robots.txt is a widely recommended SEO best practice, even if you've also submitted your sitemap directly in Google Search Console.
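Extracting declared sitemap URLs is straightforward, since the Sitemap field sits outside any User-agent group — a minimal sketch:

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every URL declared via a Sitemap: line."""
    urls = []
    for line in robots_txt.splitlines():
        # partition at the FIRST colon, so the "https://" in the URL is untouched
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


print(sitemap_urls("User-agent: *\nSitemap: https://example.com/sitemap.xml\n"))
```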
Can robots.txt block AI crawlers like GPTBot?
Yes. You can use robots.txt to instruct specific AI crawlers to stay off your site. For example, "User-agent: GPTBot" followed by "Disallow: /" will ask OpenAI's crawler to avoid your content. Other AI bot user-agents include CCBot (Common Crawl), Google-Extended (Google AI training data), and PerplexityBot. Note that these bots are expected to respect robots.txt, but compliance is voluntary — there is no technical enforcement mechanism.
How do I test my robots.txt file?
You can test your robots.txt file using tools like this one (enter your domain above), or with Google Search Console's robots.txt report, which shows the version of the file Google last fetched and any parse errors it found. Directly opening https://yourdomain.com/robots.txt in a browser is the quickest way to confirm the file exists and see its contents.
What is Crawl-delay in robots.txt?
"Crawl-delay" is an optional directive that tells crawlers how many seconds to wait between requests. For example, "Crawl-delay: 10" asks bots to wait 10 seconds between each page fetch. It's intended to reduce server load caused by aggressive crawling. Note that Google ignores the Crawl-delay directive entirely — Googlebot adjusts its crawl rate automatically based on how your server responds — though some other crawlers, such as Bingbot, do honor it.
What is the maximum size for a robots.txt file?
Google's crawlers read only the first 500 kibibytes (512 KB) of a robots.txt file and ignore anything beyond that. Most sites never come close to this limit — a well-structured robots.txt is typically a few dozen lines. If your file is growing very large, it's worth auditing whether all those rules are still needed.
Does having no robots.txt hurt SEO?
Not having a robots.txt file does not directly harm your SEO. Without a robots.txt file, well-behaved crawlers will assume they can access everything. However, without one you miss the opportunity to declare your sitemap location, block private areas of your site, and control crawl budget for large websites. Adding a properly configured robots.txt is a technical SEO best practice that costs little effort but provides real benefits.

Check Another Domain

Run the robots.txt validator on any website — a competitor, a client site, or your own domain after making improvements.

← All Free Tools

Check your other technical SEO signals:

Want deeper AI visibility insights?

Start a free trial →