robots.txt Checker & Validator
The only free robots.txt checker that shows what you've blocked for ChatGPT, Perplexity, and Claude — alongside full directive parsing, crawlability analysis, and plain-English SEO recommendations. No signup required.
What is a robots.txt file?
A simple but critical file that tells search engines and bots how to crawl your website.
A robots.txt file is a plain-text file placed at the root of your website (e.g. https://example.com/robots.txt) that follows the Robots Exclusion Protocol — a decades-old web standard.
It allows site owners to control which pages search engines like Google, Bing, and others can crawl. You can allow everything, block specific sections (like admin pages or duplicate content), or restrict access to certain bots entirely.
It's one of the first files most crawlers look at when they visit your site — making it a foundational piece of technical SEO configuration.
Example robots.txt
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /blog/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
Key terms explained
User-agent: which bot this rule applies to (* = all bots)
Disallow: paths the bot should NOT crawl
Allow: paths the bot IS allowed to crawl (overrides Disallow)
Sitemap: URL of your XML sitemap for faster discovery

Why robots.txt Matters for Technical SEO
Controls Crawl Budget
Search engines allocate a limited number of crawl requests per site. By blocking pages that don't need to be indexed (like admin dashboards, duplicate pages, or internal search results), you help search engines focus on what actually matters.
Guides Bot Discovery
The Sitemap directive in robots.txt points crawlers directly to your XML sitemap, accelerating discovery of your most important pages. It's a small addition that can meaningfully improve crawl efficiency.
Controls AI Crawlers
AI companies increasingly crawl the web to train their models. robots.txt lets you opt specific AI bots out of accessing your content — useful if you have concerns about your content being used for AI training without your consent. For fine-grained AI access control, also check your llms.txt file — it lets you signal intent to AI systems beyond what robots.txt covers.
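For example, a robots.txt that opts the major AI crawlers out of your content might look like the sketch below. The user-agent tokens shown (GPTBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity) are the published names at the time of writing; verify them against each vendor's documentation, since tokens can change:

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Anthropic's crawler
User-agent: ClaudeBot
Disallow: /

# Block Perplexity's crawler
User-agent: PerplexityBot
Disallow: /
```

Compliant bots will stop crawling, but robots.txt is advisory: it cannot force a non-compliant crawler to stay out.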
The most dangerous robots.txt mistake
Disallow: / under User-agent: * blocks ALL search engines from crawling your entire website. This single configuration error can cause your site to disappear from Google almost entirely. It's surprisingly common — especially after site migrations, CMS updates, or when developers forget to remove a staging-environment block before launch. Our checker flags this immediately.
robots.txt controls crawling — not indexing
An important nuance: blocking a page in robots.txt prevents crawlers from visiting it, but doesn't guarantee that page won't appear in search results. If the page is linked from other sites, Google may still list it with a "No information available" snippet. To fully prevent a page from appearing in search results, use the noindex meta tag — and don't block that page in robots.txt, or Google won't be able to read the noindex instruction.
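In practice, that means the page itself must carry the instruction. A minimal example of the meta tag, placed in the page's head (and the page must remain crawlable in robots.txt so Google can read it):

```
<meta name="robots" content="noindex">
```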
Common robots.txt Mistakes and How to Fix Them
Most robots.txt problems are easy to fix once you know what to look for.
Blocks all bots from everything
Problematic code
User-agent: *
Disallow: /
How to fix
Change to "Disallow:" (empty) to allow all crawling, or specify only the paths you want to block.
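Either corrected form works; the paths below are illustrative:

```
# Option 1: allow all crawling
User-agent: *
Disallow:

# Option 2: block only what genuinely needs blocking
User-agent: *
Disallow: /admin/
```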
Accidentally blocking a key page
Problematic code
Disallow: /important-blog-post/
How to fix
Audit your Disallow rules regularly — especially after site restructures. Use this checker to spot paths you may have missed.
Missing Sitemap declaration
Problematic code
(No Sitemap directive)
How to fix
Add "Sitemap: https://yourdomain.com/sitemap.xml" to help crawlers find all your pages faster.
Malformed lines without colons
Problematic code
Disallow /admin
How to fix
Every directive must follow "Directive: value" format, e.g. "Disallow: /admin/". Lines without a colon are ignored by most crawlers.
Directives without a User-agent header
Problematic code
(No User-agent group)
How to fix
Every rule group must begin with a User-agent line. Orphan Disallow or Allow rules may be ignored entirely.
Overly broad wildcard blocking query strings
Problematic code
Disallow: /*?
How to fix
Wildcard patterns like "/*?" block every URL containing a query parameter, which can inadvertently block important paginated or filtered pages. Narrow the rule to specific parameters (e.g. "Disallow: /*?sessionid=") or remove it if those pages should be crawlable.
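To see why "/*?" is so broad, it helps to look at how Google-style pattern matching works: "*" matches any sequence of characters and "$" anchors the end of the URL. The sketch below (function name is mine, not from any library) translates a robots.txt pattern into a regular expression:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern against a URL path using
    Google-style wildcards: '*' = any sequence, '$' = end anchor."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# '/*?' matches every URL that carries a query string...
print(rule_matches("/*?", "/products?page=2"))  # True
print(rule_matches("/*?", "/blog?utm_source=x"))  # True
# ...but leaves clean URLs alone
print(rule_matches("/*?", "/blog/post"))  # False
```

This is why a single "/*?" rule can knock out pagination, filtering, and tracking-parameter URLs all at once.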
How This robots.txt Checker Works
What we check — and what each result means.
File existence
We fetch /robots.txt from your domain's root and confirm it returns HTTP 200.
Public accessibility
We verify the file is publicly accessible without authentication (no 401/403 responses).
Content-Type
We check that the file is served as text/plain, the expected MIME type for robots.txt.
User-agent presence
We detect whether User-agent rules are defined, including a catch-all wildcard (*).
Disallow/Allow rules
We parse and count all Disallow and Allow directives across all bot groups.
Sitemap declaration
We check for a Sitemap directive and list any declared sitemap URLs.
Important paths
We flag if commonly important paths (/blog, /products, /) are being blocked unintentionally.
Disallow all detection
We catch the critical mistake of blocking all bots from all pages (Disallow: /).
Broad wildcard rules
We flag overly broad wildcard patterns like /* or /? that may block too aggressively.
Malformed lines
We identify lines that don't follow valid robots.txt syntax (missing colons, unknown directives).
Orphan directives
We detect Disallow/Allow rules that appear before any User-agent declaration.
File preview
We display the first 1,500 characters of your actual robots.txt so you can inspect it directly.
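Several of these checks can be approximated in a short script. The sketch below is illustrative only (the function name and messages are mine, not this checker's implementation); it flags malformed lines, orphan rules, and the disallow-all mistake in a robots.txt body:

```python
def lint_robots(text: str) -> list[str]:
    """Toy linter for a robots.txt body: flags lines without colons,
    orphan Disallow/Allow rules, and a global 'Disallow: /'."""
    issues = []
    agents: list[str] = []   # user-agents of the current rule group
    group_has_rules = False
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append(f"line {n}: malformed (no colon): {line!r}")
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if group_has_rules:  # a new rule group starts here
                agents, group_has_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            group_has_rules = True
            if not agents:
                issues.append(f"line {n}: {field} before any User-agent")
            elif field == "disallow" and value == "/" and "*" in agents:
                issues.append(f"line {n}: 'Disallow: /' blocks all bots from the whole site")
    return issues

# Flags the disallow-all mistake and the missing colon
print(lint_robots("User-agent: *\nDisallow: /\nDisallow /admin"))
```

A real checker also needs the fetch-side checks (HTTP status, authentication, Content-Type), which can't be seen from the file body alone.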
Understanding Results
This check meets best practice. No action needed.
Recommended improvement. The file still works, but this is worth addressing.
Critical issue that should be fixed. This may harm your crawlability or SEO.
Frequently Asked Questions About robots.txt
What is a robots.txt file?
Does robots.txt affect SEO?
What is the difference between robots.txt and noindex?
What does Disallow: / mean in robots.txt?
What is User-agent in robots.txt?
What is the Sitemap directive in robots.txt?
Can robots.txt block AI crawlers like GPTBot?
How do I test my robots.txt file?
What is Crawl-delay in robots.txt?
What is the maximum size for a robots.txt file?
Does having no robots.txt hurt SEO?
Check Another Domain
Run the robots.txt validator on any website — a competitor, a client site, or your own domain after making improvements.
Want deeper AI visibility insights?
Start a free trial →