Free Tool — No Signup Required

robots.txt Checker & Validator

The only free robots.txt checker that shows what you've blocked for ChatGPT, Perplexity, and Claude — alongside full directive parsing, crawlability analysis, and plain-English SEO recommendations. No signup required.

Supports example.com, https://example.com, or https://www.example.com

No account needed · Checks GPTBot, ClaudeBot & PerplexityBot · 10+ validation checks · Full directive parsing
Background

What is a robots.txt file?

A simple but critical file that tells search engines and bots how to crawl your website.

A robots.txt file is a plain-text file placed at the root of your website (e.g. https://example.com/robots.txt) that follows the Robots Exclusion Protocol — a decades-old web standard.

It allows site owners to control which pages search engines like Google, Bing, and others can crawl. You can allow everything, block specific sections (like admin pages or duplicate content), or restrict access to certain bots entirely.

It's one of the first files most crawlers look at when they visit your site — making it a foundational piece of technical SEO configuration.

Example robots.txt

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /blog/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Key terms explained

User-agent — which bot this rule applies to (* = all bots)
Disallow — paths the bot should NOT crawl
Allow — paths the bot IS allowed to crawl (a more specific Allow overrides a broader Disallow)
Sitemap — URL of your XML sitemap for faster discovery
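These rules can also be tested programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser, applied to the example file above ("MyBot" is a hypothetical crawler name for illustration):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, split into lines for the parser.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /blog/

User-agent: GPTBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A generic crawler falls under the * group: blog allowed, admin blocked.
print(parser.can_fetch("MyBot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("MyBot", "https://example.com/admin/users"))  # False

# GPTBot matches its own group and is blocked from everything.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))   # False
```

This mirrors how real crawlers resolve rules: a bot uses the most specific User-agent group that matches it, falling back to * otherwise.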
SEO Impact

Why robots.txt Matters for Technical SEO

🕷️

Controls Crawl Budget

Search engines allocate a limited number of crawl requests per site. By blocking pages that don't need to be indexed (like admin dashboards, duplicate pages, or internal search results), you help search engines focus on what actually matters.

🗺️

Guides Bot Discovery

The Sitemap directive in robots.txt points crawlers directly to your XML sitemap, accelerating discovery of your most important pages. It's a small addition that can meaningfully improve crawl efficiency.

🤖

Controls AI Crawlers

AI companies increasingly crawl the web to train their models. robots.txt lets you opt specific AI bots out of accessing your content — useful if you have concerns about your content being used for AI training without your consent. For fine-grained AI access control, also check your llms.txt file — it lets you signal intent to AI systems beyond what robots.txt covers.
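For example, a robots.txt that opts out the major AI crawlers while leaving search engines untouched could look like this (these are the user-agent strings the respective companies publish):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Because no rules are given under User-agent: *, normal search crawlers like Googlebot remain unaffected.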

⚠️

The most dangerous robots.txt mistake

Disallow: / under User-agent: * blocks ALL search engines from crawling your entire website. This single configuration error can cause your site to disappear from Google almost entirely. It's surprisingly common — especially after site migrations, CMS updates, or when developers forget to remove a staging-environment block before launch. Our checker flags this immediately.

robots.txt controls crawling — not indexing

An important nuance: blocking a page in robots.txt prevents crawlers from visiting it, but doesn't guarantee that page won't appear in search results. If the page is linked from other sites, Google may still list it with a "No information available" snippet. To fully prevent a page from appearing in search results, use the noindex meta tag — and don't block that page in robots.txt, or Google won't be able to read the noindex instruction.
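The noindex instruction lives in the page itself, not in robots.txt — and the page must stay crawlable so Google can read it:

```
<!-- In the <head> of the page you want excluded from search results -->
<meta name="robots" content="noindex">
```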

Common Pitfalls

Common robots.txt Mistakes and How to Fix Them

Most robots.txt problems are easy to fix once you know what to look for.

critical

Blocks all bots from everything

Problematic code

User-agent: *
Disallow: /

How to fix

Change to "Disallow:" (empty) to allow all crawling, or specify only the paths you want to block.
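The corrected file, allowing all crawling while still declaring a sitemap, can be as small as:

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```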

high

Accidentally blocking a key page

Problematic code

Disallow: /important-blog-post/

How to fix

Audit your Disallow rules regularly — especially after site restructures. Use this checker to spot paths you may have missed.

medium

Missing Sitemap declaration

Problematic code

(No Sitemap directive)

How to fix

Add "Sitemap: https://yourdomain.com/sitemap.xml" to help crawlers find all your pages faster.

medium

Malformed lines without colons

Problematic code

Disallow
/admin

How to fix

Every directive must follow "Directive: value" format, e.g. "Disallow: /admin/". Lines without a colon are ignored by most crawlers.

medium

Directives without a User-agent header

Problematic code

(No User-agent group)

How to fix

Every rule group must begin with a User-agent line. Orphan Disallow or Allow rules may be ignored entirely.

medium

Overly broad wildcard blocking query strings

Problematic code

Disallow: /*?

How to fix

Wildcard patterns like "/*?" block every URL containing a query string, which can inadvertently block important paginated or filtered pages. Target only the specific parameters that create duplicate content, and keep valuable parameterized URLs crawlable.
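A safer pattern blocks only the parameters that genuinely produce duplicate content — the parameter names here are hypothetical examples:

```
User-agent: *
Disallow: /*?sessionid=
Disallow: /*?sort=
Allow: /products?page=
```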

About This Tool

How This robots.txt Checker Works

What we check — and what each result means.

🌐

File existence

We fetch /robots.txt from your domain's root and confirm it returns HTTP 200.

🔒

Public accessibility

We verify the file is publicly accessible without authentication (no 401/403 responses).

📄

Content-Type

We check that the file is served as text/plain, the expected MIME type for robots.txt.

👤

User-agent presence

We detect whether User-agent rules are defined, including a catch-all wildcard (*).

🚦

Disallow/Allow rules

We parse and count all Disallow and Allow directives across all bot groups.

🗺️

Sitemap declaration

We check for a Sitemap directive and list any declared sitemap URLs.

🔍

Important paths

We flag if commonly important paths (/blog, /products, /) are being blocked unintentionally.

⚠️

Disallow all detection

We catch the critical mistake of blocking all bots from all pages (Disallow: /).

🌀

Broad wildcard rules

We flag overly broad wildcard patterns like /* or /? that may block too aggressively.

🔧

Malformed lines

We identify lines that don't follow valid robots.txt syntax (missing colons, unknown directives).

🤖

Orphan directives

We detect Disallow/Allow rules that appear before any User-agent declaration.

👁️

File preview

We display the first 1,500 characters of your actual robots.txt so you can inspect it directly.
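Several of the structural checks above can be approximated in a few lines. The following is a simplified sketch (not this tool's actual implementation) that runs a handful of those checks against raw robots.txt text:

```python
def check_robots(text: str) -> dict:
    """Run a few structural checks on raw robots.txt content."""
    issues = []
    group_agents = []         # user-agents of the group currently being parsed
    seen_rule_in_group = False
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append(f"malformed line (no colon): {line!r}")
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()

        if field == "user-agent":
            if seen_rule_in_group:
                group_agents = []       # a User-agent after rules starts a new group
                seen_rule_in_group = False
            group_agents.append(value)
        elif field in ("disallow", "allow"):
            seen_rule_in_group = True
            if not group_agents:
                issues.append(f"orphan directive before any User-agent: {line!r}")
            # the critical mistake: everything blocked for every bot
            if field == "disallow" and value == "/" and "*" in group_agents:
                issues.append("Disallow: / under User-agent: * blocks all crawling")
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        issues.append("no Sitemap directive declared")
    return {"issues": issues, "has_sitemap": has_sitemap}


# The most dangerous misconfiguration is flagged immediately:
report = check_robots("User-agent: *\nDisallow: /\n")
print(report["issues"])
```

A production checker would also fetch the file over HTTP and verify status code and Content-Type; this sketch covers only the text-level checks.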

Understanding Results

Pass

This check meets best practice. No action needed.

Warning

Recommended improvement. The file still works, but this is worth addressing.

Fail

Critical issue that should be fixed. This may harm your crawlability or SEO.

FAQ

Frequently Asked Questions About robots.txt

What is a robots.txt file?
A robots.txt file is a plain-text file placed at the root of a website (e.g., https://example.com/robots.txt) that tells search engine crawlers and other bots which pages or sections of the site they are allowed or not allowed to access. It follows the Robots Exclusion Protocol, a widely adopted web standard. Think of it as a set of directions you leave for bots before they enter your site.
Does robots.txt affect SEO?
Yes, robots.txt has a direct impact on what search engines can crawl — and therefore, what they can potentially index and rank. If you accidentally block critical pages with robots.txt, those pages will not appear in search results, regardless of how good their content is. However, robots.txt controls crawling, not indexing: a page can still appear in search results if it's linked from elsewhere, even if robots.txt blocks it from being crawled. For full control over indexing, use the noindex meta tag.
What is the difference between robots.txt and noindex?
robots.txt tells crawlers not to visit a page. The noindex meta tag tells crawlers they can visit the page but should not include it in search results. These serve different purposes: use robots.txt to save crawl budget and prevent access to private pages; use noindex to let crawlers access content but keep it out of search results. Combining the two on the same page is counterproductive — crawlers cannot read the noindex tag if they're blocked from fetching the page.
What does Disallow: / mean in robots.txt?
"Disallow: /" means that all paths on the website are blocked. When combined with "User-agent: *" (which targets all bots), this effectively prevents every search engine and crawler from accessing any page on your site. This is one of the most common and damaging robots.txt mistakes — it completely removes your site from search engine indexes. Only use "Disallow: /" for specific bots you intentionally want to block, never under "User-agent: *" unless you want your site to disappear from search.
What is User-agent in robots.txt?
The User-agent directive specifies which bot or crawler the following rules apply to. "User-agent: *" targets all bots, while "User-agent: Googlebot" applies rules only to Google's crawler. You can have multiple User-agent groups in a single robots.txt file, each with their own set of Disallow and Allow rules. Rules are applied per-group, so different bots can receive different instructions.
What is the Sitemap directive in robots.txt?
The Sitemap directive in robots.txt tells crawlers where to find your XML sitemap — for example: "Sitemap: https://example.com/sitemap.xml". This is one of the most effective ways to help search engines discover all your important pages quickly. Including a Sitemap line in your robots.txt is a widely recommended SEO best practice, even if you've also submitted your sitemap directly in Google Search Console.
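Extracting declared sitemap URLs is straightforward, since the Sitemap field sits outside any User-agent group — a minimal sketch:

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every URL declared via a Sitemap: line."""
    urls = []
    for line in robots_txt.splitlines():
        # partition at the FIRST colon, so the "https://" in the URL is untouched
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


print(sitemap_urls("User-agent: *\nSitemap: https://example.com/sitemap.xml\n"))
```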
Can robots.txt block AI crawlers like GPTBot?
Yes. You can use robots.txt to instruct specific AI crawlers to stay off your site. For example, "User-agent: GPTBot" followed by "Disallow: /" will ask OpenAI's crawler to avoid your content. Other AI bot user-agents include CCBot (Common Crawl), Google-Extended (Google AI training data), and PerplexityBot. Note that these bots are expected to respect robots.txt, but compliance is voluntary — there is no technical enforcement mechanism.
How do I test my robots.txt file?
You can test your robots.txt file using tools like this one (enter your domain above), or with Google Search Console's robots.txt report, which shows the version of the file Google last fetched and any parse errors it found. Directly opening https://yourdomain.com/robots.txt in a browser is the quickest way to confirm the file exists and see its contents.
What is Crawl-delay in robots.txt?
"Crawl-delay" is an optional directive that tells crawlers how many seconds to wait between requests. For example, "Crawl-delay: 10" asks bots to wait 10 seconds between each page fetch. It's intended to reduce server load caused by aggressive crawling. Note that Google ignores the Crawl-delay directive entirely — Googlebot adjusts its crawl rate automatically based on how your server responds — though some other crawlers, such as Bingbot, do honor it.
What is the maximum size for a robots.txt file?
Google's crawlers read only the first 500 kibibytes (512 KB) of a robots.txt file and ignore anything beyond that. Most sites never come close to this limit — a well-structured robots.txt is typically a few dozen lines. If your file is growing very large, it's worth auditing whether all those rules are still needed.
Does having no robots.txt hurt SEO?
Not having a robots.txt file does not directly harm your SEO. Without a robots.txt file, well-behaved crawlers will assume they can access everything. However, without one you miss the opportunity to declare your sitemap location, block private areas of your site, and control crawl budget for large websites. Adding a properly configured robots.txt is a technical SEO best practice that costs little effort but provides real benefits.

Check Another Domain

Run the robots.txt validator on any website — a competitor, a client site, or your own domain after making improvements.

← All Free Tools

Check your other technical SEO signals:

Want deeper AI visibility insights?

Start a free trial →