IndexDoctor.io
AI visibility

ClaudeBot blocked by robots.txt

Anthropic runs more than one crawler. Blocking ClaudeBot is an opt-out from training, not from live Claude citations.

What this usually means

Your robots.txt disallows ClaudeBot, either explicitly or via a catch-all. ClaudeBot honors robots.txt, so Anthropic's training pipeline will not fetch the URLs it matches. The block may also be incidentally blocking Claude-SearchBot and Claude-User, depending on how the rules are written.

Why it matters

Anthropic separates its crawlers by purpose. ClaudeBot is used for training. Claude-SearchBot powers retrieval when Claude answers questions with web context. Claude-User is a per-request user-triggered fetcher. A broad block can unintentionally cover all three, which removes Claude's ability to cite you at all, not just to train on you.

Common causes
  • A blanket User-agent: * with Disallow: / covers every Claude agent, since none has its own group.
  • An "AI block" template disallows ClaudeBot without distinguishing it from Claude-SearchBot or Claude-User.
  • A CDN rule blocks anything matching /claude/i at the edge, including user-triggered fetches.
  • Staging robots.txt leaked to production with a global Disallow.
  • The block is intentional but not documented, so new pages inherit it silently.
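The catch-all case is the most common. In a robots.txt like the hypothetical one below, no Claude agent has a group of its own, so all three fall through to the * group and are blocked:

```
# No per-agent groups, so ClaudeBot, Claude-SearchBot, and
# Claude-User all match the catch-all group below.
User-agent: *
Disallow: /
```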
How to diagnose it
  1. Open AI Crawler Checker.
  2. Inspect the rows for ClaudeBot, Claude-SearchBot, and Claude-User separately.
  3. Note which robots.txt group each one matched and whether the rule was User-agent specific or catch-all.
  4. Confirm the page itself is 200 with real server-rendered content, so allowed Claude crawlers would actually have something to parse.
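The per-agent matching in steps 2 and 3 can be reproduced locally with Python's standard urllib.robotparser. The robots.txt content and URL below are hypothetical stand-ins for your own:

```python
import urllib.robotparser

# Hypothetical robots.txt that opts out of training only.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow:

User-agent: Claude-User
Disallow:
"""

AGENTS = ["ClaudeBot", "Claude-SearchBot", "Claude-User"]

def check(robots_txt: str, url: str = "https://example.com/page") -> dict:
    """Return {agent: allowed?} for each Anthropic crawler."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in AGENTS}

if __name__ == "__main__":
    for agent, allowed in check(ROBOTS_TXT).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this sample file, ClaudeBot is blocked while Claude-SearchBot and Claude-User remain allowed, which is the "training opt-out only" outcome described above.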
How to fix it
  1. List Anthropic crawlers explicitly

    Add separate User-agent groups for ClaudeBot, Claude-SearchBot, and Claude-User. Use the one policy that matches your intent for each.

  2. Separate training from retrieval

    To opt out of training only, disallow ClaudeBot and allow Claude-SearchBot and Claude-User. Claude can then cite you without training on you.
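A minimal robots.txt sketch of that policy, with separate groups per Anthropic crawler (verify the exact user-agent tokens against Anthropic's own documentation before deploying):

```
# Opt out of training only: block ClaudeBot,
# keep retrieval and user-triggered fetches open.
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /
```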

  3. Check the edge stack

    Edge rules that block by regex can catch more than you want. Audit your CDN, WAF, and bot-management configs.
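A quick sketch of why a /claude/i edge rule overreaches, using Python's re module as a stand-in for a CDN's regex matcher:

```python
import re

# Hypothetical edge rule: block any request whose user agent
# matches /claude/i (case-insensitive substring match).
EDGE_BLOCK = re.compile(r"claude", re.IGNORECASE)

def blocked_agents(agents: list[str]) -> list[str]:
    """Return the agents this edge rule would block."""
    return [a for a in agents if EDGE_BLOCK.search(a)]

anthropic_agents = ["ClaudeBot", "Claude-SearchBot", "Claude-User"]
print(blocked_agents(anthropic_agents))
# The substring match catches all three agents, including
# user-triggered fetches, not just the training crawler.
```

Anchoring the pattern to the specific bot name you intend to block avoids this, though the exact user-agent strings should be confirmed against Anthropic's documentation.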

  4. Confirm with AI Crawler Checker

    After changes, re-run the checker. The matrix should show allowed/disallowed exactly as you intended for each Anthropic agent.

FAQ
Which Claude crawler should I allow?

If you want Claude to cite your content, allow Claude-SearchBot and Claude-User. ClaudeBot is specifically for training; allowing it is a separate decision about whether your content may be used to improve future models.

Does robots.txt guarantee privacy?

No. robots.txt is a politeness signal that Anthropic's crawlers honor, but it is not a privacy mechanism. Content that needs to be private should be behind authentication, not just disallowed in robots.txt.

Can I block training but allow search?

Yes. Disallow ClaudeBot and allow Claude-SearchBot. That lets Claude cite your pages in answers without using them to train future models.

Ready to diagnose your URL?

AI Crawler Checker runs the exact checks discussed above.

Run AI Crawler Checker