IndexDoctor.io

Indexability checklist before publishing a page

A short, boring list you can run through before asking Google to index a page. It catches most of the avoidable indexing bugs.

Most indexing problems are avoidable. A staging robots.txt leaking into production, a canonical pointing at the wrong URL, a CDN adding an unexpected X-Robots-Tag header: these are all failure modes that surface weeks later as "traffic dropped." Running a short checklist before and after you publish catches most of them while they are still cheap to fix.

Before publishing

Do not ask Google to index a page until these are true.

  1. Page URL returns HTTP 200 to a fresh, unauthenticated request.
  2. robots.txt allows Googlebot to fetch the URL.
  3. No noindex directive in HTML meta robots or the X-Robots-Tag HTTP header.
  4. Canonical points at the final, indexable URL (self-referencing is the safe default).
  5. Redirect chain from common variants (http, non-www, trailing slash) is at most one hop to the final URL.
  6. Title and meta description are specific to this page, not template defaults.
  7. og:title, og:description, og:image (1200x630), og:url are set and reachable.
  8. URL is included in your sitemap.
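Several of the items above (status, noindex, canonical) can be checked mechanically once you have fetched the page. The sketch below is illustrative, not an IndexDoctor API: it inspects a status code, headers, and HTML you have already fetched (e.g. with requests) and returns a list of problems.

```python
import re

def preflight_issues(status: int, headers: dict, html: str, url: str) -> list:
    """Check a fetched page against parts of the pre-publish list.

    `status`, `headers`, and `html` should come from a fresh,
    unauthenticated fetch of `url`. This helper only inspects them;
    the function name and shape are illustrative assumptions.
    """
    issues = []
    if status != 200:
        issues.append(f"expected HTTP 200, got {status}")
    # noindex can arrive via header or meta tag; check both.
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        issues.append("X-Robots-Tag header contains noindex")
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I):
        issues.append("meta robots contains noindex")
    # Canonical should exist and be self-referencing by default.
    m = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)',
                  html, re.I)
    if not m:
        issues.append("no canonical link found")
    elif m.group(1) != url:
        issues.append(f"canonical points at {m.group(1)}, not {url}")
    return issues
```

An empty return value means these particular checks passed; robots.txt, sitemap membership, and redirect variants still need their own checks.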

Sitemap and robots basics

  • Sitemap returns HTTP 200 and real XML (not HTML).
  • Sitemap URL is declared in robots.txt.
  • Sitemap URL is submitted in Search Console.
  • Sitemap index children are reachable if you use a nested sitemap.
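The "real XML, not HTML" check is the one most often skipped, because a soft-404 HTML page can still return HTTP 200. A minimal sketch of that validation, assuming you pass in the sitemap response body as text:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Return the URLs listed in a sitemap or sitemap index.

    Rejects responses whose root element is not a sitemap element,
    which catches HTML error pages served with a 200 status.
    """
    root = ET.fromstring(xml_text)
    if root.tag not in (f"{NS}urlset", f"{NS}sitemapindex"):
        raise ValueError(f"not a sitemap root element: {root.tag}")
    return [loc.text.strip() for loc in root.iter(f"{NS}loc")]
```

For a sitemap index, run the same function over each child sitemap it returns to confirm the children are reachable and well-formed.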

Common traps

  • Staging robots.txt with Disallow: / was promoted to production.
  • SEO plugin or framework default marks a post type or layout noindex.
  • CDN injects X-Robots-Tag: noindex across a whole path prefix.
  • Canonical is generated from an internal slug and ignores redirects, slashes, or protocol.
  • Sitemap still points at old URLs after a migration.
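The redirect-variant trap is easy to test once you have the Location headers for each variant. This hypothetical helper follows a redirect map (URL to Location) and returns the chain, so you can assert it is at most one hop; in practice the map would be built from real responses fetched with redirects disabled.

```python
def redirect_hops(start: str, redirects: dict, limit: int = 5) -> list:
    """Follow `start` through a redirect map and return the URLs visited.

    `redirects` maps a URL to the Location header it returns; here it is
    a plain dict for illustration. Raises if the chain loops or exceeds
    `limit` hops.
    """
    chain = [start]
    while chain[-1] in redirects:
        if len(chain) > limit:
            raise RuntimeError("redirect loop or chain too long")
        chain.append(redirects[chain[-1]])
    return chain
```

A chain of length three (two hops) from the http variant to the final URL is exactly the kind of result the pre-publish list flags.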

After publishing

  1. In Search Console, run URL Inspection on the published URL.
  2. Confirm the page is eligible for indexing, then request indexing.
  3. Re-run the pre-publish checklist a few days later to catch drift.
  4. Watch coverage reports in Search Console for the next 2 to 3 weeks.

When to retest

  • After any deploy that touches the SEO metadata pipeline.
  • After any change to robots.txt, sitemap generation, or canonical rules.
  • After any CDN, WAF, or edge-rule change.
  • After moving the site to a new domain, scheme, or host.

Tools to use

Each item in the checklist maps to a specific IndexDoctor tool:

  • Sitemap Checker for sitemap health.
  • Robots Tester for robots.txt.
  • Noindex Checker for meta robots and headers.
  • Canonical Checker for canonical signals.
  • Redirect Chain Checker for URL variants.
  • HTTP Header Checker for response headers.
  • OG Preview for social previews.

FAQ
What should I check first?

Check that the page returns HTTP 200 to Googlebot, is not noindex, and is not blocked in robots.txt. Those three cover the majority of indexing bugs.

Can a page be crawlable but not indexable?

Yes. A page can return 200, be allowed in robots.txt, and still be noindex, have a canonical that points elsewhere, or have an X-Robots-Tag: noindex header. Crawlability and indexability are different gates.
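The two gates can be written down as a tiny truth function. This is a simplified model of the answer above (it folds the canonical signal into a single boolean), not how Google actually evaluates pages:

```python
def index_gates(status: int, robots_allows: bool,
                has_noindex: bool, canonical_is_self: bool) -> tuple:
    """Return (crawlable, indexable) for a simplified model of the gates.

    Crawlability depends on robots.txt and the response status;
    indexability additionally requires no noindex directive and a
    self-referencing canonical.
    """
    crawlable = robots_allows and status == 200
    indexable = crawlable and not has_noindex and canonical_is_self
    return crawlable, indexable
```

A 200 page allowed in robots.txt but carrying noindex comes out crawlable yet not indexable, which is the case the question asks about.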

How often should I retest?

Run the checklist on new pages before and a few days after launch. Rerun on critical pages whenever you deploy changes to sitemaps, robots.txt, auth rules, CDN, or the SEO metadata pipeline.

What is the difference between robots.txt and noindex?

robots.txt controls whether Googlebot is allowed to crawl a URL. noindex tells Google not to include the URL in its search index. A URL blocked by robots.txt cannot be reliably noindexed, because Google may never see the noindex directive.
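Python's standard library can evaluate the crawl side of this distinction directly. A short sketch using urllib.robotparser, with an example rule resembling the staging-leak trap above:

```python
from urllib import robotparser

# A robots.txt that disallows one path prefix for all user agents.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /drafts/",
])

# Googlebot falls back to the "*" group, so /drafts/ is blocked...
blocked = not rp.can_fetch("Googlebot", "https://example.com/drafts/post")
# ...while other paths remain crawlable.
allowed = rp.can_fetch("Googlebot", "https://example.com/blog/post")
```

Note that robotparser only answers the crawl question; a noindex directive on the blocked page would stay invisible to Google, which is exactly why "block it in robots.txt" is not a reliable way to deindex a URL.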

Do I need a sitemap if my site is small?

Not strictly, but it is cheap to add and it makes discovery faster. For most sites, the marginal cost of shipping a sitemap is tiny compared to the risk of missing pages.
