Module 032 Intermediate 13 min read

Meta Robots & X-Robots-Tag

noindex, nofollow, noarchive, nosnippet, max-snippet, max-image-preview. Meta robots tag vs HTTP header — when each is the right choice.

By SEO Mastery Editorial

The <meta name="robots"> tag and the X-Robots-Tag HTTP header are how you tell crawlers what to do after they fetch a page — index it or not, follow links or not, show a snippet or not, cache or not. They are directives, not hints. When set correctly, Google obeys.

TL;DR

  • noindex is the right tool to remove a URL from the index. Disallow in robots.txt blocks crawling but can leave the bare URL listed in the index. Use noindex and keep the page crawlable so Google can read the directive.
  • X-Robots-Tag HTTP header lets you control non-HTML resources. PDFs, images, CSV downloads, video files cannot carry meta tags — but they can carry HTTP headers. Same syntax, different transport.
  • max-snippet, max-image-preview, and max-video-preview matter more in 2026 because they constrain how Google AI Overviews and Bing Copilot can summarize your content. Setting max-snippet:-1 (or omitting the directive) is implicit consent to be summarized.

The mental model

Meta robots and X-Robots-Tag are the page’s instructions to the visiting librarian, delivered after the librarian has already read the book. The book exists; the visit happened. The instructions say: “do not catalog this”, or “catalog it but do not show a preview”, or “do not list any links from this page”. Any well-behaved crawler honors them — Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, Applebot.

The <meta name="robots"> tag goes in the <head> of an HTML document. The X-Robots-Tag is the same syntax served as an HTTP response header — invisible in the rendered page but present in every response. They are functionally equivalent for HTML; only the header works for non-HTML files.
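
For instance, the same directive in both transports:

<!-- Delivered in the HTML <head> -->
<meta name="robots" content="noindex, nofollow">

# Or delivered as an HTTP response header (works for any file type)
X-Robots-Tag: noindex, nofollow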

Deep dive: the 2026 reality

The full directive vocabulary supported by Google in 2026:

  • index / noindex – Allow or block index inclusion
  • follow / nofollow – Pass or block link equity from this page’s outbound links
  • noarchive – Do not show the cached link
  • nosnippet – Do not show a text snippet or video preview
  • max-snippet:N – Limit text snippet to N characters; -1 = no limit
  • max-image-preview:none|standard|large – Limit image preview size
  • max-video-preview:N – Limit video preview to N seconds; -1 = no limit
  • notranslate – Do not offer Google Translate
  • noimageindex – Do not index images on this page
  • unavailable_after:DATE – Drop from index after a specific date (RFC 850 or ISO 8601)
  • indexifembedded – Index when embedded in a parent page (used with noindex on the embedded page)
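
For example, indexifembedded only has an effect alongside noindex; a typical sketch is to serve both on an embeddable resource (say, a widget page meant to appear only inside an iframe), most naturally as a header:

# HTTP response header on the embeddable resource
X-Robots-Tag: noindex, indexifembedded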

Per-crawler targeting works by replacing robots with the specific bot name. Google’s documented tokens are googlebot, googlebot-news, and googlebot-image; its documentation uses otherbot as a placeholder for any other crawler name. Bing accepts bingbot. AI crawlers do not yet have widely-honored per-crawler meta directives — control them via robots.txt.

Google-Extended is unusual: it is a robots.txt token only, not a meta robots token. To opt out of Gemini training and AI Overviews input on a per-page basis, you cannot use <meta name="google-extended" content="noindex"> — that is not a recognized directive. Use robots.txt for sitewide control or live with the binary.
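
The sitewide control therefore lives in robots.txt; a minimal sketch:

# robots.txt: opt the whole site out of Google-Extended
User-agent: Google-Extended
Disallow: /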

The 2026 reality on AI snippets: Google’s AI Overviews respect nosnippet and max-snippet. Setting max-snippet:0 removes your page from AI Overview citations. Setting max-snippet:-1 (or omitting the directive) is implicit consent to be summarized. PerplexityBot and OAI-SearchBot do not currently honor max-snippet — they read your full page anyway because they fetch on user query rather than pre-cache.

Visualizing it

flowchart TD
  A[Crawler fetches URL] --> B{HTTP response headers}
  B --> C{X-Robots-Tag present?}
  C -->|Yes, noindex| Z[Drop from index]
  C -->|Yes, other| D[Apply directives]
  C -->|No| E[Parse HTML head]
  E --> F{meta name=robots?}
  F -->|Yes, noindex| Z
  F -->|Yes, other| D
  F -->|No| G[Default: index, follow, max-snippet:-1]
  D --> H[Index with constraints]
  G --> H

Bad vs. expert

The bad approach

Three failure patterns. First, putting noindex in robots.txt (a nonstandard, now-removed Google extension):

# robots.txt — DOES NOT WORK
User-agent: *
Noindex: /private/

Google removed support for Noindex: in robots.txt on September 1, 2019. It still appears in legacy configs and silently does nothing. The team thinks they have deindexed /private/; they have not.

Second, blocking /private/ in robots.txt and adding noindex to the page:

# robots.txt
User-agent: *
Disallow: /private/
<!-- on /private/something -->
<meta name="robots" content="noindex">

This is contradictory: Google cannot crawl the page (blocked) and therefore cannot read the noindex directive. The URL stays in the index — Google will display the URL with the message “A description for this result is not available because of this site’s robots.txt” — for as long as external links point to it.

Third, using noindex on paginated category pages (page 2, page 3, etc.):

<!-- on /blog?page=2 -->
<meta name="robots" content="noindex,follow">

Google’s John Mueller confirmed in 2017 (and again in 2024) that noindex,follow long-term degrades to noindex,nofollow. Google reasonably concludes a permanently noindexed page is a low-value source of link signal. Use self-canonicals for pagination, not noindex.
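
A sketch of that pagination pattern (the URL is a placeholder):

<!-- on /blog?page=2: indexable, with a self-referencing canonical -->
<link rel="canonical" href="https://example.com/blog?page=2">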

The expert approach

For a page you want deindexed, serve noindex via meta tag (HTML pages) or X-Robots-Tag header (non-HTML or universal):

<!-- HTML page deindexing -->
<head>
  <meta name="robots" content="noindex,nofollow">
</head>

For PDFs, downloads, or CSV files, set the header server-side. Nginx:

location ~* \.(pdf|csv|xls|xlsx)$ {
  add_header X-Robots-Tag "noindex, nosnippet" always;
  try_files $uri =404;
}

# Or for a specific path
location /internal/ {
  add_header X-Robots-Tag "noindex, nofollow" always;
}
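
If your stack runs Apache instead of Nginx, a roughly equivalent sketch (assuming mod_headers is enabled) looks like this:

# Apache: httpd.conf or .htaccess, requires mod_headers
<FilesMatch "\.(pdf|csv|xls|xlsx)$">
  Header set X-Robots-Tag "noindex, nosnippet"
</FilesMatch>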

For granular control, set max-snippet, max-image-preview, and max-video-preview:

<!-- Allow text snippet up to 160 chars, large image previews -->
<meta name="robots" content="max-snippet:160, max-image-preview:large, max-video-preview:-1">

Per-crawler differentiation — block Google News from indexing while allowing Google Search:

<meta name="googlebot-news" content="noindex">
<meta name="googlebot" content="index, follow">

The unavailable_after directive for time-sensitive content (limited promotions, expiring events):

<!-- Drop this URL from the index after the date passes -->
<meta name="robots" content="unavailable_after: 2026-12-31T23:59:59Z">

For AI surface control on individual pages, combine max-snippet:0 with allowing crawl:

<!-- Page is indexable but cannot be summarized in AI Overviews -->
<meta name="robots" content="index, follow, max-snippet:0, noarchive">

To deindex a category sitewide, the X-Robots-Tag at the response level is cleaner than touching every template:

location /staff-only/ {
  add_header X-Robots-Tag "noindex, nofollow" always;
  proxy_pass http://upstream;
}

Verify the header is actually present:

curl -I https://example.com/staff-only/dashboard \
  | grep -i x-robots-tag
# Expected: X-Robots-Tag: noindex, nofollow

Do this today

  1. Audit all current noindex directives. In Screaming Frog SEO Spider, filter Indexability > Non-Indexable and review every URL. Confirm each one should be noindexed; mistakes here are common.
  2. Search your codebase for name="robots" and X-Robots-Tag. Catalog every place a directive is set. Decentralized robots logic is the #1 cause of accidental sitewide deindexation.
  3. For each URL marked noindex, verify it is not also blocked in robots.txt. Use GSC’s robots.txt report (under Settings > Crawling) to confirm. If both are set, lift the robots block first so the noindex can be processed.
  4. Inspect HTTP headers for non-HTML downloads. curl -I your top 10 PDFs, image assets, and CSV files. If they should not be indexed, add X-Robots-Tag: noindex at the server level.
  5. In GSC > URL Inspection, run the live test on a noindexed URL. Confirm Indexing allowed? says No: ‘noindex’ detected in ‘robots’ meta tag (or the header equivalent). If it says Yes, your directive is not being served.
  6. Set max-image-preview:large on every public content page. This unlocks larger image previews in Google Discover and AI Overviews — typically a 10–20% CTR lift on Discover-eligible content.
  7. Audit per-crawler directives. Search for googlebot-news, google-extended, and any custom user-agent meta tags. Document the editorial rationale for each.
  8. Add a CI test that fetches your homepage and key templates, parses headers + meta robots, and asserts index, follow (or your intended values). Catch regressions before they ship; a minimal shell sketch follows this list.
  9. For URLs you want fully removed from the index quickly, use GSC > Removals > New Request > Temporarily remove URL after serving noindex. The temporary removal hides the URL for ~6 months while Google’s recrawl picks up the permanent directive.
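
A minimal shell sketch of the CI check from step 8; the URLs and the fail-on-noindex rule are placeholders to adapt to your own templates:

#!/bin/sh
# CI guard: fail the build if a key URL starts serving an unexpected noindex,
# either in the X-Robots-Tag header or in the meta robots tag.
for url in https://example.com/ https://example.com/blog/; do
  headers=$(curl -sS -D - -o /dev/null "$url")
  body=$(curl -sS "$url")
  if printf '%s' "$headers" | grep -qi '^x-robots-tag:.*noindex'; then
    echo "FAIL: $url serves an X-Robots-Tag noindex header"; exit 1
  fi
  if printf '%s' "$body" | grep -qi '<meta[^>]*name="robots"[^>]*noindex'; then
    echo "FAIL: $url carries a meta robots noindex"; exit 1
  fi
  echo "OK: $url"
done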

