Meta Robots & X-Robots-Tag
noindex, nofollow, noarchive, nosnippet, max-snippet, max-image-preview. Meta robots tag vs HTTP header — when each is the right choice.
The <meta name="robots"> tag and the X-Robots-Tag HTTP header are how you tell crawlers what to do after they fetch a page — index it or not, follow links or not, show a snippet or not, cache or not. They are directives, not hints. When set correctly, Google obeys.
TL;DR
- noindex is the right tool to remove a URL from the index. Disallow in robots.txt blocks crawling but leaves URL listings in the index. Use noindex and keep the page crawlable so Google can read the directive.
- The X-Robots-Tag HTTP header lets you control non-HTML resources. PDFs, images, CSV downloads, and video files cannot carry meta tags, but they can carry HTTP headers. Same syntax, different transport.
- max-snippet, max-image-preview, and max-video-preview matter more in 2026 because they constrain how Google AI Overviews and Bing Copilot can summarize your content. Setting max-snippet:-1 (or omitting it) is implicit consent to be summarized.
The mental model
Meta robots and X-Robots-Tag are the page’s instructions to the visiting librarian after the librarian has read the book. The book exists; the visit happened. The instructions say: “do not catalog this”, or “catalog it but do not show a preview”, or “do not list any links from this page”. Every well-behaved librarian (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, Applebot) honors these instructions.
The <meta name="robots"> tag goes in the <head> of an HTML document. The X-Robots-Tag is the same syntax served as an HTTP response header — invisible in the rendered page but present in every response. They are functionally equivalent for HTML; only the header works for non-HTML files.
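For instance, here is the same noindex expressed both ways: once in HTML markup, once as a raw response header on a PDF (the header works identically for any content type).
<!-- Option A: meta tag in the <head> of an HTML page -->
<meta name="robots" content="noindex">

# Option B: the same directive carried as an HTTP response header
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex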
Deep dive: the 2026 reality
The full directive vocabulary supported by Google in 2026:
| Directive | Effect |
|---|---|
index / noindex | Allow or block index inclusion |
follow / nofollow | Pass or block link equity from this page’s outbound links |
noarchive | Do not show the cached link |
nosnippet | Do not show a text snippet or video preview |
max-snippet:N | Limit text snippet to N characters; -1 = no limit |
max-image-preview:none|standard|large | Limit image preview size |
max-video-preview:N | Limit video preview to N seconds; -1 = no limit |
notranslate | Do not offer Google Translate |
noimageindex | Do not index images on this page |
unavailable_after:DATE | Drop from index after a specific date (RFC 850 or ISO 8601) |
indexifembedded | Index when embedded in a parent page (used with noindex on the embedded page) |
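A sketch of the indexifembedded pairing, assuming a media URL that should only surface through the page that embeds it:
<!-- On the embeddable media page itself -->
<!-- noindex keeps the standalone URL out of the index; indexifembedded still allows
     its content to be indexed as part of the parent page that embeds it -->
<meta name="robots" content="noindex, indexifembedded">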
Per-crawler targeting works by replacing robots with the specific bot name. Google’s documented tokens include googlebot, googlebot-news, and googlebot-image; the generic robots name is the catch-all that every crawler reads. Bing accepts bingbot. (google-extended is a robots.txt token, not a meta robots name; see below.) AI crawlers do not yet have widely-honored per-crawler meta directives — control them via robots.txt.
Google-Extended is unusual: it is a robots.txt token only, not a meta robots token. To opt out of Gemini training and AI Overviews input on a per-page basis, you cannot use <meta name="google-extended" content="noindex"> — that is not a recognized directive. Use robots.txt for sitewide control or live with the binary.
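That sitewide control is an ordinary robots.txt group; a minimal sketch:
# robots.txt — sitewide opt-out for the Google-Extended token
User-agent: Google-Extended
Disallow: /
Blocking Google-Extended does not change how Googlebot crawls or indexes the same URLs for Search.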
The 2026 reality on AI snippets: Google’s AI Overviews respect nosnippet and max-snippet. Setting max-snippet:0 removes your page from AI Overview citations. Setting max-snippet:-1 (or omitting the directive) is implicit consent to be summarized. PerplexityBot and OAI-SearchBot do not currently honor max-snippet — they read your full page anyway because they fetch on user query rather than pre-cache.
Visualizing it
flowchart TD
A[Crawler fetches URL] --> B{HTTP response headers}
B --> C{X-Robots-Tag present?}
C -->|Yes, noindex| Z[Drop from index]
C -->|Yes, other| D[Apply directives]
C -->|No| E[Parse HTML head]
E --> F{meta name=robots?}
F -->|Yes, noindex| Z
F -->|Yes, other| D
F -->|No| G[Default: index, follow, max-snippet:-1]
D --> H[Index with constraints]
G --> H
Bad vs. expert
The bad approach
Three failure patterns. First, putting noindex in robots.txt (a now-removed nonstandard Google extension):
# robots.txt — DOES NOT WORK
User-agent: *
Noindex: /private/
Google removed support for Noindex: in robots.txt on September 1, 2019. It still appears in legacy configs and silently does nothing. The team thinks they have deindexed /private/; they have not.
Second, blocking /private/ in robots.txt and adding noindex to the page:
# robots.txt
User-agent: *
Disallow: /private/
<!-- on /private/something -->
<meta name="robots" content="noindex">
This is contradictory: Google cannot crawl the page (blocked) and therefore cannot read the noindex directive. The URL stays in the index — Google will display the URL with the message “A description for this result is not available because of this site’s robots.txt” — for as long as external links point to it.
Third, using noindex on paginated category pages (page 2, page 3, etc.):
<!-- on /blog?page=2 -->
<meta name="robots" content="noindex,follow">
Google’s John Mueller confirmed in 2017 (and again in 2024) that noindex,follow long-term degrades to noindex,nofollow. Google reasonably concludes a permanently noindexed page is a low-value source of link signal. Use self-canonicals for pagination, not noindex.
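A sketch of that pattern on page 2, assuming /blog?page=2 is the address you want indexed in its own right:
<!-- on /blog?page=2: leave it indexable; the canonical points at itself, not at page 1 -->
<link rel="canonical" href="https://example.com/blog?page=2">
<!-- no robots meta tag needed; the default is index, follow -->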
The expert approach
For a page you want deindexed, serve noindex via meta tag (HTML pages) or X-Robots-Tag header (non-HTML or universal):
<!-- HTML page deindexing -->
<head>
<meta name="robots" content="noindex,nofollow">
</head>
For PDFs, downloads, or CSV files, set the header server-side. Nginx:
location ~* \.(pdf|csv|xls|xlsx)$ {
add_header X-Robots-Tag "noindex, nosnippet" always;
try_files $uri =404;
}
# Or for a specific path
location /internal/ {
add_header X-Robots-Tag "noindex, nofollow" always;
}
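If the site runs Apache instead, the equivalent is a mod_headers rule; a sketch using the same file pattern:
# Apache (requires mod_headers)
<FilesMatch "\.(pdf|csv|xls|xlsx)$">
  Header set X-Robots-Tag "noindex, nosnippet"
</FilesMatch>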
For granular control, set max-snippet, max-image-preview, and max-video-preview:
<!-- Allow text snippet up to 160 chars, large image previews -->
<meta name="robots" content="max-snippet:160, max-image-preview:large, max-video-preview:-1">
Per-crawler differentiation — block Google News from indexing while allowing Google Search:
<meta name="googlebot-news" content="noindex">
<meta name="googlebot" content="index, follow">
The unavailable_after directive for time-sensitive content (limited promotions, expiring events):
<!-- Drop this URL from the index after the date passes -->
<meta name="robots" content="unavailable_after: 2026-12-31T23:59:59Z">
For AI surface control on individual pages, combine max-snippet:0 with allowing crawl:
<!-- Page is indexable but cannot be summarized in AI Overviews -->
<meta name="robots" content="index, follow, max-snippet:0, noarchive">
To deindex a category sitewide, the X-Robots-Tag at the response level is cleaner than touching every template:
location /staff-only/ {
add_header X-Robots-Tag "noindex, nofollow" always;
proxy_pass http://upstream;
}
Verify the header is actually present:
curl -I https://example.com/staff-only/dashboard \
| grep -i x-robots-tag
# Expected: X-Robots-Tag: noindex, nofollow
Do this today
- Audit all current noindex directives. In Screaming Frog SEO Spider, filter Indexability > Non-Indexable and review every URL. Confirm each one should be noindexed; mistakes here are common.
- Search your codebase for name="robots" and X-Robots-Tag. Catalog every place a directive is set. Decentralized robots logic is the #1 cause of accidental sitewide deindexation.
- For each URL marked noindex, verify it is not also blocked in robots.txt. Use GSC’s robots.txt report (under Settings > Crawling) to confirm. If both are set, lift the robots block first so the noindex can be processed.
- Inspect HTTP headers for non-HTML downloads. curl -I your top 10 PDFs, image assets, and CSV files. If they should not be indexed, add X-Robots-Tag: noindex at the server level.
- In GSC > URL Inspection, run the live test on a noindexed URL. Confirm Indexing allowed? says No: ‘noindex’ detected in ‘robots’ meta tag (or the header equivalent). If it says Yes, your directive is not being served.
- Set max-image-preview:large on every public content page. This unlocks larger image previews in Google Discover and AI Overviews — typically a 10–20% CTR lift on Discover-eligible content.
- Audit per-crawler directives. Search for googlebot-news, google-extended, and any custom user-agent meta tags. Document the editorial rationale for each.
- Add a CI test that fetches your homepage and key templates, parses headers + meta robots, and asserts index, follow (or your intended values); a sketch follows this list. Catch regressions before they ship.
- For URLs you want fully removed from the index quickly, use GSC > Removals > New Request > Temporarily remove URL after serving noindex. The temporary removal hides the URL for ~6 months while Google’s recrawl picks up the permanent directive.
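A minimal sketch of that CI check in shell. The domain and paths are placeholders; it takes the simpler route of failing the build whenever a noindex appears, in either the headers or the markup, on pages that should stay indexable.
#!/usr/bin/env bash
# ci-robots-check.sh — fail the build if an unexpected noindex ships.
# URLs below are placeholders; swap in your own key templates.
set -euo pipefail

URLS=(
  "https://example.com/"
  "https://example.com/blog/"
  "https://example.com/products/sample-page/"
)

for url in "${URLS[@]}"; do
  # Header check: no X-Robots-Tag noindex should be present.
  if curl -sI "$url" | grep -iq '^x-robots-tag:.*noindex'; then
    echo "FAIL: noindex header on $url" >&2
    exit 1
  fi
  # Meta tag check: no <meta name="robots" ... noindex ...> in the HTML.
  if curl -s "$url" | grep -iq '<meta[^>]*name="robots"[^>]*noindex'; then
    echo "FAIL: noindex meta tag on $url" >&2
    exit 1
  fi
  echo "OK: $url"
done
Run it in CI against staging before a release, or as a post-deploy smoke test against production.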