# Faceted Navigation
The classic crawl-budget killer. The index/noindex/block decision matrix, AJAX vs URL filtering, parameter handling, and a strategy for multi-facet combinations.
Faceted navigation is the single largest crawl-budget destroyer on the modern web. A category page with 8 filters and 4 values per filter generates 4⁸ = 65,536 URL combinations; multiply by 1,000 categories and you have over 65 million URLs Googlebot may attempt to crawl, of which maybe 200 actually deserve to rank. Get the strategy right and faceted pages become a long-tail traffic engine; get it wrong and they consume the crawl budget the rest of your site needs.
## TL;DR
- Most facets should be blocked from crawling, not noindexed. `noindex` still consumes crawl budget; `Disallow` in robots.txt or non-crawlable AJAX prevents the request entirely.
- Some facets — usually one or two per category — earn the right to be indexed. “Red running shoes,” “vegan leather sofas,” “size 10 hiking boots.” Map demand before deciding.
- The Search Console URL Parameters tool was deprecated in April 2022. Parameter handling is now your job, expressed via robots.txt, canonicals, and on-page directives.
## The mental model
Faceted navigation is like the index of a department store. The store has aisles (categories) and labels (facets) — color, size, brand, price. A shopper can filter aisles by any combination of labels, but not every combination deserves its own permanent sign at the front entrance. “Men’s red Nike running shoes size 10” is rare enough that nobody walks in asking for it; “men’s running shoes” is general enough that everyone does.
Your job is to decide which combinations get a permanent sign (indexable, in the sitemap, internally linked) and which combinations exist only as live filters during a shopping session. Get it wrong in the “permissive” direction and you flood the index with thin variants that compete with each other. Get it wrong in the “restrictive” direction and you forfeit long-tail traffic to competitors who indexed “vegan leather sofa” before you did.
The third dimension — and the one most teams forget — is crawl economy. Even if you noindex 99% of facet URLs, Googlebot still has to crawl them to discover the noindex. On a 10-million-URL site, that’s a real budget. Blocking at the URL level (robots.txt, AJAX, hash fragments) saves the crawl entirely.
## Deep dive: the 2026 reality
The four canonical decisions per facet:
| Decision | Method | Use when |
|---|---|---|
| Index | Crawlable URL, self-canonical, in sitemap | Facet has search demand, business value, sufficient inventory |
| Noindex, allow crawl | Crawlable URL, `noindex,follow`, no canonical to base | Facet has no demand but you want the linked products discoverable |
| Block crawl | Disallow in robots.txt, no canonical pollution | Long-tail filter combinations with no demand |
| Don’t generate URL | AJAX in-place, no pushState | Sort orders, view toggles, ephemeral state |
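One way to make these four decisions enforceable is a declarative policy map that the filter UI consults before emitting any link. A minimal sketch with illustrative facet names; the `decisionFor` helper is hypothetical:

```js
// Map each facet to one of the four decisions from the table above.
const FACET_POLICY = {
  brand: "index",             // crawlable URL, self-canonical, in sitemap
  color: "index",
  material: "noindex-follow", // crawlable, meta robots noindex,follow
  size: "block",              // Disallow in robots.txt
  price: "block",
  sort: "no-url",             // AJAX in place, never pushState
  view: "no-url",
};

// Default-deny: a facet nobody classified is treated as blocked,
// so new filters never leak crawlable URLs by accident.
const decisionFor = (facet) => FACET_POLICY[facet] ?? "block";

decisionFor("color"); // "index"
decisionFor("sort");  // "no-url"
decisionFor("fit");   // "block" (unclassified)
```

The default-deny fallback is the important design choice: facets added later by merchandising teams start blocked until someone consciously promotes them.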
The end of the URL Parameters tool. Google retired the GSC URL Parameters tool on April 26, 2022 because, in their words, “Google has gotten significantly better at figuring out which URL variations are useful.” That sentence is mostly true and partly Google deflecting the responsibility back to publishers. In practice, you cannot rely on Google to figure out which parameters matter — you have to express it through canonical, robots, and link strategy directly.
Parameter ordering and case-sensitivity. `?color=red&size=10` and `?size=10&color=red` are technically different URLs to a crawler. Most CMSs normalize internally but expose both via internal links. Standardize parameter order in your canonical and in your link generation.
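A minimal sketch of that normalization, assuming lowercase values are safe for your catalog; the `normalizeFacetUrl` helper and the example domain are illustrative:

```js
// Normalize a facet URL so ?size=10&color=Red and ?color=red&size=10
// collapse to one canonical form: lowercase keys and values, alphabetized keys.
function normalizeFacetUrl(href) {
  const url = new URL(href, "https://example.com"); // base for relative paths
  const entries = [...url.searchParams.entries()]
    .map(([k, v]) => [k.toLowerCase(), v.toLowerCase()])
    .sort(([a], [b]) => a.localeCompare(b));
  const query = new URLSearchParams(entries).toString();
  return query ? `${url.pathname}?${query}` : url.pathname;
}

normalizeFacetUrl("/shoes/?size=10&color=Red"); // "/shoes/?color=red&size=10"
normalizeFacetUrl("/shoes/?Color=red&Size=10"); // "/shoes/?color=red&size=10"
```

Run every generated link and every canonical through a single function like this, so one normal form exists by construction.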
The decision matrix for a typical e-commerce category:
| Facet | Indexable? | Why |
|---|---|---|
| Category (top level) | Yes | Primary ranking target |
| Brand | Yes (if branded queries common) | “Nike running shoes” earns its own page |
| Color | Maybe — only “color + category” | “Red running shoes” yes; standalone “red” no |
| Size | No | No commercial intent for size-only |
| Price range | No | Highly variable, low intent |
| Material | Maybe — depends on category | “Leather sofa” yes; “polyester pillow” no |
| Sort order | No | Always block; pure UX |
| View mode (grid/list) | No | Always AJAX |
| In stock toggle | No | Always AJAX or robots-blocked |
| Multi-facet combos | Almost never | Combinatorial explosion |
The defensible rule: index a facet only when standalone monthly search demand exceeds some threshold (commonly 100 searches/month for the singular facet, 500 for the combination). Use Ahrefs, Semrush, or Google Keyword Planner to validate before adding to the indexable set.
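The rule is simple enough to encode directly. A sketch assuming a keyword export with monthly volumes; the `isIndexable` helper and sample data are illustrative, with thresholds taken from above:

```js
// Demand thresholds in searches/month: singles need 100+, combos need 500+.
const THRESHOLDS = { single: 100, combo: 500 };

function isIndexable({ facets, monthlyVolume }) {
  const threshold = facets.length > 1 ? THRESHOLDS.combo : THRESHOLDS.single;
  return monthlyVolume >= threshold;
}

isIndexable({ facets: ["color"], monthlyVolume: 320 });         // true
isIndexable({ facets: ["color", "size"], monthlyVolume: 320 }); // false, combo needs 500+
```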
AJAX vs URL-based filtering — the architectural decision. If you control the front-end:
```js
// AJAX-only filtering for non-indexable facets — never updates the URL.
// Use for sort orders, the view toggle, and the in-stock toggle.
// currentState, renderProducts, and updateGrid are app-level hooks.
const buildQuery = (state) => new URLSearchParams(state).toString();

function applyFilter(filter) {
  fetch(`/api/products?${buildQuery({ ...currentState, ...filter })}`)
    .then(r => r.json())
    .then(renderProducts);
  // Note: NO history.pushState — the URL stays clean
}

// URL-based filtering for indexable facets — updates the URL and is crawlable
function applyIndexableFacet(facet) {
  const url = new URL(location.href);
  url.searchParams.set(facet.key, facet.value);
  // Order params alphabetically for canonical consistency
  const sorted = new URLSearchParams([...url.searchParams.entries()].sort());
  history.pushState({}, "", `${url.pathname}?${sorted}`);
  fetch(`${url.pathname}?${sorted}`)
    .then(r => r.text())
    .then(updateGrid);
}
```
The first pattern keeps state in JavaScript only; the URL never changes; Googlebot never sees the variant. The second pattern creates a real URL that Googlebot can crawl, with consistent parameter ordering so the canonical works.
The hash-fragment escape hatch. Putting filter state after `#` (`/category#filter=red`) keeps Googlebot away because fragments are never sent to servers — it’s a clean way to handle ephemeral filtering when AJAX-only isn’t an option. The trade-off: deep linking and shareability are weaker.
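A minimal sketch of hash-based filter state, assuming the product grid re-renders client-side; the `applyEphemeralFilter` helper and parameter names are hypothetical:

```js
// Write filter state into the fragment: /category#color=red&size=10.
// The fragment is never sent to the server, so crawlers never see variants.
function applyEphemeralFilter(filter) {
  const state = new URLSearchParams(location.hash.slice(1));
  for (const [key, value] of Object.entries(filter)) state.set(key, value);
  location.hash = state.toString();
}

// Re-render whenever the fragment changes (also covers back/forward).
window.addEventListener("hashchange", () => {
  const state = Object.fromEntries(new URLSearchParams(location.hash.slice(1)));
  // render the product grid client-side from `state`; no crawlable URL is created
});
```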
Multi-facet combinations need their own strategy. Even if you index /shoes/running/ and /shoes/red/, you should usually not index /shoes/running/red/. The combinatorial explosion is the enemy: 5 brands × 6 colors × 8 sizes × 4 price ranges = 960 URL combinations per category. Pick the single most valuable combination per category if any, and block the rest.
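An explicit allowlist is one way to keep the explosion in check: enumerate the few combinations that earned indexation and default everything else to blocked. A sketch with illustrative entries, assuming URLs are already normalized (lowercase, alphabetized parameters):

```js
// Only combinations with verified demand get indexable URLs.
const INDEXABLE_COMBOS = new Set([
  "/sofas/?material=vegan-leather",
  "/shoes/?brand=nike&color=red", // verified demand for "red nike shoes"
]);

const comboIsIndexable = (pathAndQuery) => INDEXABLE_COMBOS.has(pathAndQuery);

comboIsIndexable("/shoes/?brand=nike&color=red");         // true
comboIsIndexable("/shoes/?brand=nike&color=red&size=10"); // false: blocked
```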
The 2026 AI-search wrinkle: GPTBot, ClaudeBot, and PerplexityBot all respect robots.txt. Blocking facet combinations in robots.txt also blocks AI crawlers from wasting budget on them. This is desirable — you want AI systems to cite your canonical category page, not a thin facet variant.
## Visualizing it
```mermaid
flowchart TD
    A[User clicks facet] --> B{Facet has search demand?}
    B -->|No - sort, view, in-stock| C[AJAX-only, no URL change]
    B -->|Yes - color or brand| D{Standalone or combo?}
    D -->|Standalone with demand| E[Indexable URL, self-canonical]
    D -->|Combo with demand| F[Maybe indexable, evaluate per category]
    D -->|No verified demand| G[Allow crawl but noindex,follow]
    C --> H[Googlebot never sees variant]
    E --> I[Crawled, indexed, ranks for facet query]
    F --> I
    G --> J[Crawled, not indexed, products discoverable]
    K[Combinatorial garbage] --> L[Blocked in robots.txt, never crawled]
```
## Bad vs. expert
### The bad approach
```html
<!-- Filter UI generates URLs like this on every click -->
<a href="/shoes/?color=Red&size=10&brand=Nike&sort=price-asc&view=grid&page=1">Red size 10 Nike</a>
<!-- Same filters again, with capitalized parameter names -->
<a href="/shoes/?Color=red&Size=10&Brand=NIKE&sort=price-asc">Red size 10 Nike</a>
<!-- Same products again, with parameters in a different order -->
<a href="/shoes/?color=red&brand=nike&size=10&sort=price-asc&view=grid&page=1">Red size 10 Nike</a>
```

```
# robots.txt — wide open
User-agent: *
Allow: /
```

```html
<!-- Every facet page emits its own self-canonical -->
<link rel="canonical" href="/shoes/?color=Red&size=10&brand=Nike&sort=price-asc&view=grid&page=1" />
```
Three identical product sets generate three different URLs because of inconsistent parameter case, ordering, and inclusion of UX-only params (sort, view, page). Self-canonical on every variant means Google indexes all three. Robots.txt is wide open, so Googlebot crawls every combination it discovers — easily 100,000 URLs per category. The crawl budget collapses; only a fraction of legitimately ranking pages get re-crawled per month; rankings for the actual category drift downward as fresh content elsewhere outpaces yours.
### The expert approach
```
# robots.txt — block parameters that should never be crawled
User-agent: *
Allow: /
# Block sort, view, pagination, in-stock, size, and price filters from being crawled
Disallow: /*?*sort=
Disallow: /*?*view=
Disallow: /*?*price_min=
Disallow: /*?*price_max=
Disallow: /*?*in_stock=
Disallow: /*?*size=
Disallow: /*?*page=
# (page 1 is the bare category URL, so the canonical page stays crawlable)
# Allow legitimately indexed facets: color and brand only
# (achieved by NOT disallowing the bare ?color= or ?brand= patterns)

Sitemap: https://example.com/sitemap.xml
```
```js
// Filter UI generates clean, alphabetized, lowercase URLs only for indexable facets
function generateFacetUrl(state) {
  const indexableFacets = ["color", "brand"];
  const params = new URLSearchParams();
  for (const facet of indexableFacets) {
    if (state[facet]) params.set(facet, state[facet].toLowerCase());
  }
  const sorted = new URLSearchParams([...params.entries()].sort());
  const query = sorted.toString();
  return query ? `/shoes/?${query}` : "/shoes/";
}
// Sort, view, page, size live entirely in component state — never emitted as URLs
```
```html
<!-- Self-canonical only on indexable facet pages -->
<!-- /shoes/?brand=nike — indexable -->
<link rel="canonical" href="https://example.com/shoes/?brand=nike" />
<title>Nike Running Shoes — Acme</title>

<!-- /shoes/ with internal sort=price-asc applied — canonicalize back to the clean URL -->
<link rel="canonical" href="https://example.com/shoes/" />
```
```xml
<!-- sitemap.xml lists only the indexable facet variants -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/shoes/</loc></url>
  <url><loc>https://example.com/shoes/?brand=nike</loc></url>
  <url><loc>https://example.com/shoes/?brand=adidas</loc></url>
  <url><loc>https://example.com/shoes/?color=red</loc></url>
  <url><loc>https://example.com/shoes/?color=black</loc></url>
  <!-- Combinations (color + brand) only when verified demand exists -->
</urlset>
```
The robots.txt blocks every UX-only parameter and the long-tail of size/price combinations. The filter UI emits URLs only for indexable facets, with consistent case and ordering. Sitemap lists exactly the URLs you want indexed. The crawl budget is now spent on URLs that can rank, not on combinations that cannot.
## Do this today
- In Google Search Console → Indexing → Pages, click “Crawled - currently not indexed” and “Discovered - currently not indexed”. Sort by URL pattern. Any cluster of facet URLs in these reports means Google is crawling them and choosing not to index — wasted budget you can reclaim.
- Run Screaming Frog with Configuration → Spider → Crawl → Parameter Stripping disabled. The total URL count compared to your “real” page count tells you the facet bloat ratio. 5x is normal; 50x is a fire.
- Use Ahrefs Keywords Explorer or Semrush Keyword Magic to find which facet combinations have real search demand (`vegan leather sofa`, `red running shoes size 10`). The list of combinations with >100 searches/month is your indexable set.
- Build the decision matrix for each facet in a spreadsheet: facet name, has demand?, value per combo, decision (index / noindex-follow / block / AJAX). Get sign-off from the e-commerce team before you implement.
- Update `robots.txt` with `Disallow:` rules for every parameter that should not be crawled. Validate with the robots.txt report in Search Console (Settings → robots.txt) that important URLs aren’t accidentally blocked.
- Refactor your filter UI so UX-only state (sort, view, in-stock, page > 1) lives in component state and never emits URLs. Indexable facets get clean, alphabetized, lowercase parameters via `pushState`.
- Update canonicals so every URL with non-indexable parameters canonicalizes to the cleanest indexable equivalent: `/shoes/?sort=price&view=grid` canonicalizes to `/shoes/`; `/shoes/?brand=nike&sort=price` canonicalizes to `/shoes/?brand=nike`. See the sketch after this list.
- Build an XML sitemap of the indexable facet variants only. Submit it in Search Console → Sitemaps. Compare the index coverage report a month later.
- Monitor Search Console → Settings → Crawl stats. Sustained drops in “Total crawl requests” after the change are a red flag (legitimate URLs got blocked); flat or rising “Pages indexed” with falling crawl requests is the goal.
- Re-audit quarterly. Facet demand shifts; new product lines add new dimensions; “indexable” combinations from last year may now be thin. Pruning faceted indexation is recurring work, not a one-time project.
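To make the canonical rule from the checklist concrete, here is a sketch that computes the cleanest indexable equivalent for any facet URL; the `canonicalFor` helper, the facet list, and the domain are illustrative:

```js
// Keep only indexable parameters, lowercase and alphabetize them,
// and emit the cleanest equivalent URL as the canonical.
const INDEXABLE = new Set(["brand", "color"]);

function canonicalFor(href) {
  const url = new URL(href, "https://example.com");
  const kept = [...url.searchParams.entries()]
    .filter(([k]) => INDEXABLE.has(k.toLowerCase()))
    .map(([k, v]) => [k.toLowerCase(), v.toLowerCase()])
    .sort(([a], [b]) => a.localeCompare(b));
  const query = new URLSearchParams(kept).toString();
  return `https://example.com${url.pathname}${query ? `?${query}` : ""}`;
}

canonicalFor("/shoes/?sort=price&view=grid");  // "https://example.com/shoes/"
canonicalFor("/shoes/?brand=nike&sort=price"); // "https://example.com/shoes/?brand=nike"
```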