Faceted Navigation for E-commerce
The combinatorial explosion problem at scale. Indexing only facets with demand, AJAX vs URL filtering, and the rules that keep the crawl frontier sane.
Faceted navigation is the SEO problem that has bankrupted more crawl budgets than any other. Add a Color, Size, Brand, and Price filter to a 5,000-product catalog and you don’t have 5,000 URLs anymore — you have millions. The work is not in adding filters; it’s in deciding which 200 of the millions deserve to live in Google’s index.
TL;DR
- The combinatorial math is brutal and often miscounted. If every value can be toggled independently (multi-select), four facets with five values each don’t yield 20 URLs; they yield 2^20 = 1,048,576 unique facet states, before pagination and sort orders.
- Index by demand, not by default. Out of millions of theoretical URLs, 50-500 typically have search volume worth ranking for. The rest are noindex,follow, blocked, or AJAX-only.
- Choosing between AJAX-only facets and crawlable URLs is a deliberate decision per facet, per category. Treat each facet’s combinatorics, demand, and conversion potential as a discrete choice.
The mental model
Faceted navigation is like a library catalog system gone feral. Every patron picks a different combination — fiction + paperback + 1990s + under $10 — and the librarian (your CMS) cheerfully writes out a new card for each request. Multiply by 10,000 patrons making one query each, and the card catalog now holds more cards than books. None of the cards point to anything Google’s crawler would judge as a unique resource.
Four levers control the fire:
- Which facets generate URLs at all. AJAX-driven filters create no new URLs; URL-parameter or path-segment filters do.
- Which generated URLs are indexable. Self-canonicalize, canonicalize to the parent, or noindex.
- Which generated URLs are crawlable. Block in robots.txt, allow with noindex, or expose freely. A URL blocked in robots.txt is never fetched, so Google cannot see a noindex or canonical on it.
- Which generated URLs receive internal links. Pages with no internal links pointing to them are rarely discovered or crawled, even if technically allowed.
Treat the four levers independently per facet. The mistake is applying a global rule when the right answer differs per axis.
Deep dive: the 2026 reality
The math first, because it is always worse than people guess.
A typical apparel category has these facets:
| Facet | Values |
|---|---|
| Size | XS, S, M, L, XL, XXL (6) |
| Color | 12 |
| Brand | 14 |
| Material | 6 |
| Price band | 5 |
| Sleeve length | 3 |
| Fit | 4 |
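Naive multiplication: 6 × 12 × 14 × 6 × 5 × 3 × 4 = 362,880 unique facet combinations, and that assumes every facet is set to exactly one value. With multi-select on each (the realistic case where a user picks XS and S), the count becomes (2^6 - 1) × (2^12 - 1) × (2^14 - 1) × (2^6 - 1) × (2^5 - 1) × (2^3 - 1) × (2^4 - 1) ≈ 8.7 × 10^14 per category: hundreds of trillions, before counting states that leave some facets unset. Across 40 categories, the theoretical URL space is roughly 3.5 × 10^16.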
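These counts are easy to get wrong by hand. A minimal sketch that reproduces them, using BigInt because totals across many categories exceed Number.MAX_SAFE_INTEGER; FACET_VALUE_COUNTS simply restates the apparel table, and the function names are illustrative:

```javascript
// Value counts per facet, restated from the apparel table (illustrative).
const FACET_VALUE_COUNTS = { size: 6, color: 12, brand: 14, material: 6, priceBand: 5, sleeve: 3, fit: 4 };

// Single-select, every facet set: the product of the value counts.
function singleSelectStates(counts) {
  return Object.values(counts).reduce((acc, n) => acc * BigInt(n), 1n);
}

// Multi-select, every facet set to a non-empty subset: product of (2^n - 1).
function multiSelectStates(counts) {
  return Object.values(counts).reduce((acc, n) => acc * ((1n << BigInt(n)) - 1n), 1n);
}

// Multi-select where any facet may also be left unset: 2^(total values),
// which includes the unfiltered page itself.
function allFacetStates(counts) {
  const totalValues = Object.values(counts).reduce((acc, n) => acc + n, 0);
  return 1n << BigInt(totalValues);
}

const perCategory = multiSelectStates(FACET_VALUE_COUNTS);
console.log(singleSelectStates(FACET_VALUE_COUNTS)); // 362880n
console.log(perCategory);                            // 866721219211575n
console.log(perCategory * 40n);                      // across 40 categories
```

The same helper covers the TL;DR figure: allFacetStates({ a: 5, b: 5, c: 5, d: 5 }) evaluates to 2^20.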
Google’s crawler can handle large URL spaces, but crawl budget caps how many URLs from a domain Googlebot fetches per day. The number is a moving target, but for a mid-market site it is often on the order of 50,000-500,000 URLs per day. If your site exposes 10 million URLs and Googlebot crawls 100,000 per day, a full re-crawl cycle takes 100 days, and real changes can take that long to surface.
Indexed-but-thin is the second-order problem. Pages that get into the index but show empty product grids (“0 results for Pink Wool Petite XS”) trip the Helpful Content classifier. The 2024 March update demoted dozens of fashion retailers whose long-tail facet URLs sat in the index returning empty grids.
The 2026 best practice from Google’s John Mueller (consistent across statements 2019-2025) and validated by Ahrefs, Botify, and Lumar crawl studies:
- AJAX-only facets for low-demand, high-cardinality axes (size, color, price band).
- Indexable URLs for high-demand axes that match real search queries (brand, gender, type, season).
- noindex,follow for medium-demand multi-facet states you want crawlers to traverse but not rank.
- Disallow: in robots.txt for combinatorial junk you’d rather not have crawled at all.
- Always self-canonical, never canonical-to-parent on a state with unique inventory.
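The rule list can be encoded as a small lookup so every category applies it the same way. This is a sketch: the 'low' | 'medium' | 'high' labels for demand and cardinality, the isJunk flag, and recommendTreatment itself are hypothetical inputs and names you would derive from your own keyword and crawl data.

```javascript
// Hypothetical encoding of the rule list. Inputs are labels assigned per
// facet state: demand and cardinality are 'low' | 'medium' | 'high'.
function recommendTreatment({ demand, cardinality, facetCount, isJunk }) {
  if (isJunk) {
    // Combinatorial junk: keep crawlers out entirely.
    return { treatment: 'robots.txt Disallow', robots: null, canonical: null };
  }
  if (demand === 'low' && cardinality === 'high') {
    // No URL is generated, so there is nothing to index or canonicalize.
    return { treatment: 'AJAX only', robots: null, canonical: null };
  }
  if (demand === 'high') {
    return { treatment: 'indexable URL', robots: 'index,follow', canonical: 'self' };
  }
  if (demand === 'medium' && facetCount > 1) {
    // Crawlable for link discovery, kept out of the index.
    return { treatment: 'crawlable URL', robots: 'noindex,follow', canonical: 'self' };
  }
  // Not covered by the rules: decide case by case.
  return { treatment: 'undecided', robots: null, canonical: null };
}

console.log(recommendTreatment({ demand: 'medium', cardinality: 'low', facetCount: 3, isJunk: false }));
```

The junk check runs first on purpose: a state flagged as junk should be blocked regardless of how the demand labels came out.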
Sort orders, page numbers, and tracking parameters are separate from facets and require their own rules. ?sort=, ?page=, ?utm_source= on a faceted URL multiply the same combinatorial blast.
Visualizing it
```mermaid
flowchart TD
A[User picks facet] --> B{Has search demand?}
B -->|High| C{Generates many combinations?}
B -->|Low| D[AJAX only - no URL change]
C -->|Yes, single facet| E[Crawlable + indexable URL]
C -->|Yes, multi-facet| F[Crawlable + noindex,follow]
C -->|No| G[AJAX or robots.txt block]
E --> H[Self-canonical]
E --> I[Unique title and intro]
E --> J[Internal links from parent + siblings]
F --> K[Self-canonical]
F --> L[Crawlable for path discovery]
G --> M[Disallow in robots.txt]
```
Bad vs. expert
The bad approach
Default Magento or WooCommerce produces something like this on every facet click:
```text
/category/shirts?color=red&size=m&brand=acme&sort=price_asc&page=2
/category/shirts?color=red&brand=acme&size=m&page=2&sort=price_asc
/category/shirts?brand=acme&color=red&size=m&page=2&sort=price_asc
```
Three URLs, identical state, different parameter order. Each can be indexed, and each canonicalizes to itself by default. The site exposes the same inventory through 6 (orderings of the three facet parameters) × 2,000 (combinations) × 10 (page numbers) × 3 (sort orders) = 360,000 URL variants per category. Across 40 categories, that is 14.4 million URLs, all backed by 5,000 products.
robots.txt is empty, noindex is not applied, and canonicals self-reference. The result: crawl budget burns on facet noise, new products go uncrawled for weeks, the Helpful Content classifier can flag the catalog as low-value, and a sitewide demotion can follow at the next core update.
The expert approach
Decide per-facet, then enforce with the four levers. Example for an apparel category:
| Facet | URL? | Indexable? | Crawlable? | Reason |
|---|---|---|---|---|
| brand | Yes | Yes | Yes | High demand: "nike running shoes" |
| gender | Yes | Yes | Yes | High demand: "women's running shoes" |
| type/style | Yes | Yes | Yes | High demand: "trail running shoes" |
| season | Yes | Yes | Yes | Demand: "winter running jackets" |
| size | No | n/a | n/a | AJAX only; no URL state |
| color | No | n/a | n/a | AJAX only; no URL state |
| price band | No | n/a | n/a | AJAX only; no URL state |
| brand + gender | Yes | Yes | Yes | Demand: "men's nike running shoes" |
| brand + style | Yes | Yes | Yes | Demand: "nike trail running shoes" |
| 3+ facets | Yes | No (noindex) | Yes | Crawl path, no demand |
| page | Yes | Yes (self-canonical) | Yes | Pagination needed for product discovery |
| sort | Yes | No (canonical to unsorted URL) | Yes | Same products, different order |
| utm, fbclid | Yes | No (canonical to clean URL) | No | Block in robots.txt |
robots.txt becomes precise:
```text
User-agent: *
Disallow: /*?*utm_
Disallow: /*?*fbclid=
Disallow: /*?*gclid=
Disallow: /*?*sid=
Disallow: /*?*sessionId=
Disallow: /search/
Disallow: /cart/
Disallow: /account/
# Crawling is allowed by default, so this Allow line only documents intent.
# The longer Disallow patterns above still win for /shoes/ URLs carrying
# tracking or session parameters (the longest matching rule wins).
Allow: /shoes/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-products.xml
Sitemap: https://example.com/sitemap-categories.xml
```
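Before shipping rules like these, it helps to check which URLs they actually block. This is a simplified sketch of RFC 9309-style matching ('*' wildcard, '$' end anchor, longest matching rule wins, Allow wins a tie). It handles a single user-agent group only, it is not Google's own parser, and the sample rules restate part of the file above.

```javascript
// Simplified robots.txt matcher: one user-agent group, '*' and '$' only.
function parseRules(robotsTxt) {
  const rules = [];
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments
    const m = /^(allow|disallow)\s*:\s*(.*)$/i.exec(line);
    // An empty Disallow value blocks nothing, so it is skipped.
    if (m && m[2]) rules.push({ allow: m[1].toLowerCase() === 'allow', pattern: m[2] });
  }
  return rules;
}

function patternToRegExp(pattern) {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  // Escape regex metacharacters, then turn each '*' into '.*'.
  const source = body
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + source + (anchored ? '$' : ''));
}

function isAllowed(rules, pathAndQuery) {
  let best = null;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(pathAndQuery)) continue;
    const len = rule.pattern.length;
    // Longest (most specific) match wins; on a tie, Allow wins.
    if (!best || len > best.len || (len === best.len && rule.allow)) {
      best = { len, allow: rule.allow };
    }
  }
  return best ? best.allow : true; // no matching rule: allowed
}

const rules = parseRules(`User-agent: *
Disallow: /*?*utm_
Disallow: /*?*sid=
Disallow: /search/
Allow: /shoes/`);

console.log(isAllowed(rules, '/shoes/running/men/nike?size=10&color=black')); // true
console.log(isAllowed(rules, '/shoes/running?utm_source=newsletter'));        // false
```

Running a list of real facet URLs from a crawl export through isAllowed is a cheap regression test whenever the file changes.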
URL pattern uses paths for indexable axes and parameters for non-indexable state:
```text
# Indexable: path segments in a fixed axis order
/shoes/running/men/nike
/shoes/running/men/nike/trail
# Not indexable but crawlable: parameters, deterministic order
/shoes/running/men/nike?size=10&color=black
# Server enforces parameter order on inbound requests
# Reorder + 301 if user hits /shoes/?color=black&size=10
```
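The reorder-and-301 rule can be sketched with the WHATWG URL API, which is global in browsers and Node.js. The PARAM_ORDER list, the tracking-parameter patterns, and the choice to drop unknown parameters are assumptions for illustration, not a platform default.

```javascript
// Hypothetical fixed order for non-indexable facet parameters; anything
// not listed (and every tracking parameter) is dropped.
const PARAM_ORDER = ['size', 'color', 'price', 'sort', 'page'];
const TRACKING_PARAMS = [/^utm_/, /^fbclid$/, /^gclid$/, /^srsltid$/];

function canonicalizeFacetUrl(input) {
  const url = new URL(input);
  const kept = [];
  for (const [key, value] of url.searchParams) {
    if (TRACKING_PARAMS.some((re) => re.test(key))) continue; // tracking noise
    if (!PARAM_ORDER.includes(key)) continue;                 // unknown parameter
    kept.push([key, value]);
  }
  // Order by the fixed parameter order, then by value for repeated keys.
  kept.sort((a, b) =>
    PARAM_ORDER.indexOf(a[0]) - PARAM_ORDER.indexOf(b[0]) || a[1].localeCompare(b[1]));
  url.search = new URLSearchParams(kept).toString(); // '' removes the '?'
  return url.toString();
}

// Returns the 301 target, or null when the request is already canonical.
function redirectTarget(input) {
  const canonical = canonicalizeFacetUrl(input);
  return canonical === new URL(input).toString() ? null : canonical;
}

console.log(redirectTarget('https://example.com/shoes/?color=black&size=10'));
// https://example.com/shoes/?size=10&color=black
```

In a request handler, a non-null redirectTarget(requestUrl) becomes a 301 with that Location; null means the URL is already in canonical form. Dropping unknown parameters is aggressive; a site with legitimate extra parameters would list them in PARAM_ORDER instead.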
The page template injects per-state metadata:
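```jsx
// Sketch of a facet-aware category route (Next.js pages router).
// generateTitle, canonicalUrl, UniqueIntro and ProductGrid are project
// helpers assumed to exist; the import paths are placeholders.
import Head from 'next/head';
import { generateTitle, canonicalUrl } from '../lib/facet-seo';
import { UniqueIntro, ProductGrid } from '../components/catalog';

const INDEXABLE_AXES = ['brand', 'gender', 'type', 'season'];
const MAX_INDEXABLE_AXES = 3;
const MIN_RESULTS = 4;

export default function FacetedCategoryPage({ params, facets }) {
  const productCount = facets.results.length;
  const indexableState = isIndexableFacetState(params, facets);

  // Emit exactly one robots directive: empty grids get noindex,nofollow,
  // other non-indexable states get noindex,follow, indexable states get none.
  let robots = null;
  if (productCount === 0) robots = 'noindex,nofollow';
  else if (!indexableState) robots = 'noindex,follow';

  return (
    <>
      <Head>
        <title>{generateTitle(params, productCount)}</title>
        <link rel="canonical" href={canonicalUrl(params)} />
        {robots && <meta name="robots" content={robots} />}
      </Head>
      {indexableState && <UniqueIntro params={params} />}
      <ProductGrid products={facets.results} />
    </>
  );
}

function isIndexableFacetState(params, facets) {
  // Minimum results; this also rejects empty grids.
  if (facets.results.length < MIN_RESULTS) return false;
  const used = Object.keys(params).filter((k) => params[k]);
  // Only indexable axes, and no more than MAX_INDEXABLE_AXES of them.
  if (used.some((k) => !INDEXABLE_AXES.includes(k))) return false;
  return used.length <= MAX_INDEXABLE_AXES;
}
```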
For AJAX-only facets (size, color, price), the user's click updates the grid via fetch without navigating to a new page and without generating a crawlable URL or link. If you must support shareable filter links, history.replaceState can reflect the selection in the address bar, but the canonical and the indexable state never change.
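A minimal sketch of that click path, assuming a hypothetical '/api/category-products' endpoint and a hypothetical '#filters=' fragment format for shareable state. Keeping the selection in the fragment is one design choice among several: Google generally ignores fragments when indexing, so no new URL enters the crawl frontier, but the server never sees the fragment, so restoring a shared link has to happen client-side.

```javascript
// Hypothetical AJAX-only facet handling. The '/api/category-products'
// endpoint and the '#filters=' fragment format are assumptions.
function buildFacetRequest(pagePath, selection) {
  const query = new URLSearchParams();
  // Sort keys and values so the same selection always yields the same string.
  for (const key of Object.keys(selection).sort()) {
    for (const value of [...selection[key]].sort()) query.append(key, value);
  }
  const qs = query.toString();
  return {
    fetchUrl: `/api/category-products?path=${encodeURIComponent(pagePath)}${qs ? '&' + qs : ''}`,
    // Shareable state lives in the fragment, so path and query stay the parent's.
    displayUrl: qs ? `${pagePath}#filters=${encodeURIComponent(qs)}` : pagePath,
  };
}

// Browser-only: fetch the grid, render it, and update the address bar in place.
async function applyAjaxFacets(pagePath, selection, renderGrid) {
  const { fetchUrl, displayUrl } = buildFacetRequest(pagePath, selection);
  const response = await fetch(fetchUrl);
  if (!response.ok) throw new Error(`Facet request failed: ${response.status}`);
  renderGrid(await response.json());
  history.replaceState(null, '', displayUrl); // no navigation, no new history entry
}
```

Because the filter controls call applyAjaxFacets from a click handler rather than rendering links to filtered URLs, a crawler parsing the HTML finds nothing new to follow.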
Platform-specific guidance:
| Platform | Default behavior | Fix |
|---|---|---|
| Shopify | Tag-based filters create /collections/x/y paths; new Search & Discovery app uses query params, all crawlable | Add <meta name="robots" content="noindex,follow"> for any tag URL with 3+ tags via theme template; block ?filter. params in robots if not indexed |
| WooCommerce | ?orderby=, ?per_page=, and layered-nav parameters such as ?filter_color= all crawl by default | RankMath/Yoast advanced robots; add Disallow: /*?orderby= to robots.txt; canonical to base category for filtered states |
| BigCommerce | Faceted filters via ?<facet>= parameters | Use Page Builder robots controls; canonical sorted variants to the base category, but keep paginated pages self-canonical |
| Magento (Adobe Commerce) | Layered nav generates ?<attr>= params; rich SEO controls available | Configure URL Rewrites + robots.txt per attribute; Magefan SEO or Mageworx SEO plugins for granular noindex rules |
Do this today
- In Screaming Frog, configure the spider's parameter handling under Configuration → URL Rewriting → Remove Parameters and add utm_source, utm_medium, gclid, fbclid, and srsltid. Run a crawl and compare the number of unique URLs found with the number of products in your catalog. A ratio over 5:1 means you have a facet leak.
- Open GSC → Indexing → Pages. Filter “Indexed” by URL containing ?. Any indexed URL with parameters is a candidate for noindex or canonical-to-parent. Pull the top 200 by impressions, decide per URL, and ship the rules.
- Use Botify or Lumar (formerly DeepCrawl), or a Screaming Frog crawl combined with log-file analysis, to determine which facet URLs Googlebot actually fetches. Cross-reference them with GA4's landing page report. URLs that get crawled but never landed on are pure crawl-budget waste.
- Build a facet demand matrix in a spreadsheet. Columns: facet axis, value, monthly search volume (from Ahrefs, Semrush, or Google Keyword Planner), conversion rate (from GA4, if the URL has landings), and strategic value. Rank rows by the product of volume, conversion rate, and strategic value. Promote the top 30-100 to indexable URLs; demote everything else.
- Decide per axis: AJAX or URL? Size and color are almost always AJAX. Brand and type are almost always URL. Price band is almost always AJAX. Gender is URL when it changes the inventory mix.
- Audit robots.txt. Block tracking parameters, internal search, cart, account, and any combinatorial junk you’ve identified. Test new rules before pushing; Google retired the Search Console robots.txt Tester in 2023, and the robots.txt report that replaced it shows only the files Google has already fetched.
- Implement canonical parameter ordering at the server. Reordered parameter URLs (?b=2&a=1 vs. ?a=1&b=2) should 301 to a single canonical order. Most platforms handle this poorly by default.
- Monitor the indexed URL count weekly. The Search Console API does not expose the Page indexing totals, so export them from the report or pull them from your crawl platform. If the number jumps by 20% or more in a week without you launching new categories, a facet rule has regressed. Treat it as a P1 incident.
- For empty-result faceted URLs (the “0 results for Pink Wool Petite” problem), force noindex,nofollow and consider returning a 404 if the combination is structurally impossible. Empty grids are the single most-cited Helpful Content failure mode in commerce.
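The ranking step of the facet demand matrix can be sketched in a few lines. Everything here is illustrative: the row shape, the fallback conversion rate for rows with no GA4 landings, and the sample numbers are placeholders, not data.

```javascript
// Score = monthly volume x conversion rate x strategic value (illustrative).
// Rows with no GA4 landings have conversionRate === null and get a fallback.
function rankFacetDemand(rows, { fallbackConversionRate = 0.01 } = {}) {
  return rows
    .map((row) => ({
      ...row,
      score: row.monthlyVolume * (row.conversionRate ?? fallbackConversionRate) * row.strategicValue,
    }))
    .sort((a, b) => b.score - a.score);
}

// Promote the top N rows to indexable URLs; everything else is demoted.
function splitPromotions(rows, promoteCount) {
  const ranked = rankFacetDemand(rows);
  return { promote: ranked.slice(0, promoteCount), demote: ranked.slice(promoteCount) };
}

// Placeholder rows for illustration only.
const sampleRows = [
  { axis: 'brand', value: 'nike', monthlyVolume: 1000, conversionRate: 0.02, strategicValue: 3 },
  { axis: 'color', value: 'pink', monthlyVolume: 50, conversionRate: null, strategicValue: 1 },
  { axis: 'gender', value: 'women', monthlyVolume: 800, conversionRate: 0.03, strategicValue: 2 },
];
const { promote, demote } = splitPromotions(sampleRows, 2);
console.log(promote.map((r) => r.value), demote.map((r) => r.value));
```

A multiplicative score means a zero in any column sinks the row, which is usually what you want: a facet nobody searches for should not be promoted however well it converts.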
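The weekly index-count check reduces to a small comparison. This sketch assumes you already collect the counts somewhere (Page indexing report exports or a crawl platform); indexedCountAlert is a hypothetical helper that only applies the 20%-in-a-week rule and the "no new categories" exception.

```javascript
// Flags a week-over-week jump in indexed URLs. Collecting the counts is
// left to you; this only applies the threshold rule.
function indexedCountAlert(previousCount, currentCount, { threshold = 0.2, newCategoriesLaunched = false } = {}) {
  if (previousCount <= 0) return { alert: false, change: null }; // no baseline yet
  const change = (currentCount - previousCount) / previousCount;
  return { alert: !newCategoriesLaunched && change >= threshold, change };
}

console.log(indexedCountAlert(40000, 52000).alert);                                  // true (30% jump)
console.log(indexedCountAlert(40000, 52000, { newCategoriesLaunched: true }).alert); // false
```

Wiring the alert to the same channel as uptime incidents keeps the P1 framing honest.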