Canonical Tags
Self-referencing canonicals, cross-domain canonicals, conflict diagnosis, and when Google ignores them. Canonical vs noindex vs redirect — choosing the right tool.
rel="canonical" is your declaration that a particular URL is the master version of a page when multiple URLs serve substantially the same content. It is a hint, not a directive — Google reserves the right to choose a different canonical if your declared one looks wrong. Knowing when canonicals are honored, when they are ignored, and when a 301 or noindex is the correct tool instead is the difference between a clean index and a duplication mess.
TL;DR
- Canonical is a hint, not a command. Google can and does override your declaration when signals contradict — internal links, sitemap inclusion, redirect chains, and content similarity all factor in. The Google-selected canonical in URL Inspection is what actually counts.
- A self-referencing canonical on every indexable page is the safe default. It explicitly tells Google “this URL, the one you just fetched, is the canonical version” and prevents parameter-variant crawls from outvoting your real URL.
- Canonical, noindex, and 301 are not interchangeable. Canonical consolidates ranking signals between similar URLs. noindex removes a URL from the index. 301 is a permanent move. Picking the wrong one is a multi-week recovery cost.
The mental model
Canonical tags are like a “see also: definitive edition” card in a library catalog. Multiple cards may exist for the same book — paperback, hardcover, large print — but the catalog points users to a single preferred edition. The other editions still exist on the shelves, but the librarian’s recommendation flows to the one canonical edition.
Critically, the librarian gets to override the publisher’s preference. If the publisher’s “definitive edition” sticker is on a paperback that nobody reads, while the hardcover has all the inbound foot traffic, the librarian quietly redirects patrons to the hardcover regardless of the sticker. Google does the same: when your declared canonical and the link signals disagree, Google picks the one with stronger external evidence.
Deep dive: the 2026 reality
Google uses canonicals as one input among many to its canonicalization algorithm. The actual decision factors, in rough order of weight:
- 301 redirects: strongest signal; if URL A 301s to URL B, B is canonical.
- rel="canonical" annotation: strong hint, especially when consistent.
- HTTPS over HTTP: Google prefers HTTPS by default.
- Sitemap inclusion: URLs in the sitemap are biased toward canonical.
- Internal link weight: the URL most internally linked tends to win.
- URL form: shorter, cleaner URLs are preferred when other signals tie.
- hreflang reciprocity: locale variants confirm each other as non-duplicates.
When your declared canonical conflicts with these signals, Google overrides you. The GSC URL Inspection > Page indexing > Canonical field shows the truth: User-declared canonical is what your HTML says; Google-selected canonical is what Google actually uses. Discrepancy here is the symptom; the cause is usually contradictory internal links, a missing or wrong canonical on a duplicate, or an hreflang annotation that points to a noindexed URL.
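The comparison between the two fields can be automated once the inspection data is exported. The sketch below is one way to do it, under stated assumptions: the field names userCanonical and googleCanonical follow the index status result of the Search Console URL Inspection API, while the InspectionRow type and the classify and findCanonicalMismatches helpers are hypothetical names used for illustration. Fetching the data is left out.

```typescript
// Hypothetical row shape: one entry per inspected URL.
interface InspectionRow {
  url: string;
  userCanonical?: string;   // what the page's HTML declares
  googleCanonical?: string; // what Google selected
}

type Verdict = 'match' | 'mismatch' | 'no-declaration' | 'not-selected';

// Compare URLs after parsing, so trivial differences such as an
// uppercase hostname or an explicit default port do not count as mismatches.
function sameUrl(a: string, b: string): boolean {
  try {
    return new URL(a).href === new URL(b).href;
  } catch {
    // Fall back to a plain string comparison if either value is not a valid URL.
    return a === b;
  }
}

function classify(row: InspectionRow): Verdict {
  if (!row.googleCanonical) return 'not-selected';
  if (!row.userCanonical) return 'no-declaration';
  return sameUrl(row.userCanonical, row.googleCanonical) ? 'match' : 'mismatch';
}

// Rows where Google overrode the declared canonical: the ones to investigate first.
function findCanonicalMismatches(rows: InspectionRow[]): InspectionRow[] {
  return rows.filter((row) => classify(row) === 'mismatch');
}
```

A row classified as mismatch is only the symptom; the cause still has to be traced through internal links, duplicates, and hreflang as described above.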
A canonical can point cross-domain. The classic case is syndication: you republish your blog post on Medium, and Medium’s editor offers to set rel="canonical" back to your domain. This works and is widely respected. The same pattern handles dev.to, Substack cross-posts, and partner content distribution.
When Google ignores canonicals (common cases):
| Cause | What Google does | Fix |
|---|---|---|
| Canonical points to noindexed URL | Picks a different canonical | Make the canonical target indexable |
| Canonical points to a 404 or 5xx | Picks a different canonical | Fix the target URL |
| Canonical chain (A → B → C) | Follows the chain, may stop early | Point all duplicates directly at the final canonical |
| Canonical contradicts sitemap | Receives mixed signals and may choose either URL | Align canonical and sitemap |
| Canonical from page A → B, but B’s canonical points back at A | Picks one based on other signals | Pick a side; do not loop |
| Different content but same canonical | Treats one as duplicate; only one indexed | Differentiate or split |
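Several rows of the table above can be detected mechanically from crawl data. The following is a minimal sketch, assuming you already have a map from URL to what a crawler saw there; the PageRecord shape, the Issue labels, and the diagnoseCanonical function are hypothetical names, not part of any crawler's API. It covers noindexed targets, error targets, chains, and loops; the sitemap and differing-content rows need other data.

```typescript
// Hypothetical crawl record: what a crawler saw at each URL.
interface PageRecord {
  status: number;      // HTTP status code
  noindex: boolean;    // meta robots or X-Robots-Tag noindex present
  canonical?: string;  // declared canonical as an absolute URL
}

type Issue = 'target-not-crawled' | 'target-noindexed' | 'target-error' | 'chain' | 'loop';

// Inspect the declared canonical of `url` and report the problems it exhibits.
function diagnoseCanonical(url: string, pages: Map<string, PageRecord>): Issue[] {
  const issues: Issue[] = [];
  const declared = pages.get(url)?.canonical;
  if (!declared || declared === url) return issues; // missing or self-referencing

  const target = pages.get(declared);
  if (!target) return ['target-not-crawled'];
  if (target.noindex) issues.push('target-noindexed');
  if (target.status >= 400) issues.push('target-error');

  // Follow canonicals from the target to see whether they continue or loop back.
  const seen = new Set<string>([url]);
  let current = declared;
  let hops = 0;
  let looped = false;
  while (true) {
    if (seen.has(current)) { looped = true; break; }
    seen.add(current);
    const next = pages.get(current)?.canonical;
    if (!next || next === current) break; // reached a final canonical
    current = next;
    hops++;
  }
  if (looped) issues.push('loop');
  else if (hops > 0) issues.push('chain');
  return issues;
}
```

An empty result means the declared target is indexable, returns a non-error status, and is itself final; it does not guarantee that Google will honor the declaration.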
For AI search, canonical handling is less mature. PerplexityBot typically respects canonicals. GPTBot and ClaudeBot are training crawlers and tend to ingest both URLs without canonical resolution; the deduplication happens (or does not) at training-data-cleaning time. Practically: do not rely on canonicals to keep duplicate content out of LLM training. If a URL must not be ingested, use robots.txt blocks for the AI user-agents.
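If you generate robots.txt at build time, the AI crawler blocks can be produced from a list of paths. This is a small sketch, assuming the user-agent tokens GPTBot and ClaudeBot named above; the buildAiBlockRules function and the /print/ path are hypothetical and stand in for whatever duplicate paths you want kept out.

```typescript
// User-agent tokens for the training crawlers named above; extend as needed.
const AI_USER_AGENTS = ['GPTBot', 'ClaudeBot'];

// Build one robots.txt group per crawler, disallowing each given path prefix.
function buildAiBlockRules(paths: string[], agents: string[] = AI_USER_AGENTS): string {
  const groups = agents.map((agent) => {
    const rules = paths.map((p) => `Disallow: ${p}`);
    return [`User-agent: ${agent}`, ...rules].join('\n');
  });
  // Blank line between groups, trailing newline at the end of the file.
  return groups.join('\n\n') + '\n';
}

// Hypothetical usage: keep printer-friendly duplicates away from AI crawlers.
const robotsSnippet = buildAiBlockRules(['/print/']);
// User-agent: GPTBot
// Disallow: /print/
//
// User-agent: ClaudeBot
// Disallow: /print/
```

The output is meant to be appended to your existing robots.txt rather than to replace it.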
Visualizing it
flowchart TD
A["URL crawled: example.com/widget?utm=email"] --> B[Parse rel=canonical]
B --> C{Declared canonical valid?}
C -->|No| D[Use Google's heuristics]
C -->|Yes| E[Add to canonical signal set]
D --> F[Combine signals: 301s, sitemap, links, hreflang]
E --> F
F --> G{Strongest signal cluster?}
G -->|Matches declared| H[Honor declaration]
G -->|Differs| I[Override: pick stronger URL]
H --> J[Index the canonical]
I --> J
Bad vs. expert
The bad approach
The bad pattern: hard-coded canonical at the top of the layout that points to a fixed URL across every page.
<!-- Bad: same canonical on every page in the site -->
<head>
<link rel="canonical" href="https://example.com/" />
<title>Widget – Example</title>
</head>
What happens: every page declares the homepage as its canonical. Google sees the contradiction (URL /widget claims its canonical is /, but /widget has different content), tries to apply the rule, eventually overrides — and meanwhile your entire site looks like duplicates of the homepage. This is a common Webflow / WordPress misconfiguration when a developer copies a <link rel="canonical"> snippet without templating it.
Another bad pattern: relative canonicals.
<!-- Bad: relative URLs -->
<link rel="canonical" href="/widget" />
Some legacy crawlers and parsers misinterpret relative canonicals, especially under different protocols or subdomains. Use absolute URLs always.
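The risk is easy to see with the standard URL constructor, which resolves a relative href against whatever address served the page. The hosts below are placeholders, and resolveCanonical is a hypothetical helper name.

```typescript
// Resolve a canonical href the way a standards-compliant parser would:
// relative values are interpreted against the URL of the page itself.
function resolveCanonical(href: string, pageUrl: string): string {
  return new URL(href, pageUrl).href;
}

// The same relative href yields different canonicals on different hosts and protocols.
const fromProduction = resolveCanonical('/widget', 'https://example.com/widget?utm=email');
const fromStaging = resolveCanonical('/widget', 'http://staging.example.com/widget');
// fromProduction: https://example.com/widget
// fromStaging:    http://staging.example.com/widget

// An absolute href does not depend on where the page was served from.
const absolute = resolveCanonical('https://example.com/widget', 'http://staging.example.com/widget');
// absolute: https://example.com/widget
```

If a staging or HTTP copy of the site ever gets crawled, a relative canonical makes that copy declare itself canonical, while an absolute one still points at production.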
A third: canonicalizing paginated content to page 1.
<!-- Bad: page 2 canonicals to page 1 -->
<!-- on /blog?page=2 -->
<link rel="canonical" href="https://example.com/blog?page=1" />
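This is a widespread habit, but it tells Google that page 2 is a duplicate of page 1, so items listed only on deeper pages may never be discovered through the listing. Google's 2019 announcement that it no longer uses rel="prev" and rel="next" did not change this. Each pagination page should self-canonical so Google can index the unique items on each page.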
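A minimal sketch of self-canonical pagination follows. It assumes the page number travels in a page query parameter, as in the example above, and that page 1 and the bare listing URL are the same document, so page 1 collapses to the bare URL. Both are assumptions, and paginationCanonical is a hypothetical helper name.

```typescript
// Build the canonical for page `page` of a listing.
// Every other query parameter is dropped; only the page number survives.
function paginationCanonical(listingUrl: string, page: number): string {
  const u = new URL(listingUrl);
  u.search = '';
  u.hash = '';
  // Assumption: ?page=1 duplicates the bare listing URL, so omit it there.
  if (page > 1) u.searchParams.set('page', String(page));
  return u.href;
}

const pageOne = paginationCanonical('https://example.com/blog?page=1', 1);
const pageTwo = paginationCanonical('https://example.com/blog?page=2&utm=email', 2);
// pageOne: https://example.com/blog
// pageTwo: https://example.com/blog?page=2
```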
The expert approach
Self-referencing canonical, generated dynamically from the request URL with parameters stripped except where they materially change content:
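// src/utils/canonical.ts
// Query parameters that change page content and therefore stay in the canonical.
const ALLOWED_PARAMS = new Set(['page', 'sort']);

export function buildCanonical(currentUrl: URL, siteOrigin: string): string {
  // Rebuild from the path only, which drops every query parameter and the hash.
  const u = new URL(currentUrl.pathname, siteOrigin);
  for (const [k, v] of currentUrl.searchParams) {
    if (ALLOWED_PARAMS.has(k)) u.searchParams.set(k, v);
  }
  // Force trailing-slash policy: present on directories, absent on files
  if (!u.pathname.includes('.') && !u.pathname.endsWith('/')) {
    u.pathname += '/';
  }
  // Lowercasing assumes the server treats paths and parameter values as case-insensitive.
  return u.toString().toLowerCase();
}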
Used in an Astro layout:
---
import { buildCanonical } from '../utils/canonical';
const canonical = buildCanonical(new URL(Astro.request.url), Astro.site!.toString());
---
<head>
<link rel="canonical" href={canonical} />
</head>
Cross-domain canonical for syndicated content (you control both ends):
<!-- On the syndicated copy at partner.com/blog/canonicals-explained -->
<link rel="canonical" href="https://example.com/technical-seo/canonicals/" />
Faceted ecommerce: only the unfiltered category page is canonical; filter combinations canonical to the parent.
<!-- On /products/shoes?color=red&size=10 -->
<link rel="canonical" href="https://example.com/products/shoes/" />
<!-- And on the unfiltered category page -->
<link rel="canonical" href="https://example.com/products/shoes/" />
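The facet rule can be expressed as a small policy function. This sketch assumes a hypothetical allow-list, INDEXABLE_FACETS, of facet parameters you have decided deserve their own index entry; the brand parameter and the facetCanonical name are illustrative only. Everything not on the list canonicals to the parent, and trailing-slash normalization is left out for brevity.

```typescript
// Hypothetical policy: facet parameters whose pages are allowed to be indexed.
const INDEXABLE_FACETS = new Set<string>(['brand']);

// Map a filtered category URL to its canonical.
function facetCanonical(pageUrl: string): string {
  const source = new URL(pageUrl);

  // Keep only the facets the policy marks as indexable.
  const kept: Array<[string, string]> = [];
  source.searchParams.forEach((value, key) => {
    if (INDEXABLE_FACETS.has(key)) kept.push([key, value]);
  });

  // Sort by key so different parameter orders collapse to one canonical.
  kept.sort(([a], [b]) => a.localeCompare(b));

  const target = new URL(source.pathname, source.origin);
  for (const [key, value] of kept) target.searchParams.set(key, value);
  return target.href;
}

const filtered = facetCanonical('https://example.com/products/shoes/?color=red&size=10');
const branded = facetCanonical('https://example.com/products/shoes/?size=10&brand=acme');
// filtered: https://example.com/products/shoes/
// branded:  https://example.com/products/shoes/?brand=acme
```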
Choosing between canonical, noindex, and 301:
| Situation | Right tool | Why |
|---|---|---|
| Two URLs serve same content, one is preferred | Canonical | Consolidates signals; both URLs remain accessible |
| URL should not appear in search but page must exist for users | noindex | Removes from index without removing the URL |
| URL has permanently moved to a new path | 301 | Transfers full link equity, deindexes old URL |
| URL temporarily moved | 302 | Preserves old URL in index |
| Page has been deleted permanently | 410 | Faster deindexation than 404 |
| Faceted filter URL with similar content to parent | Canonical to parent + parameter handling | Avoids index bloat |
| Pagination | Self-canonical + crawlable page links | Each page indexed separately |
| Syndicated content | Cross-domain canonical to original | Original keeps ranking equity |
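The core of the table can be read as a decision procedure. The sketch below encodes the first five rows plus the self-canonical default; the Situation fields, the chooseTool name, and the order in which conditions are checked are assumptions made for illustration, not a rule from Google.

```typescript
type Tool = 'canonical' | 'noindex' | '301' | '302' | '410' | 'self-canonical';

// Hypothetical description of a URL's situation; field names are illustrative.
interface Situation {
  deleted: boolean;         // content is permanently gone
  movedTo?: string;         // new location, if the URL has moved
  movePermanent?: boolean;  // only meaningful when movedTo is set
  hideFromSearch: boolean;  // page must exist for users but stay out of search
  duplicateOf?: string;     // preferred URL serving substantially the same content
}

// Checks run from the most drastic outcome to the mildest.
function chooseTool(s: Situation): Tool {
  if (s.deleted) return '410';
  if (s.movedTo) return s.movePermanent ? '301' : '302';
  if (s.hideFromSearch) return 'noindex';
  if (s.duplicateOf) return 'canonical';
  return 'self-canonical';
}
```

For example, a filtered category URL would be described with duplicateOf set to the parent category and would come back as canonical, while a retired landing page with no replacement would come back as 410.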
Do this today
- Run a Screaming Frog SEO Spider crawl. Open the Canonicals tab, then filter by Canonicalised. Each URL listed has a canonical that does not point to itself. Audit each: is the target URL indexable, is it the right URL, does it return 200?
- In GSC > URL Inspection, check 5 of your top revenue URLs. Confirm User-declared canonical equals Google-selected canonical. Any mismatch is a top-priority diagnostic.
- Audit for canonical chains. In Screaming Frog, look at Reports > Canonicals > Canonical Chains. Any chain of length > 1 wastes crawl budget — flatten so every duplicate points directly at the final URL.
- Verify absolute URLs. Search your codebase for rel="canonical" and confirm every value starts with https://. Relative canonicals are syntactically valid but increase the chance of misinterpretation across crawlers.
- Confirm self-referencing canonicals on every indexable URL. The canonical of /foo/bar/ should be https://yourdomain.com/foo/bar/, exactly. Add automated tests in your build that fail if any indexable page is missing this.
- For faceted navigation, decide a policy: which parameter values produce indexable pages and which canonical to the parent. Document and enforce. Use GSC > Settings > Crawl stats to spot parameter explosions (the legacy URL Parameters tool was retired in 2022, so the policy has to live in your own canonicals and robots rules).
- Diff sitemap URLs against canonical declarations. Run a script that fetches each sitemap URL and parses the <link rel="canonical">. Any URL whose canonical points elsewhere should be removed from the sitemap.
- For syndication partners, confirm the partner page actually contains your canonical tag. Use curl -s https://partner.com/your-piece | grep canonical after every cross-post.
- Set up an Ahrefs Site Audit scheduled run with the Canonicals check enabled. Alert on any new “canonical points to broken page” or “canonical points to redirect” issues.
- For pages that should leave the index entirely (e.g., merged or deprecated), do not use canonical to a different URL. Use noindex first, wait for Googlebot to recrawl and process, then 301 once the URL is out of the index. Canonical is for similar-but-coexisting content; deindexation needs noindex or 410.
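The absolute-URL, self-reference, and sitemap checks in this list can share one helper. The following is a rough sketch that works on HTML you have already fetched or built; extractCanonical, checkSelfCanonical, and sitemapMismatches are hypothetical names, and the regex-based extraction is a shortcut that a real HTML parser would handle more robustly.

```typescript
// Pull the href out of the first <link rel="canonical"> tag.
// Regex-based for brevity; attribute order inside the tag does not matter.
function extractCanonical(html: string): string | null {
  const tag = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*>/i);
  if (!tag) return null;
  const href = tag[0].match(/\bhref=["']([^"']+)["']/i);
  return href ? href[1] : null;
}

type CheckResult = 'ok' | 'missing' | 'not-absolute-https' | 'points-elsewhere';

// pageUrl is the absolute URL the page is expected to be served at.
function checkSelfCanonical(pageUrl: string, html: string): CheckResult {
  const canonical = extractCanonical(html);
  if (canonical === null) return 'missing';
  if (!/^https:\/\//i.test(canonical)) return 'not-absolute-https';
  try {
    return new URL(canonical).href === new URL(pageUrl).href ? 'ok' : 'points-elsewhere';
  } catch {
    // An unparseable canonical cannot be self-referencing.
    return 'points-elsewhere';
  }
}

// Given sitemap URL -> fetched HTML, list the URLs whose canonical points elsewhere.
// These are the candidates for removal from the sitemap.
function sitemapMismatches(pages: Map<string, string>): string[] {
  const mismatched: string[] = [];
  pages.forEach((html, url) => {
    if (checkSelfCanonical(url, html) === 'points-elsewhere') mismatched.push(url);
  });
  return mismatched;
}
```

In a build step, failing on any result other than ok for pages meant to be indexed gives you the automated test described above.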