Canonical Tags

rel="canonical" is your declaration that a particular URL is the master version of a page when multiple URLs serve substantially the same content. It is a hint, not a directive — Google reserves the right to choose a different canonical if your declared one looks wrong. Knowing when canonicals are honored, when they are ignored, and when a 301 or noindex is the correct tool instead is the difference between a clean index and a duplication mess.

TL;DR

Canonical is a hint, not a command. Google can and does override your declaration when other signals contradict it: internal links, sitemap inclusion, redirect chains, and content similarity all factor in. The Google-selected canonical in URL Inspection is what actually counts.
A self-referencing canonical on every indexable page is the safe default. It explicitly tells Google “this URL, the one you just fetched, is the canonical version” and prevents parameter-variant URLs from outvoting your real URL.
Canonical, noindex, and 301 are not interchangeable. Canonical consolidates ranking signals between similar URLs. noindex removes a URL from the index. 301 signals a permanent move. Picking the wrong one can cost weeks of recovery.

The mental model

Canonical tags are like a “see also: definitive edition” card in a library catalog. Multiple cards may exist for the same book — paperback, hardcover, large print — but the catalog points users to a single preferred edition. The other editions still exist on the shelves, but the librarian’s recommendation flows to the one canonical edition.

Critically, the librarian gets to override the publisher’s preference. If the publisher’s “definitive edition” sticker is on a paperback that nobody reads, while the hardcover has all the inbound foot traffic, the librarian quietly redirects patrons to the hardcover regardless of the sticker. Google does the same: when your declared canonical and the link signals disagree, Google picks the URL with the stronger supporting signals.

Deep dive: the 2026 reality

Google uses canonicals as one input among many to its canonicalization algorithm. The actual decision factors, in rough order of weight:

301 redirects — strongest signal; if URL A 301s to URL B, B is canonical.
rel="canonical" annotation — strong hint, especially when consistent.
HTTPS over HTTP — Google prefers HTTPS by default.
Sitemap inclusion — URLs in the sitemap are biased toward canonical.
Internal link weight — the URL most internally linked tends to win.
URL form — shorter, cleaner URLs are preferred when other signals tie.
hreflang reciprocity — locale variants confirm each other as non-duplicates.

When your declared canonical conflicts with these signals, Google overrides you. In GSC, the canonical fields under URL Inspection > Page indexing show the truth: User-declared canonical is what your HTML says; Google-selected canonical is what Google actually uses. A discrepancy here is the symptom. The cause is usually contradictory internal links, a missing or wrong canonical on a duplicate, or an hreflang annotation that points to a noindexed URL.
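Checking the two fields one URL at a time does not scale. The sketch below shows how the comparison might be automated against the Search Console URL Inspection API. It makes several assumptions. It needs Node 18+ for global `fetch`. You supply an OAuth access token with Search Console access. It relies on the endpoint and the `indexStatusResult.userCanonical` / `googleCanonical` field names as I understand that API, so verify them against the current reference. The helper names are hypothetical.

```typescript
// Hypothetical helpers: compare user-declared vs Google-selected canonicals.
// Endpoint and response shape are assumptions; confirm against the API reference.

interface CanonicalReport {
  url: string;
  userCanonical?: string;
  googleCanonical?: string;
  mismatch: boolean;
}

interface InspectResponse {
  inspectionResult?: {
    indexStatusResult?: { userCanonical?: string; googleCanonical?: string };
  };
}

// Pure comparison step, kept separate so it can be tested without network access.
export function compareCanonicals(
  url: string,
  userCanonical?: string,
  googleCanonical?: string,
): CanonicalReport {
  // A missing value on either side is treated as a mismatch worth investigating.
  const mismatch = !userCanonical || !googleCanonical || userCanonical !== googleCanonical;
  return { url, userCanonical, googleCanonical, mismatch };
}

export async function inspectCanonical(
  url: string,
  siteUrl: string,     // the GSC property, e.g. "https://example.com/"
  accessToken: string, // OAuth 2.0 token obtained separately
): Promise<CanonicalReport> {
  const res = await fetch('https://searchconsole.googleapis.com/v1/urlInspection/index:inspect', {
    method: 'POST',
    headers: { Authorization: `Bearer ${accessToken}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ inspectionUrl: url, siteUrl }),
  });
  if (!res.ok) throw new Error(`URL Inspection request failed: HTTP ${res.status}`);
  const data = (await res.json()) as InspectResponse;
  const status = data.inspectionResult?.indexStatusResult ?? {};
  return compareCanonicals(url, status.userCanonical, status.googleCanonical);
}
```

Run it over your top revenue URLs and sort by `mismatch`. The network call and the comparison are split so that the comparison logic can be unit-tested on its own.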

A canonical can point cross-domain. The classic case is syndication: you republish your blog post on Medium, and Medium’s editor offers to set rel="canonical" back to your domain. Google generally honors this, but it is still a hint. Google’s current syndication guidance says the more reliable option is for the partner to noindex its copy. The same pattern handles dev.to, Substack cross-posts, and partner content distribution.

When Google ignores canonicals (common cases):

Cause	What Google does	Fix
Canonical points to noindexed URL	Picks a different canonical	Make the canonical target indexable
Canonical points to a 404 or 5xx	Picks a different canonical	Fix the target URL
Canonical chain (A → B → C)	Follows the chain, may stop early	Point all duplicates directly at the final canonical
Canonical contradicts sitemap	Weighs both as signals; either URL may win	Align canonical and sitemap
Canonical from page A → B, but B’s canonical points back at A	Picks one based on other signals	Pick a side; do not loop
Different content but same canonical	Treats one as duplicate; only one indexed	Differentiate or split
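The chain and loop rows suggest a simple audit. Take each crawled URL’s declared canonical, follow the declarations to their end, flag loops, and report the URL every duplicate should point at directly. This is a minimal sketch. It assumes you already have a map from URL to declared canonical, for example from a crawler export. `resolveCanonicals` is a hypothetical helper, not part of any tool.

```typescript
// Hypothetical audit helper: flatten canonical chains and detect loops.
// Input: URL -> declared canonical (self-referencing URLs map to themselves).

export interface Resolution {
  finalTarget: string | null; // null when the declarations loop
  hops: number;               // declarations followed from the starting URL
  loop: boolean;
}

export function resolveCanonicals(declared: Map<string, string>): Map<string, Resolution> {
  const out = new Map<string, Resolution>();
  for (const start of declared.keys()) {
    const seen = new Set<string>([start]);
    let current = start;
    let hops = 0;
    let loop = false;
    while (true) {
      const next = declared.get(current);
      // Stop at a self-reference or at a URL with no known declaration.
      if (next === undefined || next === current) break;
      // Revisiting a URL means A -> B -> ... -> A: the "pick a side" case.
      if (seen.has(next)) {
        loop = true;
        break;
      }
      seen.add(next);
      current = next;
      hops++;
    }
    out.set(start, { finalTarget: loop ? null : current, hops, loop });
  }
  return out;
}
```

Any entry with `hops > 1` is a chain to flatten: point the starting URL straight at `finalTarget`. Any entry with `loop: true` needs a decision about which URL wins.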

For AI search, canonical handling is less mature. PerplexityBot typically respects canonicals. GPTBot and ClaudeBot are training crawlers and tend to ingest both URLs without canonical resolution; the deduplication happens (or does not) at training-data-cleaning time. Practically: do not rely on canonicals to keep duplicate content out of LLM training. If a URL must not be ingested, use robots.txt blocks for the AI user-agents.
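The helper below sketches that robots.txt approach. It emits groups that disallow a set of paths for named AI crawlers and leaves other crawlers unaffected. The user-agent tokens (GPTBot, ClaudeBot, PerplexityBot) are the ones named above. The paths and the `buildAiBlockRules` name are hypothetical. robots.txt is advisory: it only stops crawlers that choose to honor it.

```typescript
// Hypothetical helper: generate robots.txt groups that block specific AI
// user-agents from specific paths. Other user-agents are not affected.

export function buildAiBlockRules(userAgents: string[], disallowPaths: string[]): string {
  const groups = userAgents.map((ua) =>
    [`User-agent: ${ua}`, ...disallowPaths.map((p) => `Disallow: ${p}`)].join('\n'),
  );
  // One group per user-agent, separated by a blank line; file ends with a newline.
  return groups.join('\n\n') + '\n';
}

console.log(buildAiBlockRules(['GPTBot', 'ClaudeBot', 'PerplexityBot'], ['/drafts/', '/internal/']));
```

Under RFC 9309, a crawler that matches a named group ignores the `User-agent: *` group. Copy any general Disallow rules into these groups as well, or the named bots will not see them.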

Visualizing it

flowchart TD
  A["URL crawled: example.com/widget?utm=email"] --> B[Parse rel=canonical]
  B --> C{Declared canonical valid?}
  C -->|No| D[Use Google's heuristics]
  C -->|Yes| E[Add to canonical signal set]
  D --> F[Combine signals: 301s, sitemap, links, hreflang]
  E --> F
  F --> G{Strongest signal cluster?}
  G -->|Matches declared| H[Honor declaration]
  G -->|Differs| I[Override: pick stronger URL]
  H --> J[Index the canonical]
  I --> J
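The flowchart can be read as a small decision procedure. The sketch below is a toy model only. Google does not publish its canonicalization weights, so the numbers here are invented to mirror the rough ordering of factors listed earlier. The `Candidate` fields and the `pickCanonical` name are hypothetical. The model shows why a declared canonical loses when it is invalid or outweighed by other signals.

```typescript
// Toy model of the flowchart. Weights are illustrative assumptions,
// NOT Google's actual values, which are not public.

interface Candidate {
  url: string;
  redirectTarget: boolean;    // other duplicates 301 to this URL
  declaredCanonical: boolean; // a rel=canonical in the cluster points here
  valid: boolean;             // returns 200 and is indexable
  https: boolean;
  inSitemap: boolean;
  internalLinks: number;
}

const WEIGHTS = { redirect: 50, declared: 30, https: 10, sitemap: 5, perLink: 0.1 };

function score(c: Candidate): number {
  return (
    (c.redirectTarget ? WEIGHTS.redirect : 0) +
    (c.declaredCanonical ? WEIGHTS.declared : 0) +
    (c.https ? WEIGHTS.https : 0) +
    (c.inSitemap ? WEIGHTS.sitemap : 0) +
    c.internalLinks * WEIGHTS.perLink
  );
}

export function pickCanonical(cluster: Candidate[]): { url: string; overridden: boolean } {
  // "Declared canonical valid?": a 404, 5xx, or noindexed target is not eligible.
  const eligible = cluster.filter((c) => c.valid);
  if (eligible.length === 0) throw new Error('no indexable candidate');
  const best = eligible.reduce((a, b) => {
    const sa = score(a);
    const sb = score(b);
    // Tie-break on URL form: the shorter URL wins.
    return sb > sa || (sb === sa && b.url.length < a.url.length) ? b : a;
  });
  const declared = cluster.find((c) => c.declaredCanonical);
  // Overridden = a canonical was declared, but a different URL won.
  return { url: best.url, overridden: declared !== undefined && declared.url !== best.url };
}
```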

Bad vs. expert

The bad approach

The bad pattern: hard-coded canonical at the top of the layout that points to a fixed URL across every page.

<!-- Bad: same canonical on every page in the site -->
<head>
  <link rel="canonical" href="https://example.com/" />
  <title>Widget – Example</title>
</head>

What happens: every page declares the homepage as its canonical. Google sees the contradiction (URL /widget claims its canonical is /, but /widget has different content) and will usually override the declaration eventually. Until it does, your entire site looks like duplicates of the homepage. This is a common Webflow / WordPress misconfiguration when a developer pastes a <link rel="canonical"> snippet into a shared layout without templating it.

Another bad pattern: relative canonicals.

<!-- Bad: relative URLs -->
<link rel="canonical" href="/widget" />

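Google accepts relative canonicals, but some legacy crawlers and parsers misinterpret them. A relative value also resolves against whatever protocol and host served the page, so HTTP variants and staging subdomains end up declaring themselves canonical. Always use absolute URLs.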
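The resolution problem is easy to demonstrate with the standard WHATWG `URL` constructor, which resolves an href the way a browser or crawler would. The hostnames below are placeholders.

```typescript
// A relative canonical inherits protocol and host from the page it appears on.
const relative = '/widget';

const pagesServingTheSameTemplate = [
  'https://example.com/widget',
  'http://example.com/widget',          // HTTP variant still reachable
  'https://staging.example.com/widget', // staging copy that leaked into a crawl
];

for (const page of pagesServingTheSameTemplate) {
  console.log(new URL(relative, page).href);
}
// → https://example.com/widget
// → http://example.com/widget
// → https://staging.example.com/widget

// An absolute href resolves to itself wherever it appears.
console.log(new URL('https://example.com/widget', 'https://staging.example.com/widget').href);
// → https://example.com/widget
```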

A third: canonicalizing paginated content to page 1.

<!-- Bad: page 2 canonicals to page 1 -->
<!-- on /blog?page=2 -->
<link rel="canonical" href="https://example.com/blog?page=1" />

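This pattern was once common, but Google advises against it. Each paginated page should self-canonicalize so Google can index the unique items it lists. Since 2019 Google has also stopped using rel=prev/next, so plain crawlable links between pages are what tie the series together.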

The expert approach

Self-referencing canonical, generated dynamically from the request URL with parameters stripped except where they materially change content:

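// src/utils/canonical.ts
// Query parameters that produce materially different content; all others are dropped.
const ALLOWED_PARAMS = new Set(['page', 'sort']);

export function buildCanonical(currentUrl: URL, siteOrigin: string): string {
  // Rebuild on the configured origin so protocol and host never come from the request.
  const u = new URL(currentUrl.pathname, siteOrigin);
  for (const [k, v] of currentUrl.searchParams) {
    if (ALLOWED_PARAMS.has(k)) u.searchParams.set(k, v);
  }
  // Stable order, so ?sort=x&page=2 and ?page=2&sort=x yield one canonical.
  u.searchParams.sort();
  // Trailing-slash policy: present on directories, absent on files (last segment has a dot).
  const lastSegment = u.pathname.split('/').pop() ?? '';
  if (!lastSegment.includes('.') && !u.pathname.endsWith('/')) {
    u.pathname += '/';
  }
  // Do not lowercase the whole URL: paths and query values are case-sensitive,
  // and the URL parser already lowercases the scheme and host.
  return u.toString();
}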

Used in an Astro layout:

---
import { buildCanonical } from '../utils/canonical';
const canonical = buildCanonical(new URL(Astro.request.url), Astro.site!.toString());
---
<head>
  <link rel="canonical" href={canonical} />
</head>

Cross-domain canonical for syndicated content (you control both ends):

<!-- On the syndicated copy at partner.com/blog/canonicals-explained -->
<link rel="canonical" href="https://example.com/technical-seo/canonicals/" />

Faceted ecommerce: only the unfiltered category page is canonical; every filter combination canonicalizes to the parent.

<!-- On /products/shoes?color=red&size=10 -->
<link rel="canonical" href="https://example.com/products/shoes/" />
<!-- And on the unfiltered category page -->
<link rel="canonical" href="https://example.com/products/shoes/" />
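Some filter values may deserve their own indexable pages. In that case the policy can be encoded as an allowlist. This is a sketch built on two hypothetical choices: an `INDEXABLE_FACETS` allowlist, and a rule that only a single allowlisted facet stays self-canonical while every other combination canonicalizes to the parent category. Adjust both to your own policy.

```typescript
// Hypothetical faceted-navigation canonical policy.
// Assumption: only a single allowlisted facet gets its own indexable URL;
// every other filter combination canonicalizes to the parent category.
const INDEXABLE_FACETS = new Set(['color']);

export function facetedCanonical(requestUrl: string, origin: string): string {
  const req = new URL(requestUrl);
  const parent = new URL(req.pathname, origin);
  // Category URLs carry a trailing slash.
  if (!parent.pathname.endsWith('/')) parent.pathname += '/';

  // Tracking parameters never affect the canonical.
  const facets = [...req.searchParams.keys()].filter((k) => !k.startsWith('utm_'));
  if (facets.length === 1 && INDEXABLE_FACETS.has(facets[0])) {
    // e.g. /products/shoes/?color=red stays self-canonical.
    parent.searchParams.set(facets[0], req.searchParams.get(facets[0])!);
  }
  return parent.href;
}
```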

Choosing between canonical, noindex, and 301:

Situation	Right tool	Why
Two URLs serve same content, one is preferred	Canonical	Consolidates signals; both URLs remain accessible
URL should not appear in search but page must exist for users	noindex	Removes from index without removing the URL
URL has permanently moved to a new path	301	Transfers full link equity, deindexes old URL
URL temporarily moved	302	Preserves old URL in index
Page has been deleted permanently	410	Faster deindexation than 404
Faceted filter URL with similar content to parent	Canonical to parent + parameter handling	Avoids index bloat
Pagination	Self-canonical + crawlable page links	Each page indexed separately
Syndicated content	Cross-domain canonical to original	Original keeps ranking equity
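At the HTTP layer, the noindex, 301, and 410 rows each map to a status code or a response header. The sketch below uses Node’s built-in `http` types. The route tables (`MOVED`, `GONE`, `NOINDEX`), their paths, and the origin are hypothetical. `X-Robots-Tag: noindex` is the header equivalent of a robots meta tag, which also covers non-HTML files such as PDFs.

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';

// Hypothetical route tables; in practice these come from your CMS or config.
const MOVED = new Map<string, string>([['/old-widget/', '/widget/']]); // permanent move -> 301
const GONE = new Set<string>(['/discontinued-widget/']);               // deleted for good -> 410
const NOINDEX = new Set<string>(['/cart/', '/internal-search/']);      // must exist, stay out of the index

interface Decision {
  status: number;
  headers: Record<string, string>;
}

// Pure policy function, testable without a running server.
export function decide(path: string, origin: string): Decision {
  const target = MOVED.get(path);
  if (target !== undefined) {
    return { status: 301, headers: { Location: new URL(target, origin).href } };
  }
  if (GONE.has(path)) return { status: 410, headers: {} };
  // Header form of <meta name="robots" content="noindex">.
  if (NOINDEX.has(path)) return { status: 200, headers: { 'X-Robots-Tag': 'noindex' } };
  return { status: 200, headers: {} };
}

// Request handler; wire it up with createServer(handle).listen(port) from node:http.
export function handle(req: IncomingMessage, res: ServerResponse): void {
  const origin = 'https://example.com'; // hypothetical site origin
  const { status, headers } = decide(new URL(req.url ?? '/', origin).pathname, origin);
  res.writeHead(status, headers);
  res.end(status === 200 ? 'page body' : '');
}
```

Canonical and self-canonical cases stay in the HTML `<head>`. This layer only handles URLs that should move, disappear, or stay out of the index.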

Do this today

Run a Screaming Frog SEO Spider crawl. Open the Canonicals tab, then filter by Canonicalised. Each URL listed has a canonical that does not point to itself. Audit each: is the target URL indexable, is it the right URL, does it return 200?
In GSC > URL Inspection, check 5 of your top revenue URLs. Confirm User-declared canonical equals Google-selected canonical. Any mismatch is a top-priority diagnostic.
Audit for canonical chains. In Screaming Frog, look at Reports > Canonicals > Canonical Chains. Any chain of length > 1 wastes crawl budget — flatten so every duplicate points directly at the final URL.
Verify absolute URLs. Search your codebase for rel="canonical" and confirm every value starts with https://. Relative canonicals are syntactically valid but increase the chance of misinterpretation across crawlers.
Confirm self-referencing canonicals on every indexable URL. The canonical of /foo/bar/ should be https://yourdomain.com/foo/bar/, exactly. Add automated tests in your build that fail if any indexable page is missing this.
For faceted navigation, decide a policy: which parameter values produce indexable pages and which canonicalize to the parent. Document and enforce it. To spot parameter explosions, check GSC > Settings > Crawl stats alongside your server logs. The old URL Parameters tool was retired in 2022.
Diff sitemap URLs against canonical declarations. Run a script that fetches each sitemap URL and parses the <link rel="canonical">. Any URL whose canonical points elsewhere should be removed from the sitemap.
For syndication partners, confirm the partner page actually contains your canonical tag. Use curl -s https://partner.com/your-piece | grep canonical after every cross-post.
Set up an Ahrefs Site Audit scheduled run with the Canonicals check enabled. Alert on any new “canonical points to broken page” or “canonical points to redirect” issues.
For pages that should leave the index entirely (e.g., deprecated), do not canonicalize them to a different URL. Canonical is for similar-but-coexisting content. Deindexation needs noindex if the page must stay reachable, or 410 if it is gone. For merged pages, 301 the old URL to the merged destination.
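Several of these checks can share one script: absolute HTTPS canonicals, self-referencing canonicals on indexable pages, and sitemap URLs whose canonical points elsewhere. This is a minimal sketch with several assumptions. It needs Node 18+ for global `fetch` and expects a plain `<urlset>` sitemap rather than a sitemap index. Its regex extraction expects `rel` to appear before `href` in the link tag, so a production audit should use a real HTML parser. The function names are hypothetical.

```typescript
// Hypothetical canonical audit: fetch every sitemap URL and check its canonical.
// Regex extraction is a simplification; use real HTML/XML parsers in production.

export function extractLocs(sitemapXml: string): string[] {
  return [...sitemapXml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) =>
    m[1].replace(/&amp;/g, '&'), // undo XML escaping in query strings
  );
}

export function extractCanonical(html: string): string | null {
  // Assumes rel comes before href inside the <link> tag.
  const m = html.match(/<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  return m ? m[1] : null;
}

export function auditPage(pageUrl: string, canonical: string | null): string[] {
  if (canonical === null) return ['missing canonical'];
  if (!canonical.startsWith('https://')) return [`non-absolute or non-HTTPS canonical: ${canonical}`];
  // A sitemap URL should be its own canonical; otherwise remove it from the sitemap.
  if (canonical !== pageUrl) return [`canonical points elsewhere: ${canonical}`];
  return [];
}

export async function auditSitemap(sitemapUrl: string): Promise<Map<string, string[]>> {
  const xml = await (await fetch(sitemapUrl)).text();
  const report = new Map<string, string[]>();
  for (const url of extractLocs(xml)) {
    // Do not follow redirects: a redirecting sitemap URL is itself a problem.
    const res = await fetch(url, { redirect: 'manual' });
    const problems =
      res.status === 200 ? auditPage(url, extractCanonical(await res.text())) : [`HTTP ${res.status}`];
    if (problems.length > 0) report.set(url, problems);
  }
  return report;
}
```

Running `auditSitemap` in CI and failing the build when the report is non-empty turns the self-referencing and sitemap checks into automated tests.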