Site Architecture
Flat vs deep structures, the three-clicks rule, categorical hierarchy, BreadcrumbList schema, and silo design that actually affects rankings.
Site architecture is the topology of your domain — which URLs link to which, how deep pages live from the root, and what categorical narrative the link graph tells. Get it right and PageRank flows naturally to your money pages. Get it wrong and your best content sits four clicks deep behind an orphaned tag taxonomy.
TL;DR
- Flat beats deep, but only when the flatness is meaningful. A 50,000-page site dumped at the root with no hierarchy is harder to rank than the same content organized into pillar/cluster silos.
- The “three clicks” rule is folk wisdom; click depth distribution is the actual signal. Googlebot weighs internal link quality and PageRank flow far more than literal click count. What matters is that high-value pages receive concentrated internal authority.
- `BreadcrumbList` JSON-LD is non-negotiable in 2026. It earns the breadcrumb display in SERPs, helps AI Overviews understand entity hierarchy, and gives PerplexityBot a cleaner citation context.
The mental model
Site architecture is like a city’s wayfinding system. A tourist (Googlebot) lands at a transit hub (homepage), follows signs (internal links) through districts (categories) to a specific address (your page). The clearer the signage and the more deliberate the district boundaries, the faster the tourist gets to the destination — and the more confidently they can describe what they saw.
Two failure modes dominate. First: the flat dump — every page links to a sitemap and nothing else. The tourist sees one giant intersection with 50,000 unlabeled exits. Second: the endless basement — a category tree six levels deep with most leaves orphaned. The tourist gives up on floor three.
The expert pattern is the silo: a clear pillar page at the top of each topic, cluster pages directly under it, all internally linked into a tight thematic web. Authority concentrates rather than diffuses, and Google’s MUM-class topic models can read the entity relationships without guessing.
Deep dive: the 2026 reality
Two things changed how architecture actually affects rankings in 2026:
- Google’s link graph weighting now leans heavily on topical proximity. A link from a thematically related page passes more equity than a link from an unrelated one. This was always partly true; it became measurable post-March-2024 core update. Internal links from your `/seo/technical/` pillar to a `/seo/technical/canonicals` cluster page outweigh ten links from your homepage footer.
- AI answer engines parse breadcrumbs explicitly. When Perplexity cites a page, it often shows the breadcrumb trail beneath the snippet. Google AI Overviews uses breadcrumb schema to disambiguate entity meaning. Sites without proper breadcrumb markup get fewer citations.
Click depth distribution is the metric that actually correlates with indexation and ranking. Run a crawl, group URLs by depth from the homepage, and look at the cumulative distribution. A healthy mid-size site might look like:
| Depth | URLs | Cumulative % |
|---|---|---|
| 0 (root) | 1 | 0.01% |
| 1 | 12 | 0.13% |
| 2 | 240 | 2.5% |
| 3 | 3,800 | 41% |
| 4 | 5,200 | 95% |
| 5+ | 460 | 100% |
If the long tail at depth 5+ contains revenue pages, you have a problem. If depth 3 contains 80% of your money pages, you are healthy.
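You can compute this distribution straight from crawl output. Below is a minimal sketch, assuming you exported internal links as source/target URL pairs (the `Edge` shape is illustrative; any crawler's link export can be mapped to it):

```ts
// depth-distribution.ts — a minimal sketch, not a full crawler.
// Assumes `edges` holds internal { source, target } URL pairs from a crawl.
type Edge = { source: string; target: string };

function depthDistribution(root: string, edges: Edge[]): Map<number, number> {
  // Adjacency list of internal links.
  const graph = new Map<string, string[]>();
  for (const { source, target } of edges) {
    const out = graph.get(source) ?? [];
    out.push(target);
    graph.set(source, out);
  }

  // BFS from the homepage: click depth = shortest link path from the root.
  const depth = new Map<string, number>([[root, 0]]);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const next of graph.get(url) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(url)! + 1);
        queue.push(next);
      }
    }
  }

  // Bucket URL counts by depth. Orphans never enter the map, which is
  // itself a finding: diff against your sitemap to surface them.
  const buckets = new Map<number, number>();
  for (const d of depth.values()) buckets.set(d, (buckets.get(d) ?? 0) + 1);
  return buckets;
}

// Print the cumulative distribution, mirroring the table above.
function printDistribution(buckets: Map<number, number>): void {
  const total = [...buckets.values()].reduce((a, b) => a + b, 0);
  let running = 0;
  for (const [d, count] of [...buckets.entries()].sort((a, b) => a[0] - b[0])) {
    running += count;
    console.log(`depth ${d}: ${count} URLs (${((100 * running) / total).toFixed(2)}% cumulative)`);
  }
}
```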
Silo design comes in two flavors. Hard silos (rigid: cluster pages link only within their pillar) preserve topical purity but underuse cross-topic relevance. Soft silos (cluster-to-pillar linking by default, plus selective cross-links) match how Google’s actual topic graph works. Choose soft silos in 2026.
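As a concrete illustration of the soft-silo policy, a link audit could treat same-pillar links as allowed by default and flag cross-pillar links unless they are deliberately curated. A minimal sketch, with a hypothetical allowlist:

```ts
// soft-silo.ts — illustrative link-policy check, not a real API.
// Same-pillar links pass by default; cross-pillar links must be curated.
const curatedCrossLinks = new Set([
  // Hypothetical example of a deliberate cross-silo link.
  "/on-page-seo/schema-markup/ -> /technical-seo/canonicals/",
]);

// The first path segment acts as the silo key, e.g. "technical-seo".
function pillarOf(path: string): string {
  return path.split("/")[1] ?? "";
}

function linkAllowed(from: string, to: string): boolean {
  if (pillarOf(from) === pillarOf(to)) return true; // intra-silo: default
  return curatedCrossLinks.has(`${from} -> ${to}`); // cross-silo: curated only
}
```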
Visualizing it
```mermaid
flowchart TD
  H[Homepage] --> P1[Pillar: Technical SEO]
  H --> P2[Pillar: On-Page SEO]
  H --> P3[Pillar: Link Building]
  P1 --> C1[Crawling and Indexing]
  P1 --> C2[Canonicals]
  P1 --> C3[robots.txt]
  P2 --> C4[Title Tags]
  P2 --> C5[Schema Markup]
  C1 -. cross-link .-> C2
  C2 -. cross-link .-> C3
  C5 -. cross-link .-> C2
```
Bad vs. expert
The bad approach
The bad architecture flattens everything into the root and hopes for the best:
```text
example.com/
example.com/post-1
example.com/post-2
example.com/cool-thing-i-wrote
example.com/another-post
example.com/page-about-stuff
```
Or, worse, it nests everything six layers into a CMS-driven taxonomy:
```text
example.com/blog/category/subcategory/year/month/day/post-slug
```
Both fail Googlebot’s mental model. The flat dump has zero hierarchy signal — every URL claims to be of equal importance. The deep nest puts every content asset behind multiple cluster boundaries that pass diluted equity, and the date-based URL pattern guarantees /2024/ content looks stale next year regardless of whether you updated it.
```html
<!-- Bad: no breadcrumbs, no hierarchy signal -->
<nav>
  <a href="/">Home</a>
  <a href="/blog">Blog</a>
</nav>
<h1>How canonical tags work</h1>
```
The expert approach
Pillar/cluster URL pattern, breadcrumb component with proper schema:
```text
example.com/
example.com/technical-seo/                  (pillar)
example.com/technical-seo/canonicals/       (cluster)
example.com/technical-seo/robots-txt/       (cluster)
example.com/technical-seo/javascript-seo/   (cluster)
example.com/on-page-seo/                    (pillar)
example.com/on-page-seo/title-tags/         (cluster)
```
```astro
---
// src/components/Breadcrumb.astro
interface Crumb { name: string; url: string; }
const { trail } = Astro.props as { trail: Crumb[] };

// Build the schema.org ListItem array from the trail (positions are 1-based).
// Relative URLs resolve against Astro.site, so `site` must be set in
// astro.config for absolute `item` URLs.
const itemListElement = trail.map((c, i) => ({
  "@type": "ListItem",
  position: i + 1,
  name: c.name,
  item: new URL(c.url, Astro.site).toString(),
}));

const jsonLd = {
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  itemListElement,
};
---
<nav aria-label="Breadcrumb">
  <ol class="breadcrumb">
    {trail.map((c, i) => (
      <li>
        {i < trail.length - 1
          ? <a href={c.url}>{c.name}</a>
          : <span aria-current="page">{c.name}</span>}
      </li>
    ))}
  </ol>
</nav>
<script type="application/ld+json" set:html={JSON.stringify(jsonLd)} />
```
The resulting JSON-LD that ships with the rendered page:
```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://example.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Technical SEO",
      "item": "https://example.com/technical-seo/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Canonical Tags",
      "item": "https://example.com/technical-seo/canonicals/"
    }
  ]
}
```
This works because (a) the URL string itself is a hierarchy signal, (b) the breadcrumb HTML provides multiple internal links concentrated on the topical parent, and (c) the JSON-LD makes the relationship machine-readable for SERPs and AI engines alike.
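For reference, a cluster page might pass the trail like this (the page location and import path are illustrative):

```astro
---
// src/pages/technical-seo/canonicals/index.astro (hypothetical location)
import Breadcrumb from "../../../components/Breadcrumb.astro";
---
<Breadcrumb
  trail={[
    { name: "Home", url: "/" },
    { name: "Technical SEO", url: "/technical-seo/" },
    { name: "Canonical Tags", url: "/technical-seo/canonicals/" },
  ]}
/>
```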
Do this today
- Run a Screaming Frog SEO Spider crawl and open Site Structure > Crawl Depth. Identify any URL with `Sessions > 0` (after connecting GA4) that sits at depth 4 or deeper. These are revenue-earning pages buried too far from the root.
- In Ahrefs Site Audit, open the Internal Page Rank report. Sort descending. Confirm your money pages are in the top 5%. If not, your link graph is misaligned with your priorities.
- Map your top 5 topical pillars. Each pillar gets one URL like `/topic/`, ideally a long-form guide. Each cluster page lives at `/topic/subtopic/` and links back to the pillar in body text, not just the breadcrumb.
- Implement a Breadcrumb component that renders both visible breadcrumb HTML and `BreadcrumbList` JSON-LD. Test with Google’s Rich Results Test at search.google.com/test/rich-results. Confirm the Breadcrumbs result shows green.
- In Google Search Console > Enhancements > Breadcrumbs, verify the report exists and the item count grows over the next 14 days. If GSC reports `0 valid items`, your schema is malformed.
- For each pillar, audit internal links with the Screaming Frog Link Score column. Each cluster should have at least 3 inbound internal links from related cluster pages. Use the Custom Search filter to find clusters with fewer than 3 inbound links.
- Kill orphan pages. Enable Configuration > Spider > Crawl Behaviour > Crawl Linked XML Sitemaps in Screaming Frog. URLs in the sitemap with zero inbound internal links are orphans. Either link to them from the appropriate pillar or remove them from the sitemap.
- Add a lateral links module to cluster pages that surfaces the 4–6 most semantically related siblings. Implement similarity via TF-IDF or embeddings, not alphabetical order (see the sketch after this list).
- Re-crawl after changes. Confirm crawl depth distribution shifts: target 80% of high-value URLs at depth 2 or 3, with no revenue page deeper than 4.
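A minimal sketch of the lateral-links selection from the list above, assuming each page already has an embedding vector from whatever model you use (the `Page` shape is illustrative; TF-IDF vectors work with the same cosine function):

```ts
// related-links.ts — illustrative sibling selection by cosine similarity.
type Page = { url: string; title: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the k most similar siblings, excluding the current page itself.
function lateralLinks(current: Page, siblings: Page[], k = 5): Page[] {
  return siblings
    .filter((p) => p.url !== current.url)
    .map((p) => ({ p, score: cosine(current.embedding, p.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.p);
}
```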