Site Architecture
Flat vs deep structures, the three-clicks rule, categorical hierarchy, BreadcrumbList schema, and silo design that actually affects rankings.
Site architecture is the topology of your domain — which URLs link to which, how deep pages live from the root, and what categorical narrative the link graph tells. Get it right and PageRank flows naturally to your money pages. Get it wrong and your best content sits four clicks deep behind an orphaned tag taxonomy.
TL;DR
- Flat beats deep, but only when the flatness is meaningful. A 50,000-page site dumped at the root with no hierarchy is harder to rank than the same content organized into pillar/cluster silos.
- The “three clicks” rule is folk wisdom; click depth distribution is the actual signal. Googlebot weighs internal link quality and PageRank flow far more than literal click count. What matters is that high-value pages receive concentrated internal authority.
- `BreadcrumbList` JSON-LD is non-negotiable in 2026. It earns the breadcrumb display in SERPs, helps AI Overviews understand entity hierarchy, and gives PerplexityBot a cleaner citation context.
The mental model
Site architecture is like a city’s wayfinding system. A tourist (Googlebot) lands at a transit hub (homepage), follows signs (internal links) through districts (categories) to a specific address (your page). The clearer the signage and the more deliberate the district boundaries, the faster the tourist gets to the destination — and the more confidently they can describe what they saw.
Two failure modes dominate. First: the flat dump — every page links to a sitemap and nothing else. The tourist sees one giant intersection with 50,000 unlabeled exits. Second: the endless basement — a category tree six levels deep with most leaves orphaned. The tourist gives up on floor three.
The expert pattern is the silo: a clear pillar page at the top of each topic, cluster pages directly under it, all internally linked into a tight thematic web. Authority concentrates rather than diffuses, and Google’s MUM-class topic models can read the entity relationships without guessing.
Deep dive: the 2026 reality
Two things changed how architecture actually affects rankings in 2026:
- Google’s link graph weighting now leans heavily on topical proximity. A link from a thematically related page passes more equity than a link from an unrelated one. This was always partly true; it became measurable post-March-2024 core update. Internal links from your `/seo/technical/` pillar to a `/seo/technical/canonicals` cluster page outweigh ten links from your homepage footer.
- AI answer engines parse breadcrumbs explicitly. When Perplexity cites a page, it often shows the breadcrumb trail beneath the snippet. Google AI Overviews uses breadcrumb schema to disambiguate entity meaning. Sites without proper breadcrumb markup get fewer citations.
Click depth distribution is the metric that actually correlates with indexation and ranking. Run a crawl, group URLs by depth from the homepage, and look at the cumulative distribution. A healthy mid-size site might look like:
| Depth | URLs | Cumulative % |
|---|---|---|
| 0 (root) | 1 | 0.01% |
| 1 | 12 | 0.13% |
| 2 | 240 | 2.5% |
| 3 | 3,800 | 41% |
| 4 | 5,200 | 95% |
| 5+ | 460 | 100% |
If the long tail at depth 5+ contains revenue pages, you have a problem. If depth 3 contains 80% of your money pages, you are healthy.
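You can compute this distribution straight from crawl output. Below is a minimal sketch, assuming you exported internal links as source/target URL pairs (the `Edge` shape is illustrative; any crawler's link export can be mapped to it):

```ts
// depth-distribution.ts — a minimal sketch, not a full crawler.
// Assumes `edges` holds internal { source, target } URL pairs from a crawl.
type Edge = { source: string; target: string };

function depthDistribution(root: string, edges: Edge[]): Map<number, number> {
  // Adjacency list of internal links.
  const graph = new Map<string, string[]>();
  for (const { source, target } of edges) {
    const out = graph.get(source) ?? [];
    out.push(target);
    graph.set(source, out);
  }

  // BFS from the homepage: click depth = shortest link path from the root.
  const depth = new Map<string, number>([[root, 0]]);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const next of graph.get(url) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(url)! + 1);
        queue.push(next);
      }
    }
  }

  // Bucket URL counts by depth. Orphans never enter the map, which is
  // itself a finding: diff against your sitemap to surface them.
  const buckets = new Map<number, number>();
  for (const d of depth.values()) buckets.set(d, (buckets.get(d) ?? 0) + 1);
  return buckets;
}

// Print the cumulative distribution, mirroring the table above.
function printDistribution(buckets: Map<number, number>): void {
  const total = [...buckets.values()].reduce((a, b) => a + b, 0);
  let running = 0;
  for (const [d, count] of [...buckets.entries()].sort((a, b) => a[0] - b[0])) {
    running += count;
    console.log(`depth ${d}: ${count} URLs (${((100 * running) / total).toFixed(2)}% cumulative)`);
  }
}
```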
Silo design comes in two flavors. Hard silos (rigid: cluster pages link only within their pillar) preserve topical purity but underuse cross-topic relevance. Soft silos (cluster-to-pillar linking by default, plus selective cross-links) match how Google’s actual topic graph works. Choose soft silos in 2026.
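As a concrete illustration of the soft-silo policy, a link audit could treat same-pillar links as allowed by default and flag cross-pillar links unless they are deliberately curated. A minimal sketch, with a hypothetical allowlist:

```ts
// soft-silo.ts — illustrative link-policy check, not a real API.
// Same-pillar links pass by default; cross-pillar links must be curated.
const curatedCrossLinks = new Set([
  // Hypothetical example of a deliberate cross-silo link.
  "/on-page-seo/schema-markup/ -> /technical-seo/canonicals/",
]);

// The first path segment acts as the silo key, e.g. "technical-seo".
function pillarOf(path: string): string {
  return path.split("/")[1] ?? "";
}

function linkAllowed(from: string, to: string): boolean {
  if (pillarOf(from) === pillarOf(to)) return true; // intra-silo: default
  return curatedCrossLinks.has(`${from} -> ${to}`); // cross-silo: curated only
}
```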
Visualizing it
```mermaid
flowchart TD
  H[Homepage] --> P1[Pillar: Technical SEO]
  H --> P2[Pillar: On-Page SEO]
  H --> P3[Pillar: Link Building]
  P1 --> C1[Crawling and Indexing]
  P1 --> C2[Canonicals]
  P1 --> C3[robots.txt]
  P2 --> C4[Title Tags]
  P2 --> C5[Schema Markup]
  C1 -. cross-link .-> C2
  C2 -. cross-link .-> C3
  C5 -. cross-link .-> C2
```
Bad vs. expert
The bad approach
The bad architecture flattens everything into the root and hopes for the best:
```text
example.com/
example.com/post-1
example.com/post-2
example.com/cool-thing-i-wrote
example.com/another-post
example.com/page-about-stuff
```
Or, worse, it nests everything six layers into a CMS-driven taxonomy:
```text
example.com/blog/category/subcategory/year/month/day/post-slug
```
Both fail Googlebot’s mental model. The flat dump has zero hierarchy signal — every URL claims to be of equal importance. The deep nest puts every content asset behind multiple cluster boundaries that pass diluted equity, and the date-based URL pattern guarantees /2024/ content looks stale next year regardless of whether you updated it.
```html
<!-- Bad: no breadcrumbs, no hierarchy signal -->
<nav>
  <a href="/">Home</a>
  <a href="/blog">Blog</a>
</nav>
<h1>How canonical tags work</h1>
```
The expert approach
Pillar/cluster URL pattern, breadcrumb component with proper schema:
```text
example.com/
example.com/technical-seo/                  (pillar)
example.com/technical-seo/canonicals/       (cluster)
example.com/technical-seo/robots-txt/       (cluster)
example.com/technical-seo/javascript-seo/   (cluster)
example.com/on-page-seo/                    (pillar)
example.com/on-page-seo/title-tags/         (cluster)
```
```astro
---
// src/components/Breadcrumb.astro
interface Crumb { name: string; url: string; }
const { trail } = Astro.props as { trail: Crumb[] };

// Build the schema.org ListItem array from the trail (positions are 1-based).
// Relative URLs resolve against Astro.site, so `site` must be set in
// astro.config for absolute `item` URLs.
const itemListElement = trail.map((c, i) => ({
  "@type": "ListItem",
  position: i + 1,
  name: c.name,
  item: new URL(c.url, Astro.site).toString(),
}));

const jsonLd = {
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  itemListElement,
};
---
<nav aria-label="Breadcrumb">
  <ol class="breadcrumb">
    {trail.map((c, i) => (
      <li>
        {i < trail.length - 1
          ? <a href={c.url}>{c.name}</a>
          : <span aria-current="page">{c.name}</span>}
      </li>
    ))}
  </ol>
</nav>
<script type="application/ld+json" set:html={JSON.stringify(jsonLd)} />
```
The resulting JSON-LD that ships with the rendered page:
```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://example.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Technical SEO",
      "item": "https://example.com/technical-seo/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Canonical Tags",
      "item": "https://example.com/technical-seo/canonicals/"
    }
  ]
}
```
This works because (a) the URL string itself is a hierarchy signal, (b) the breadcrumb HTML provides multiple internal links concentrated on the topical parent, and (c) the JSON-LD makes the relationship machine-readable for SERPs and AI engines alike.
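For reference, a cluster page might pass the trail like this (the page location and import path are illustrative):

```astro
---
// src/pages/technical-seo/canonicals/index.astro (hypothetical location)
import Breadcrumb from "../../../components/Breadcrumb.astro";
---
<Breadcrumb
  trail={[
    { name: "Home", url: "/" },
    { name: "Technical SEO", url: "/technical-seo/" },
    { name: "Canonical Tags", url: "/technical-seo/canonicals/" },
  ]}
/>
```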
Do this today
- Run a Screaming Frog SEO Spider crawl and open Site Structure > Crawl Depth. Identify any URL with `Sessions > 0` (after connecting GA4) that sits at depth 4 or deeper. These are revenue-earning pages buried too far from the root.
- In Ahrefs Site Audit, open the Internal Page Rank report. Sort descending. Confirm your money pages are in the top 5%. If not, your link graph is misaligned with your priorities.
- Map your top 5 topical pillars. Each pillar gets one URL like `/topic/`, ideally a long-form guide. Each cluster page lives at `/topic/subtopic/` and links back to the pillar in body text, not just the breadcrumb.
- Implement a Breadcrumb component that renders both visible breadcrumb HTML and `BreadcrumbList` JSON-LD. Test with Google’s Rich Results Test at search.google.com/test/rich-results. Confirm the Breadcrumbs result shows green.
- In Google Search Console > Enhancements > Breadcrumbs, verify the report exists and the item count grows over the next 14 days. If GSC reports `0 valid items`, your schema is malformed.
- For each pillar, audit internal links with the Screaming Frog Link Score column. Each cluster should have at least 3 inbound internal links from related cluster pages. Use the Custom Search filter to find clusters with fewer than 3 inbound links.
- Kill orphan pages. Enable Configuration > Spider > Crawl Behaviour > Crawl Linked XML Sitemaps in Screaming Frog. URLs in the sitemap with zero inbound internal links are orphans. Either link to them from the appropriate pillar or remove them from the sitemap.
- Add a lateral links module to cluster pages that surfaces the 4–6 most semantically related siblings. Implement similarity via TF-IDF or embeddings, not alphabetical order (see the sketch after this list).
- Re-crawl after changes. Confirm crawl depth distribution shifts: target 80% of high-value URLs at depth 2 or 3, with no revenue page deeper than 4.
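A minimal sketch of the lateral-links selection from the list above, assuming each page already has an embedding vector from whatever model you use (the `Page` shape is illustrative; TF-IDF vectors work with the same cosine function):

```ts
// related-links.ts — illustrative sibling selection by cosine similarity.
type Page = { url: string; title: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the k most similar siblings, excluding the current page itself.
function lateralLinks(current: Page, siblings: Page[], k = 5): Page[] {
  return siblings
    .filter((p) => p.url !== current.url)
    .map((p) => ({ p, score: cosine(current.embedding, p.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.p);
}
```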