Structured Data & Schema Markup

Schema.org markup is no longer optional. In 2026 it does two jobs: it earns rich results in Google SERPs, and it tells AI Overviews, Gemini, ChatGPT Search, and Perplexity which entities your page is about without forcing them to infer it from prose. Pages with clean structured data get cited 2–3x more often in AI grounding passes, per BrightEdge’s 2025 AI citation study. This module is a JSON-LD-only playbook: Google still parses Microdata and RDFa, but they are legacy formats and JSON-LD is the one it recommends.

TL;DR

JSON-LD is the only format you should write today. Microdata and RDFa are still parsed; Google’s documentation has recommended JSON-LD exclusively since 2017.
Required properties are not optional. Most rich-result types have Google-specific required properties; missing one makes the page ineligible for that rich result. Recommended properties only enrich a result that is already eligible.
Schema is entity glue for AI. @id, sameAs, and consistent Organization markup tell LLMs and Knowledge Graph what your business is, separate from what your content says.

The mental model

Schema markup is like the metadata sticker on a museum exhibit. The exhibit (your page content) is the thing visitors see; the sticker tells the curator (Google, AI crawlers) the artist, the year, the medium, the provenance — facts that help them file the work correctly and put it next to relevant exhibits. Without the sticker, the curator has to guess from the painting itself, which they can do, but slowly and with mistakes.

The 2026 version of this analogy: AI Overviews and AI Mode are increasingly the “curator” assembling answers from many exhibits. They prefer pages where the metadata is unambiguous because they don’t have time to read every exhibit in full. A page with Article schema, an author linked to a Person, and mainEntityOfPage pointing at the canonical URL is one the curator can quote with confidence. A page with no schema is still readable, but the curator has to reverse-engineer who wrote it and what it claims — and may quote a clearer competitor instead.

The point is not the rich result. It is entity disambiguation. Schema is how you tell the web “this page is about that exact thing, not the thing with the same name.”

Deep dive: the 2026 reality

The schema types that earn Google rich results in 2026, ranked by frequency and impact:

Type	Earns rich result	High-value verticals	Notes
Organization	Knowledge panel	All sites	Required for brand entity in Knowledge Graph
Person	Knowledge panel	Authors, executives	Pair with `sameAs` to LinkedIn, Wikidata
Article / NewsArticle	Top Stories, AI Overview citation	News, blog	No required properties; `headline`, `datePublished`, `author` strongly recommended
FAQPage	Reduced visibility since Aug 2023	Most sites	Now limited to authoritative health/government sites
HowTo	Removed from rich results Sept 2023	Tutorials	Still useful for AI grounding, no SERP visual
Product	Product snippets, Merchant listings	E-commerce	`Offer`, `AggregateRating`, `Review`
Review / AggregateRating	Review stars	Reviews, products	Must come from first-party reviewers
LocalBusiness	Local pack, knowledge panel	Local services	`name`, `address`, `geo`, `openingHoursSpecification`
BreadcrumbList	SERP breadcrumb display	Hierarchical sites	Use real path, not category-only
VideoObject	Video snippet, key moments	Video sites	`name`, `thumbnailUrl`, `uploadDate` required; `contentUrl` recommended
Event	Event listings	Tickets, venues	`startDate`, `location`, `offers`
Recipe	Recipe carousel	Food sites	`recipeIngredient`, `recipeInstructions`
JobPosting	Google for Jobs	Career sites	Strict freshness rules; remove expired postings
Course	Course listings (limited)	Education	Often paired with `LearningResource`
SoftwareApplication	App snippet	SaaS, apps	`applicationCategory`, `aggregateRating`
Speakable	Voice search summary	News	`cssSelector` to read-aloud blocks

FAQ and HowTo were demoted in 2023. Google’s August 8, 2023 update restricted FAQ rich results to “well-known authoritative” government and health sites only, so most marketing sites lost the visual treatment overnight. HowTo rich results were removed entirely in September 2023. Both schemas are still worth implementing because AI Overviews and AI Mode use them for grounding, but if you were maintaining FAQPage purely for the expandable SERP listing, that ship has sailed.

The @id pattern is what makes schema interoperable across pages. A page-level Article with author should reference the author’s Person node by @id, not redeclare every author property. Done right, your site’s schema becomes a graph: Organization → Articles → Authors → Reviewers, each entity declared once and referenced everywhere.

{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", "@id": "https://example.com/#org", "name": "Example" },
    { "@type": "Person", "@id": "https://example.com/team/jane#person", "name": "Jane Smith" },
    { "@type": "Article",
      "headline": "...",
      "author": { "@id": "https://example.com/team/jane#person" },
      "publisher": { "@id": "https://example.com/#org" }
    }
  ]
}
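
A graph built this way only works if every reference resolves. The following is a minimal Python sketch, not part of any official tooling, that lists @id values which are referenced but never declared in the same @graph. The function name dangling_refs and the example.com graph are hypothetical, and the sketch assumes a reference is always written as an object whose only key is @id.

```python
import json

# Hypothetical graph: the WebPage points at a WebSite node that is never declared.
EXAMPLE = json.loads("""
{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", "@id": "https://example.com/#org", "name": "Example" },
    { "@type": "WebPage",
      "@id": "https://example.com/post/#webpage",
      "isPartOf": { "@id": "https://example.com/#website" },
      "publisher": { "@id": "https://example.com/#org" }
    }
  ]
}
""")


def dangling_refs(doc: dict) -> set:
    """Return @id values that are referenced but not declared in the graph."""
    nodes = doc.get("@graph", [doc])
    declared = {node["@id"] for node in nodes if "@id" in node}
    referenced = set()

    def walk(value, is_top_level=False):
        if isinstance(value, dict):
            # Assumption: a nested object containing only "@id" is a reference.
            if set(value) == {"@id"} and not is_top_level:
                referenced.add(value["@id"])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    for node in nodes:
        walk(node, is_top_level=True)
    return referenced - declared


if __name__ == "__main__":
    print(dangling_refs(EXAMPLE))
```

References that point at nodes declared on other pages (a site-wide Organization, for instance) will also be reported, so in practice the declared set would need to include the @id values your templates emit everywhere.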

Speakable markup (the speakable property, whose value is a SpeakableSpecification) is what Google uses for voice search and now Gemini Live audio responses. It marks specific sections of a page as suitable for read-aloud; Google’s guidance is roughly 20-30 seconds of audio per section, or about two to three sentences. It is currently English-only and limited to news publishers, but the markup is harmless to add elsewhere.

Knowledge Graph entity building is where structured data overlaps with brand SEO. Your Organization schema, replicated identically across every page and paired with sameAs links to high-trust references (Wikidata, LinkedIn, Crunchbase, Bloomberg), helps Google construct the Knowledge Graph entity behind your brand. AI Overviews increasingly reason about the entity, not just the page; if Google has no entity for your brand, you are much harder to cite.
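
One way to keep the Organization markup identical on every page is to define it once and render it through a single helper. The sketch below is an illustration under stated assumptions: the ORGANIZATION values are placeholders (including the Wikidata ID), and render_jsonld is a hypothetical helper name, not a library function.

```python
import json

# Placeholder values for illustration; replace with your real entity data.
ORGANIZATION = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.com/#org",
    "name": "Example",
    "url": "https://example.com/",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder ID
        "https://www.linkedin.com/company/example",
    ],
}


def render_jsonld(node: dict) -> str:
    """Serialize a node into a JSON-LD script tag with stable output."""
    # sort_keys keeps the output byte-identical regardless of dict ordering.
    payload = json.dumps(node, sort_keys=True, separators=(",", ":"))
    # Escape "<" so no string value can close the script element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    print(render_jsonld(ORGANIZATION))
```

Calling the same helper from every template is what makes the markup identical across pages; the escaping step matters once any value comes from user-editable content.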

Visualizing it

flowchart TD
  A[Your HTML page] --> B[JSON-LD in head or body]
  B --> C[Google parser extracts entities]
  C --> D{Required properties present?}
  D -->|No| E[Rich result ineligible, schema ignored]
  D -->|Yes| F[Eligible for rich result]
  F --> G[Knowledge Graph link via sameAs and id]
  G --> H[AI Overviews and AI Mode use entity to ground answer]
  C --> I[Bing, ChatGPT Search via Bing index]
  I --> J[ChatGPT cites page]
  C --> K[Perplexity own crawler + Brave]
  K --> L[Perplexity cites page]

Bad vs. expert

The bad approach

<!-- Stale Microdata with missing required fields, mixed with old hreviews -->
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Acme Widget</h1>
  <span itemprop="description">A great widget.</span>
  <div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
    <span itemprop="ratingValue">4.9</span>
  </div>
</div>

<!-- Separate FAQ Microdata block on a sales page that has no real FAQ -->
<div itemscope itemtype="http://schema.org/FAQPage">
  <div itemscope itemprop="mainEntity" itemtype="http://schema.org/Question">
    <h3 itemprop="name">Why choose Acme?</h3>
    <div itemscope itemprop="acceptedAnswer" itemtype="http://schema.org/Answer">
      <span itemprop="text">Because we are the best!</span>
    </div>
  </div>
</div>

The Product is missing image, offers, and a review count: Google ignores AggregateRating when neither reviewCount nor ratingCount is present. The description is three words and adds no context. The HTTP schema.org URL still works but signals stale code. Worst of all, the FAQPage is fabricated: it wraps promotional copy on a page that has no real FAQ. Google’s guidelines require FAQ content to be genuine, non-promotional, and visible to users, and violations can trigger a manual action (“Spammy structured markup”). A manual action removes rich-result eligibility until the markup is fixed and a reconsideration request is approved.
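
The visibility rule for FAQ markup can be spot-checked mechanically. The following Python sketch uses only the standard library and flags Question names from a FAQPage node that do not occur in the page’s text outside script, style, and template elements. It is a heuristic under stated assumptions: the names VisibleText and invisible_questions are hypothetical, text hidden with CSS is not detected, and a question split across inline tags may be reported as a false positive.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside script, style, and template elements."""

    HIDDEN = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so comparisons are forgiving."""
    return " ".join(text.split()).casefold()


def invisible_questions(html: str, faq_node: dict) -> list:
    """Return Question names that never appear in the page's visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    page_text = normalize(" ".join(parser.chunks))
    questions = faq_node.get("mainEntity", [])
    if isinstance(questions, dict):  # a single Question instead of a list
        questions = [questions]
    return [
        q["name"]
        for q in questions
        if normalize(q.get("name", "")) not in page_text
    ]
```

An empty list means every marked-up question was found in the rendered text; anything returned is a candidate for removal or for adding a real, visible FAQ section.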

The expert approach

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://acme.com/#organization",
      "name": "Acme Widgets",
      "url": "https://acme.com/",
      "logo": {
        "@type": "ImageObject",
        "url": "https://acme.com/logo.png",
        "width": 600,
        "height": 60
      },
      "sameAs": [
        "https://www.wikidata.org/wiki/Q12345678",
        "https://www.linkedin.com/company/acme-widgets",
        "https://en.wikipedia.org/wiki/Acme_Widgets"
      ]
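    },
    {
      "@type": "WebSite",
      "@id": "https://acme.com/#website",
      "url": "https://acme.com/",
      "name": "Acme Widgets",
      "publisher": { "@id": "https://acme.com/#organization" }
    },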
    {
      "@type": "WebPage",
      "@id": "https://acme.com/widgets/x100/#webpage",
      "url": "https://acme.com/widgets/x100/",
      "isPartOf": { "@id": "https://acme.com/#website" },
      "breadcrumb": { "@id": "https://acme.com/widgets/x100/#breadcrumbs" }
    },
    {
      "@type": "BreadcrumbList",
      "@id": "https://acme.com/widgets/x100/#breadcrumbs",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://acme.com/" },
        { "@type": "ListItem", "position": 2, "name": "Widgets", "item": "https://acme.com/widgets/" },
        { "@type": "ListItem", "position": 3, "name": "X100" }
      ]
    },
    {
      "@type": "Product",
      "@id": "https://acme.com/widgets/x100/#product",
      "name": "Acme X100 Widget",
      "description": "Hand-machined brass widget rated for 50,000 cycles. Built in Akron, Ohio since 1962.",
      "sku": "ACM-X100",
      "gtin13": "0123456789012",
      "brand": { "@id": "https://acme.com/#organization" },
      "image": [
        "https://acme.com/widgets/x100/hero.avif",
        "https://acme.com/widgets/x100/detail.avif"
      ],
      "offers": {
        "@type": "Offer",
        "url": "https://acme.com/widgets/x100/",
        "priceCurrency": "USD",
        "price": "129.00",
        "priceValidUntil": "2026-12-31",
        "availability": "https://schema.org/InStock",
        "itemCondition": "https://schema.org/NewCondition",
        "seller": { "@id": "https://acme.com/#organization" }
      },
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.8",
        "reviewCount": "247",
        "bestRating": "5",
        "worstRating": "1"
      }
    }
  ]
}
</script>

Every entity has an @id so the graph is internally consistent. Organization is declared once and referenced as brand and seller. BreadcrumbList matches the visible breadcrumb. Product carries the properties Google requires for merchant listings (name, image, and offers with price and priceCurrency) plus the recommended description, sku, and aggregateRating. gtin13 is the global product identifier that lets Google match your listing with retailer feeds. priceValidUntil avoids the missing-field warning in Search Console. The whole block validates against both Schema.org and the Rich Results Test.
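
A property checklist like this is easy to encode. The sketch below is a rough Python illustration, assuming the merchant-listing baseline is name, image, and offers with price and priceCurrency, and that aggregateRating needs ratingValue plus reviewCount or ratingCount; verify both assumptions against Google’s current documentation before relying on them. The function name product_problems is hypothetical.

```python
def product_problems(product: dict) -> list:
    """Return human-readable problems for a Product node (empty list = clean)."""
    problems = []

    # Assumed baseline for merchant listings; check Google's current docs.
    for prop in ("name", "image", "offers"):
        if not product.get(prop):
            problems.append(f"missing {prop}")

    offers = product.get("offers") or {}
    if isinstance(offers, list):  # several offers: inspect the first one only
        offers = offers[0] if offers else {}
    if offers:
        for prop in ("price", "priceCurrency"):
            if prop not in offers:
                problems.append(f"offers missing {prop}")

    rating = product.get("aggregateRating")
    if rating is not None:
        if "ratingValue" not in rating:
            problems.append("aggregateRating missing ratingValue")
        if not {"reviewCount", "ratingCount"} & set(rating):
            problems.append("aggregateRating missing reviewCount or ratingCount")

    return problems
```

Run against the data in the bad example (name, description, and a ratingValue only), this reports the missing image, the missing offers, and the missing review count; run against the expert example’s Product node it returns an empty list.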

Do this today

Open the Rich Results Test at search.google.com/test/rich-results. Paste your homepage URL and any product, article, or local-business URL. Note every rich-result type listed and any error/warning.
Open the Schema.org Validator at validator.schema.org. This catches Schema.org-level errors that Google’s tester ignores (e.g., orphaned @id, malformed @graph).
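In Google Search Console → Enhancements (and the Shopping section for product reports), review every report that appears for your property (Product snippets, Merchant listings, Breadcrumbs, FAQ, Videos, Events, Job postings, etc.). Reports exist only for types Google has detected, and the Sitelinks Searchbox and HowTo reports were retired along with those features. Each “Invalid items” row is a remediation ticket.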
Build your site’s Organization JSON-LD once in a partial template and inject it on every page. Include name, url, logo, sameAs (Wikidata, LinkedIn, Crunchbase), and a stable @id.
Add BreadcrumbList to every non-homepage URL using the actual visible breadcrumb path. If the last ListItem carries an item URL, it must match the page’s self-referencing canonical.
For e-commerce: ensure every product page has Product with name, image, description, sku, gtin13 (or mpn), brand, offers, and, if you have first-party reviews, aggregateRating. Set priceValidUntil at least as far out as the end of the current year.
For content: add Article or NewsArticle with headline (≤110 chars), datePublished, dateModified, author referencing a Person @id, and publisher referencing the Organization @id.
Audit existing FAQPage and HowTo schema. Remove fabricated FAQ blocks immediately. Keep real FAQ markup even though SERP visibility is gone — AI Overviews and Gemini still consume it.
Validate Person entries with sameAs to LinkedIn and (where applicable) Wikidata. The Person @id should be a URL that 200s — the author bio page works perfectly.
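Set up a schema regression test in CI. Google does not publish an API for the Rich Results Test, so run the checks yourself: render your top 20 templates on every deploy (a Playwright script works), extract the JSON-LD, assert the types and properties each template must emit, and fail the build on new errors. For URLs that are already indexed, the Search Console URL Inspection API reports detected rich-result items and their issues.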
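
A regression test of this kind does not need an external service. The following is a minimal Python sketch using only the standard library: it extracts every JSON-LD block from rendered HTML and reports which expected types a template no longer emits. The names JsonLdExtractor, schema_types, and missing_types are hypothetical, and how you obtain the rendered HTML (static build output, a headless browser) is left as an assumption.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            # Malformed JSON raises here, which should also fail the build.
            self.blocks.append(json.loads("".join(self.buffer)))
            self.in_jsonld = False


def schema_types(html: str) -> set:
    """Return every @type found at the top level or inside @graph."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = set()
    for block in parser.blocks:
        nodes = block if isinstance(block, list) else block.get("@graph", [block])
        for node in nodes:
            types = node.get("@type", [])
            found.update([types] if isinstance(types, str) else types)
    return found


def missing_types(html: str, expected: set) -> set:
    """Types the template is expected to emit but does not."""
    return expected - schema_types(html)
```

In a test runner, each template would get one assertion such as `assert not missing_types(html, {"Product", "BreadcrumbList"})`, so a deploy that silently drops a schema block fails before it ships.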