Measuring AI Visibility
Why GSC can't see AI traffic. Tools (Profound, Athena HQ, Otterly, Peec AI, Ahrefs Brand Radar, Surfer, Semrush) and the manual audit baseline that beats them all when you start.
You can’t optimize what you can’t measure, and Google Search Console cannot see AI traffic. Referrers from ChatGPT, Claude, and Gemini are mostly suppressed or anonymized, and Google’s own AI surfaces report the same google.com referrer as organic search. Building a measurement stack — both tooled and manual — is the difference between AI visibility being a real KPI and an aspirational vibe. This module covers the tools, the methods, and the baseline you should run before paying anyone.
TL;DR
- GSC, GA4, and Ahrefs (organic-only) are blind to AI citation activity. The major AI engines suppress or anonymize referrers; even when traffic does land, you cannot attribute it to a specific AI engine reliably without a dedicated visibility tool.
- Tooled options exist: Profound, AthenaHQ, Otterly, Peec AI, Ahrefs Brand Radar, Surfer AI Tracker, Semrush AI Visibility Toolkit. Each has a different angle. Pick by the metric you most need (citation share, mention volume, sentiment, query coverage).
- A manual weekly audit of 25 priority queries beats any tool when you’re starting. Spreadsheet, four engines, one hour, weekly cadence. Establish the baseline before you spend on tooling.
The mental model
Measuring AI visibility is like measuring radio play in 1985. Spotify didn’t exist; you couldn’t pull a dashboard. To know how often your song was being played, you either (a) called individual radio stations and asked, (b) hired a service like Mediabase that monitored airplay, or (c) listened to the radio yourself and logged what you heard.
The 2026 AI visibility landscape is the same. You either (a) ask the engines yourself manually, (b) pay a tool that monitors them programmatically, or (c) infer from indirect signals (branded search lift, direct traffic, engagement). The manual approach is unfashionable but it’s the most accurate baseline, especially when starting out.
The tools are not yet at the maturity of GSC. They’re at the maturity of Mediabase circa 1995 — useful, directionally correct, sometimes inconsistent across vendors. Trust them for trends, not for absolute precision.
Deep dive: the 2026 reality
Why GSC can’t see it:
| Engine | Referrer behavior | Trackable in GA4? |
|---|---|---|
| ChatGPT | chat.openai.com referrer ~20% of the time; mostly suppressed | Partial |
| Perplexity | perplexity.ai referrer most of the time | Yes (filterable in GA4) |
| Google AIO | Same google.com referrer as organic | No (folded into organic) |
| Google AI Mode | Some google.com/ai referrers | Partial; new in late 2025 |
| Claude | Direct/no referrer | No |
| Gemini | gemini.google.com sometimes | Partial |
| Bing Copilot | bing.com referrer | Yes |
Even when a referrer makes it through, the more important question — “was I cited in the AI answer?” — has nothing to do with whether the user clicked. Citation share is the upstream metric; click-through is downstream and lossy.
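For the minority of AI visits that do pass a referrer, you can at least classify them by hostname. A minimal sketch (the function name and hostname map are assumptions; the hostnames mirror the table above, and suppressed referrers never reach this code, so treat any counts as a floor):

```javascript
// Hypothetical helper: map a referrer URL to an AI engine, or null.
// Suppressed referrers (Claude, most ChatGPT sessions) arrive empty,
// so this deliberately undercounts — a directional signal only.
const AI_ENGINES = {
  "perplexity.ai": "Perplexity",
  "chat.openai.com": "ChatGPT",
  "chatgpt.com": "ChatGPT",
  "gemini.google.com": "Gemini",
  "bing.com": "Bing Copilot",
};

function classifyReferrer(referrerUrl) {
  try {
    const host = new URL(referrerUrl).hostname.replace(/^www\./, "");
    return AI_ENGINES[host] || null; // null = not a known AI surface
  } catch {
    return null; // empty or malformed referrer — the common case for AI traffic
  }
}
```

Running this over server logs or analytics exports gives a rough per-engine split of the click-through traffic, which is a different (and smaller) number than citation share.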
Tool comparison, May 2026:
| Tool | Strength | Pricing tier | Best for |
|---|---|---|---|
| Profound | Multi-engine citation share, prompt sets at scale | $$$$ | Enterprise teams |
| AthenaHQ | Per-engine citation tracking, sentiment | $$$ | Mid-market with deep needs |
| Otterly.ai | Mention monitoring, alerts, lightweight UX | $$ | Mid-market starter |
| Peec AI | Visibility scoring, competitor benchmarking | $$ | Small/mid teams |
| Ahrefs Brand Radar | LLM mention tracking inside Ahrefs ecosystem | Included if Ahrefs | Existing Ahrefs subs |
| Surfer AI Tracker | Citation monitoring at content level | $$ | Content teams |
| Semrush AI Visibility Toolkit | Cross-engine share of voice | Included with Semrush | Existing Semrush subs |
| Manual audit | Free, accurate, low-throughput | Free | Everyone, especially starting out |
Metrics that matter:
- Citation share — % of priority queries where you’re cited inside the AI answer. The headline metric.
- Mention rate — % of queries where your brand is mentioned (with or without citation). Brand-awareness adjacent.
- Position-in-citation — when cited, are you the 1st, 3rd, or 7th source? Earlier positions get more click-through.
- Sentiment — when mentioned, is the model describing you positively or negatively? Especially relevant for reputation queries.
- Engine breakdown — your share by engine. Performance varies dramatically across surfaces.
- Query coverage — % of strategic queries you appear on at all.
- Competitor share-of-voice — your share vs. named competitors across the same prompt set.
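Competitor share-of-voice is simple arithmetic over the same prompt set. A hedged sketch, assuming each audit row records which brands were cited in the answer (the row shape and function name are illustrative, not from any tool’s API):

```javascript
// Sketch: % of prompts in which each tracked brand was cited.
// `rows` is one row per (query, engine) run; `citedBrands` is assumed
// to list every brand cited in that AI answer.
function shareOfVoice(rows, brands) {
  const counts = Object.fromEntries(brands.map((b) => [b, 0]));
  for (const { citedBrands } of rows) {
    for (const b of citedBrands) {
      if (b in counts) counts[b] += 1;
    }
  }
  const share = {};
  for (const b of brands) {
    share[b] = Math.round((100 * counts[b]) / rows.length);
  }
  return share; // e.g. { "YourBrand": 33, "CompetitorA": 67 }
}
```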
The manual audit baseline. Before you buy any tool, run this for 4 weeks:
- List your top 25 priority queries (mix of head, mid, and long-tail).
- Each Monday, run all 25 in ChatGPT (logged out, fresh session), Perplexity, Google AIO, and Claude. Use a fresh browser profile.
- For each query, log: (a) was your brand cited yes/no, (b) at what position, (c) what was said, (d) which competitors were cited.
- Spreadsheet rows are queries; columns are engine + week. Conditional formatting in Google Sheets to spot trends.
This produces 100 data points per week (25 queries × 4 engines), takes ~75 minutes, and gives a more accurate citation-share baseline than any sub-$500/month tool.
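The headline citation-share number falls directly out of that log. A minimal sketch (the row shape mirrors the spreadsheet columns described above; names are assumptions):

```javascript
// Sketch: citation share per engine from manual-audit rows.
// Each row is one (query, engine) run with a boolean `cited`.
function citationShareByEngine(rows) {
  const tally = {};
  for (const { engine, cited } of rows) {
    tally[engine] ??= { cited: 0, total: 0 };
    tally[engine].total += 1;
    if (cited) tally[engine].cited += 1;
  }
  const shares = {};
  for (const [engine, t] of Object.entries(tally)) {
    shares[engine] = Math.round((100 * t.cited) / t.total);
  }
  return shares; // e.g. { ChatGPT: 12, Perplexity: 36 }
}
```

Exported as CSV from the sheet, four weeks of rows feed straight into this and give the per-engine baseline the rest of the program is judged against.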
Sentiment monitoring. Critical for reputation queries (“is X trustworthy?”, “is X better than Y?”). LLMs sometimes generate adversarial framings of brands, occasionally hallucinated. Tools: AthenaHQ’s sentiment dimension, Otterly’s mention-context capture, or manual spot-checks. Negative sentiment on AI surfaces is the new “negative review you can’t reply to” — you have to address it through content, earned media, and (for severe hallucinations) direct vendor reports.
Visualizing it
flowchart TD
Start[New AI visibility program] --> Manual[Manual audit: 25 queries x 4 engines weekly]
Manual --> Baseline[4-week baseline established]
Baseline --> Decision{Citation share <10%?}
Decision -->|Yes| Improve[Apply GEO principles]
Decision -->|No| Defend[Track for regression]
Improve --> Tool{Worth tooling investment?}
Defend --> Tool
Tool -->|Under $1k/mo budget| Otterly[Otterly / Peec / Surfer]
Tool -->|$1-5k/mo budget| Athena[AthenaHQ / existing Ahrefs Brand Radar]
Tool -->|$5k+/mo, enterprise| Profound[Profound]
Otterly --> Track[Weekly citation share KPI]
Athena --> Track
Profound --> Track
Track --> Action[Tie content + PR sprints to delta]
Bad vs. expert
The bad approach
Looking at GA4 organic traffic and concluding “AI isn’t a meaningful channel for us.” Or buying a $1k/month tool before knowing what query set actually matters.
Reporting deck:
- "AI traffic is <0.5% of total sessions per GA4."
- "Therefore AI optimization is not a 2026 priority."
- (Unstated: GA4 cannot see most AI citations because they're zero-click.)
This fails because the question wasn’t whether AI sends traffic — it’s whether AI is sending citations that build branded demand and influence buying decisions. Both happen pre-click. By the time it shows up in GA4 organic, the AI conversation has already happened.
The expert approach
A tiered measurement plan: manual baseline first, tool layer added when the data justifies cost.
ai_visibility_measurement_plan:
  phase_1_baseline:
    duration: 4_weeks
    cost: 0
    method: |
      Manual audit, 25 priority queries, 4 engines, weekly.
      Spreadsheet template: query / engine / week / cited / position / sentiment / competitors
    deliverable: |
      Citation share baseline + competitor map
      Identify top 5 queries to win
      Identify top 3 competitor watering holes
  phase_2_tooling:
    trigger: |
      Phase 1 baseline complete AND >100 priority queries to track
    cost: $$ (Otterly or Peec) or $$$ (AthenaHQ)
    method: |
      Tool tracks 100-500 queries weekly automatically
      Manual audit continues for 25 highest-priority as ground-truth check
    deliverable: |
      Weekly citation-share dashboard
      Sentiment alerts on brand-name queries
  phase_3_enterprise:
    trigger: |
      Citation share is a board-level KPI, >500 queries
    cost: $$$$ (Profound)
    method: |
      Continuous monitoring across all engines
      Per-prompt-set granularity
      Sentiment + competitor benchmarking integrated
    deliverable: |
      Real-time citation dashboards
      Attribution from earned media to citation lift
// Optional: simple GA4 referrer breakout for partial AI traffic
// Apply as a custom segment in GA4 → Explore
const aiReferrerHosts = [
"perplexity.ai",
"chat.openai.com",
"chatgpt.com",
"gemini.google.com",
"bing.com", // Copilot users
"you.com",
"claude.ai" // rarely passes a referrer; expect near-zero
];
// In GA4, create an Audience with conditions matching these hostnames
// for a directional view of AI-referred traffic. Will undercount.
This wins because it starts with a free, accurate baseline, only adds tooling when there’s a clear justification, and treats GA4 referrer data as a directional supplement, not a primary signal.
Do this today
- Build a Google Sheet titled “AI Visibility Audit” with one tab per week. Columns: query | engine | cited (Y/N) | position | what they said | competitors cited | sentiment.
- List your top 25 priority queries — high commercial intent, head terms, brand+category, top customer questions. Prioritize where you’d most want to be cited.
- Every Monday morning, allocate 75 minutes. Run all 25 queries in ChatGPT (logged out, fresh window), Perplexity (no signed-in profile), Google AIO (clean Chrome profile), Claude with web (new conversation). Log everything in the sheet.
- After 4 weeks, calculate your citation share by engine and your top 3 lossiest queries. Those become the next sprint targets.
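Picking the lossiest queries from the sheet is a small ranking job. A sketch, assuming each row logs a query and a boolean `cited` (names are illustrative):

```javascript
// Sketch: rank priority queries by citation rate, lowest first,
// to surface the next sprint targets ("lossiest" queries).
function lossiestQueries(rows, topN = 3) {
  const byQuery = {};
  for (const { query, cited } of rows) {
    byQuery[query] ??= { cited: 0, total: 0 };
    byQuery[query].total += 1;
    if (cited) byQuery[query].cited += 1;
  }
  return Object.entries(byQuery)
    .map(([query, t]) => ({ query, rate: t.cited / t.total }))
    .sort((a, b) => a.rate - b.rate)
    .slice(0, topN);
}
```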
- In GA4 → Explore, create an Audience filter on the referrer hostnames perplexity.ai, chatgpt.com, chat.openai.com, and gemini.google.com. Track week-over-week as a directional signal.
- Try Profound’s free tier (or Otterly.ai’s 14-day trial). Plug in your 25 queries; compare their data to your manual audit. The deltas teach you what each tool sees and misses.
- If you already pay for Ahrefs, enable Brand Radar. If you pay for Semrush, enable AI Visibility Toolkit. Both are included; both are worth ~80% of a dedicated tool’s value.
- Run a sentiment audit: ask each engine “what are people’s complaints about [your brand]?” and “is [your brand] trustworthy?”. Log the responses. If the engines surface false negatives, those are gaps your content and PR need to address.
- Track branded search lift as an indirect signal in Google Search Console → Performance → Queries, filtered to your brand name. AI mentions drive branded searches downstream.
- Build a monthly AI Visibility report with three numbers: citation share by engine, sentiment summary, and gain/loss vs. last month. Distribute to leadership. Make it the metric that competes with organic positions on the dashboard.