Measuring AI Visibility
Why GSC can't see AI traffic. Tools (Profound, Athena HQ, Otterly, Peec AI, Ahrefs Brand Radar, Surfer, Semrush) and the manual audit baseline that beats them all when you start.
You can’t optimize what you can’t measure, and Google Search Console cannot see AI traffic. Referrers from ChatGPT, Claude, and Gemini are mostly suppressed or anonymized, and Google’s own AI surfaces report the same google.com referrer as organic search. Building a measurement stack — both tooled and manual — is the difference between AI visibility being a real KPI and an aspirational vibe. This module covers the tools, the methods, and the baseline you should run before paying anyone.
TL;DR
- GSC, GA4, and Ahrefs (organic-only) are blind to AI citation activity. The major AI engines suppress or anonymize referrers; even when traffic does land, you cannot attribute it to a specific AI engine reliably without a dedicated visibility tool.
- Tooled options exist: Profound, AthenaHQ, Otterly, Peec AI, Ahrefs Brand Radar, Surfer AI Tracker, Semrush AI Visibility Toolkit. Each has a different angle. Pick by the metric you most need (citation share, mention volume, sentiment, query coverage).
- A manual weekly audit of 25 priority queries beats any tool when you’re starting. Spreadsheet, four engines, one hour, weekly cadence. Establish the baseline before you spend on tooling.
The mental model
Measuring AI visibility is like measuring radio play in 1985. Spotify didn’t exist; you couldn’t pull a dashboard. To know how often your song was being played, you either (a) called individual radio stations and asked, (b) hired a service like Mediabase that monitored airplay, or (c) listened to the radio yourself and logged what you heard.
The 2026 AI visibility landscape is the same. You either (a) ask the engines yourself manually, (b) pay a tool that monitors them programmatically, or (c) infer from indirect signals (branded search lift, direct traffic, engagement). The manual approach is unfashionable but it’s the most accurate baseline, especially when starting out.
The tools are not yet at the maturity of GSC. They’re at the maturity of Mediabase circa 1995 — useful, directionally correct, sometimes inconsistent across vendors. Trust them for trends, not for absolute precision.
Deep dive: the 2026 reality
Why GSC can’t see it:
| Engine | Referrer behavior | Trackable in GA4? |
|---|---|---|
| ChatGPT | chat.openai.com referrer ~20% of the time; mostly suppressed | Partial |
| Perplexity | perplexity.ai referrer most of the time | Yes (filterable in GA4) |
| Google AIO | Same google.com referrer as organic | No (folded into organic) |
| Google AI Mode | Some google.com/ai referrers | Partial; new in late 2025 |
| Claude | Direct/no referrer | No |
| Gemini | gemini.google.com sometimes | Partial |
| Bing Copilot | bing.com referrer | Yes |
Even when a referrer makes it through, the more important question — “was I cited in the AI answer?” — has nothing to do with whether the user clicked. Citation share is the upstream metric; click-through is downstream and lossy.
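For the minority of AI visits that do pass a referrer, you can at least classify them by hostname. A minimal sketch (the function name and hostname map are assumptions; the hostnames mirror the table above, and suppressed referrers never reach this code, so treat any counts as a floor):

```javascript
// Hypothetical helper: map a referrer URL to an AI engine, or null.
// Suppressed referrers (Claude, most ChatGPT sessions) arrive empty,
// so this deliberately undercounts — a directional signal only.
const AI_ENGINES = {
  "perplexity.ai": "Perplexity",
  "chat.openai.com": "ChatGPT",
  "chatgpt.com": "ChatGPT",
  "gemini.google.com": "Gemini",
  "bing.com": "Bing Copilot",
};

function classifyReferrer(referrerUrl) {
  try {
    const host = new URL(referrerUrl).hostname.replace(/^www\./, "");
    return AI_ENGINES[host] || null; // null = not a known AI surface
  } catch {
    return null; // empty or malformed referrer — the common case for AI traffic
  }
}
```

Running this over server logs or analytics exports gives a rough per-engine split of the click-through traffic, which is a different (and smaller) number than citation share.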
Tool comparison, May 2026:
| Tool | Strength | Pricing tier | Best for |
|---|---|---|---|
| Profound | Multi-engine citation share, prompt sets at scale | $$$$ | Enterprise teams |
| AthenaHQ | Per-engine citation tracking, sentiment | $$$ | Mid-market with deep needs |
| Otterly.ai | Mention monitoring, alerts, lightweight UX | $$ | Mid-market starter |
| Peec AI | Visibility scoring, competitor benchmarking | $$ | Small/mid teams |
| Ahrefs Brand Radar | LLM mention tracking inside Ahrefs ecosystem | Included if Ahrefs | Existing Ahrefs subs |
| Surfer AI Tracker | Citation monitoring at content level | $$ | Content teams |
| Semrush AI Visibility Toolkit | Cross-engine share of voice | Included with Semrush | Existing Semrush subs |
| Manual audit | Free, accurate, low-throughput | Free | Everyone, especially starting out |
Metrics that matter:
- Citation share — % of priority queries where you’re cited inside the AI answer. The headline metric.
- Mention rate — % of queries where your brand is mentioned (with or without citation). Brand-awareness adjacent.
- Position-in-citation — when cited, are you the 1st, 3rd, or 7th source? Earlier positions get more click-through.
- Sentiment — when mentioned, is the model describing you positively or negatively? Especially relevant for reputation queries.
- Engine breakdown — your share by engine. Performance varies dramatically across surfaces.
- Query coverage — % of strategic queries you appear on at all.
- Competitor share-of-voice — your share vs. named competitors across the same prompt set.
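Competitor share-of-voice is simple arithmetic over the same prompt set. A hedged sketch, assuming each audit row records which brands were cited in the answer (the row shape and function name are illustrative, not from any tool’s API):

```javascript
// Sketch: % of prompts in which each tracked brand was cited.
// `rows` is one row per (query, engine) run; `citedBrands` is assumed
// to list every brand cited in that AI answer.
function shareOfVoice(rows, brands) {
  const counts = Object.fromEntries(brands.map((b) => [b, 0]));
  for (const { citedBrands } of rows) {
    for (const b of citedBrands) {
      if (b in counts) counts[b] += 1;
    }
  }
  const share = {};
  for (const b of brands) {
    share[b] = Math.round((100 * counts[b]) / rows.length);
  }
  return share; // e.g. { "YourBrand": 33, "CompetitorA": 67 }
}
```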
The manual audit baseline. Before you buy any tool, run this for 4 weeks:
- List your top 25 priority queries (mix of head, mid, and long-tail).
- Each Monday, run all 25 in ChatGPT (logged out, fresh session), Perplexity, Google AIO, and Claude. Use a fresh browser profile.
- For each query, log: (a) was your brand cited yes/no, (b) at what position, (c) what was said, (d) which competitors were cited.
- Spreadsheet rows are queries; columns are engine + week. Conditional formatting in Google Sheets to spot trends.
This produces 100 data points per week (25 queries × 4 engines), takes ~75 minutes, and gives a more accurate citation-share baseline than any sub-$500/month tool.
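The headline citation-share number falls directly out of that log. A minimal sketch (the row shape mirrors the spreadsheet columns described above; names are assumptions):

```javascript
// Sketch: citation share per engine from manual-audit rows.
// Each row is one (query, engine) run with a boolean `cited`.
function citationShareByEngine(rows) {
  const tally = {};
  for (const { engine, cited } of rows) {
    tally[engine] ??= { cited: 0, total: 0 };
    tally[engine].total += 1;
    if (cited) tally[engine].cited += 1;
  }
  const shares = {};
  for (const [engine, t] of Object.entries(tally)) {
    shares[engine] = Math.round((100 * t.cited) / t.total);
  }
  return shares; // e.g. { ChatGPT: 12, Perplexity: 36 }
}
```

Exported as CSV from the sheet, four weeks of rows feed straight into this and give the per-engine baseline the rest of the program is judged against.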
Sentiment monitoring. Critical for reputation queries (“is X trustworthy?”, “is X better than Y?”). LLMs sometimes generate adversarial framings of brands, occasionally hallucinated. Tools: AthenaHQ’s sentiment dimension, Otterly’s mention-context capture, or manual spot-checks. Negative sentiment on AI surfaces is the new “negative review you can’t reply to” — you have to address it through content, earned media, and (for severe hallucinations) direct vendor reports.
Visualizing it
flowchart TD
Start[New AI visibility program] --> Manual[Manual audit: 25 queries x 4 engines weekly]
Manual --> Baseline[4-week baseline established]
Baseline --> Decision{Citation share <10%?}
Decision -->|Yes| Improve[Apply GEO principles]
Decision -->|No| Defend[Track for regression]
Improve --> Tool{Worth tooling investment?}
Defend --> Tool
Tool -->|Under $1k/mo budget| Otterly[Otterly / Peec / Surfer]
Tool -->|$1-5k/mo budget| Athena[AthenaHQ / existing Ahrefs Brand Radar]
Tool -->|$5k+/mo, enterprise| Profound[Profound]
Otterly --> Track[Weekly citation share KPI]
Athena --> Track
Profound --> Track
Track --> Action[Tie content + PR sprints to delta]
Bad vs. expert
The bad approach
Looking at GA4 organic traffic and concluding “AI isn’t a meaningful channel for us.” Or buying a $1k/month tool before knowing what query set actually matters.
Reporting deck:
- "AI traffic is <0.5% of total sessions per GA4."
- "Therefore AI optimization is not a 2026 priority."
- (Unstated: GA4 cannot see most AI citations because they're zero-click.)
This fails because the question wasn’t whether AI sends traffic — it’s whether AI is sending citations that build branded demand and influence buying decisions. Both happen pre-click. By the time it shows up in GA4 organic, the AI conversation has already happened.
The expert approach
A tiered measurement plan: manual baseline first, tool layer added when the data justifies cost.
ai_visibility_measurement_plan:
  phase_1_baseline:
    duration: 4_weeks
    cost: 0
    method: |
      Manual audit, 25 priority queries, 4 engines, weekly.
      Spreadsheet template: query / engine / week / cited / position / sentiment / competitors
    deliverable: |
      Citation share baseline + competitor map
      Identify top 5 queries to win
      Identify top 3 competitor watering holes
  phase_2_tooling:
    trigger: |
      Phase 1 baseline complete AND >100 priority queries to track
    cost: $$ (Otterly or Peec) or $$$ (AthenaHQ)
    method: |
      Tool tracks 100-500 queries weekly automatically
      Manual audit continues for 25 highest-priority as ground-truth check
    deliverable: |
      Weekly citation-share dashboard
      Sentiment alerts on brand-name queries
  phase_3_enterprise:
    trigger: |
      Citation share is a board-level KPI, >500 queries
    cost: $$$$ (Profound)
    method: |
      Continuous monitoring across all engines
      Per-prompt-set granularity
      Sentiment + competitor benchmarking integrated
    deliverable: |
      Real-time citation dashboards
      Attribution from earned media to citation lift
// Optional: simple GA4 referrer breakout for partial AI traffic
// Apply as a custom segment in GA4 → Explore
const aiReferrerHosts = [
"perplexity.ai",
"chat.openai.com",
"chatgpt.com",
"gemini.google.com",
"bing.com", // Copilot users
"you.com",
"claude.ai" // rarely passes a referrer; expect near-zero
];
// In GA4, create an Audience with conditions matching these hostnames
// for a directional view of AI-referred traffic. Will undercount.
This wins because it starts with a free, accurate baseline, only adds tooling when there’s a clear justification, and treats GA4 referrer data as a directional supplement, not a primary signal.
Do this today
- Build a Google Sheet titled “AI Visibility Audit” with one tab per week. Columns: query | engine | cited (Y/N) | position | what they said | competitors cited | sentiment.
- List your top 25 priority queries — high commercial intent, head terms, brand+category, top customer questions. Prioritize where you’d most want to be cited.
- Every Monday morning, allocate 75 minutes. Run all 25 queries in ChatGPT (logged out, fresh window), Perplexity (no signed-in profile), Google AIO (clean Chrome profile), Claude with web (new conversation). Log everything in the sheet.
- After 4 weeks, calculate your citation share by engine and your top 3 lossiest queries. Those become the next sprint targets.
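Picking the lossiest queries from the sheet is a small ranking job. A sketch, assuming each row logs a query and a boolean `cited` (names are illustrative):

```javascript
// Sketch: rank priority queries by citation rate, lowest first,
// to surface the next sprint targets ("lossiest" queries).
function lossiestQueries(rows, topN = 3) {
  const byQuery = {};
  for (const { query, cited } of rows) {
    byQuery[query] ??= { cited: 0, total: 0 };
    byQuery[query].total += 1;
    if (cited) byQuery[query].cited += 1;
  }
  return Object.entries(byQuery)
    .map(([query, t]) => ({ query, rate: t.cited / t.total }))
    .sort((a, b) => a.rate - b.rate)
    .slice(0, topN);
}
```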
- In GA4 → Explore, create an Audience filter on the referrer hostnames perplexity.ai, chatgpt.com, chat.openai.com, and gemini.google.com. Track week-over-week as a directional signal.
- Try Profound’s free tier (or Otterly.ai’s 14-day trial). Plug in your 25 queries; compare their data to your manual audit. The deltas teach you what each tool sees and misses.
- If you already pay for Ahrefs, enable Brand Radar. If you pay for Semrush, enable AI Visibility Toolkit. Both are included; both are worth ~80% of a dedicated tool’s value.
- Run a sentiment audit: ask each engine “what are people’s complaints about [your brand]?” and “is [your brand] trustworthy?”. Log the responses. If the engines surface false negatives, those are gaps your content and PR need to address.
- Track branded search lift as an indirect signal in Google Search Console → Performance → Queries, filtered to your brand name. AI mentions drive branded searches downstream.
- Build a monthly AI Visibility report with three numbers: citation share by engine, sentiment summary, and gain/loss vs. last month. Distribute to leadership. Make it the metric that competes with organic positions on the dashboard.