Module 075 · Advanced · 20 min read

Measuring AI Visibility

Why GSC can't see AI traffic. Tools (Profound, AthenaHQ, Otterly, Peec AI, Ahrefs Brand Radar, Surfer, Semrush) and the manual audit baseline that beats them all when you start.

By SEO Mastery Editorial

You can’t optimize what you can’t measure, and Google Search Console cannot see AI traffic. The referrer string from ChatGPT, Perplexity, Claude, and Gemini either isn’t passed at all or is heavily anonymized. Building a measurement stack — both tooled and manual — is the difference between AI visibility being a real KPI and an aspirational vibe. This module covers the tools, the methods, and the baseline you should run before paying anyone.

TL;DR

  • GSC, GA4, and Ahrefs (organic-only) are blind to AI citation activity. The major AI engines suppress or anonymize referrers; even when traffic does land, most engines can't be reliably attributed without a dedicated visibility tool.
  • Tooled options exist: Profound, AthenaHQ, Otterly, Peec AI, Ahrefs Brand Radar, Surfer AI Tracker, Semrush AI Visibility Toolkit. Each has a different angle. Pick by the metric you most need (citation share, mention volume, sentiment, query coverage).
  • A manual weekly audit of 25 priority queries beats any tool when you’re starting. Spreadsheet, four engines, one hour, weekly cadence. Establish the baseline before you spend on tooling.

The mental model

Measuring AI visibility is like measuring radio play in 1985. Spotify didn’t exist; you couldn’t pull a dashboard. To know how often your song was being played, you either (a) called individual radio stations and asked, (b) hired an airplay-monitoring service (the niche Mediabase later filled), or (c) listened to the radio yourself and logged what you heard.

The 2026 AI visibility landscape is the same. You either (a) ask the engines yourself manually, (b) pay a tool that monitors them programmatically, or (c) infer from indirect signals (branded search lift, direct traffic, engagement). The manual approach is unfashionable but it’s the most accurate baseline, especially when starting out.

The tools are not yet at the maturity of GSC. They’re at the maturity of Mediabase circa 1995 — useful, directionally correct, sometimes inconsistent across vendors. Trust them for trends, not for absolute precision.

Deep dive: the 2026 reality

Why GSC can’t see it:

| Engine | Referrer behavior | Trackable in GA4? |
| --- | --- | --- |
| ChatGPT | chat.openai.com referrer ~20% of the time; mostly suppressed | Partial |
| Perplexity | perplexity.ai referrer most of the time | Yes (filterable in GA4) |
| Google AIO | Same google.com referrer as organic | No (folded into organic) |
| Google AI Mode | Some google.com/ai referrers | Partial; new in late 2025 |
| Claude | Direct/no referrer | No |
| Gemini | gemini.google.com sometimes | Partial |
| Bing Copilot | bing.com referrer | Yes |
Even when a referrer makes it through, the more important question — “was I cited in the AI answer?” — has nothing to do with whether the user clicked. Citation share is the upstream metric; click-through is downstream and lossy.

Tool comparison, May 2026:

| Tool | Strength | Pricing tier | Best for |
| --- | --- | --- | --- |
| Profound | Multi-engine citation share, prompt sets at scale | $$$$ | Enterprise teams |
| AthenaHQ | Per-engine citation tracking, sentiment | $$$ | Mid-market with deep needs |
| Otterly.ai | Mention monitoring, alerts, lightweight UX | $$ | Mid-market starter |
| Peec AI | Visibility scoring, competitor benchmarking | $$ | Small/mid teams |
| Ahrefs Brand Radar | LLM mention tracking inside Ahrefs ecosystem | Included with Ahrefs | Existing Ahrefs subscribers |
| Surfer AI Tracker | Citation monitoring at content level | $$ | Content teams |
| Semrush AI Visibility Toolkit | Cross-engine share of voice | Included with Semrush | Existing Semrush subscribers |
| Manual audit | Free, accurate, low-throughput | Free | Everyone, especially starting out |

Metrics that matter (a quick calculation sketch follows the list):

  • Citation share — % of priority queries where you’re cited inside the AI answer. The headline metric.
  • Mention rate — % of queries where your brand is mentioned (with or without citation). Brand-awareness adjacent.
  • Position-in-citation — when cited, are you the 1st, 3rd, or 7th source? Earlier positions get more click-through.
  • Sentiment — when mentioned, is the model describing you positively or negatively? Especially relevant for reputation queries.
  • Engine breakdown — your share by engine. Performance varies dramatically across surfaces.
  • Query coverage — % of strategic queries you appear on at all.
  • Competitor share-of-voice — your share vs. named competitors across the same prompt set.
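
To make the first few concrete, here is a minimal sketch in plain JavaScript. The counts are hypothetical placeholders; the point is only that these metrics reduce to simple ratios over whatever you log in the audit sheet.

// Hypothetical tallies from one week of audit logging.
const trackedQueries = 25;        // priority queries checked
const queriesCited = 7;           // answers that cited your domain
const queriesMentioned = 11;      // answers that mentioned the brand at all
const yourCitations = 9;          // your citations across the prompt set
const competitorCitations = 19;   // citations of named competitors, same prompt set

const citationShare = queriesCited / trackedQueries;        // 0.28 → 28%
const mentionRate = queriesMentioned / trackedQueries;      // 0.44 → 44%
const shareOfVoice =
  yourCitations / (yourCitations + competitorCitations);    // ≈ 0.32

console.log({ citationShare, mentionRate, shareOfVoice });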

The manual audit baseline. Before you buy any tool, run this for 4 weeks:

  1. List your top 25 priority queries (mix of head, mid, and long-tail).
  2. Each Monday, run all 25 in ChatGPT (logged out, fresh session), Perplexity, Google AIO, and Claude. Use a fresh browser profile.
  3. For each query, log: (a) was your brand cited yes/no, (b) at what position, (c) what was said, (d) which competitors were cited.
  4. Spreadsheet rows are queries; columns are engine + week. Use conditional formatting in Google Sheets to spot trends.

This produces 100 data points per week (25 queries × 4 engines), takes ~75 minutes, and gives a more accurate citation-share baseline than any sub-$500/month tool.
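
A hedged sketch of the aggregation step, assuming you export the sheet rows in the same shape as the columns above (the example query and values are placeholders): it turns one week's 100 data points into citation share by engine plus the queries where you were never cited.

// Each row mirrors the audit sheet: query | engine | cited (Y/N) | position | ...
const rows = [
  { query: "best crm for startups", engine: "ChatGPT",    cited: "Y", position: 2 },
  { query: "best crm for startups", engine: "Perplexity", cited: "N", position: null },
  // ...one row per query × engine, 100 per week
];

// Citation share by engine.
const byEngine = {};
for (const row of rows) {
  if (!byEngine[row.engine]) byEngine[row.engine] = { cited: 0, total: 0 };
  byEngine[row.engine].total += 1;
  if (row.cited === "Y") byEngine[row.engine].cited += 1;
}
for (const [engine, s] of Object.entries(byEngine)) {
  console.log(`${engine}: ${((s.cited / s.total) * 100).toFixed(0)}% citation share`);
}

// Queries never cited on any engine — candidates for the next content sprint.
const neverCited = [...new Set(rows.map(r => r.query))]
  .filter(q => !rows.some(r => r.query === q && r.cited === "Y"));
console.log(neverCited);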

Sentiment monitoring. Critical for reputation queries (“is X trustworthy?”, “is X better than Y?”). LLMs sometimes generate adversarial framings of brands, occasionally hallucinated. Tools: AthenaHQ’s sentiment dimension, Otterly’s mention-context capture, or manual spot-checks. Negative sentiment on AI surfaces is the new “negative review you can’t reply to” — you have to address it through content, earned media, and (for severe hallucinations) direct vendor reports.

Visualizing it

flowchart TD
  Start[New AI visibility program] --> Manual[Manual audit: 25 queries x 4 engines weekly]
  Manual --> Baseline[4-week baseline established]
  Baseline --> Decision{Citation share under 10%?}
  Decision -->|Yes| Improve[Apply GEO principles]
  Decision -->|No| Defend[Track for regression]
  Improve --> Tool{Worth tooling investment?}
  Defend --> Tool
  Tool -->|Sub-$1k MRR| Otterly[Otterly / Peec / Surfer]
  Tool -->|$1-5k MRR| Athena[AthenaHQ / existing Ahrefs Brand Radar]
  Tool -->|$5k+ MRR enterprise| Profound[Profound]
  Otterly --> Track[Weekly citation share KPI]
  Athena --> Track
  Profound --> Track
  Track --> Action[Tie content + PR sprints to delta]

Bad vs. expert

The bad approach

Looking at GA4 organic traffic and concluding “AI isn’t a meaningful channel for us.” Or buying a $1k/month tool before knowing what query set actually matters.

Reporting deck:
- "AI traffic is <0.5% of total sessions per GA4."
- "Therefore AI optimization is not a 2026 priority."
- (Unstated: GA4 cannot see most AI citations because they're zero-click.)

This fails because the question wasn’t whether AI sends traffic — it’s whether AI is sending citations that build branded demand and influence buying decisions. Both happen pre-click. By the time it shows up in GA4 organic, the AI conversation has already happened.

The expert approach

A tiered measurement plan: manual baseline first, tool layer added when the data justifies cost.

ai_visibility_measurement_plan:
  phase_1_baseline:
    duration: 4_weeks
    cost: 0
    method: |
      Manual audit, 25 priority queries, 4 engines, weekly.
      Spreadsheet template: query / engine / week / cited / position / sentiment / competitors
    deliverable: |
      Citation share baseline + competitor map
      Identify top 5 queries to win
      Identify top 3 competitor watering holes

  phase_2_tooling:
    trigger: |
      Phase 1 baseline complete AND >100 priority queries to track
    cost: $$ (Otterly or Peec) or $$$ (AthenaHQ)
    method: |
      Tool tracks 100-500 queries weekly automatically
      Manual audit continues for 25 highest-priority as ground truth check
    deliverable: |
      Weekly citation-share dashboard
      Sentiment alerts on brand-name queries

  phase_3_enterprise:
    trigger: |
      Citation share is a board-level KPI, >500 queries
    cost: $$$$ (Profound)
    method: |
      Continuous monitoring across all engines
      Per-prompt-set granularity
      Sentiment + competitor benchmarking integrated
    deliverable: |
      Real-time citation dashboards
      Attribution from earned media to citation lift

// Optional: simple GA4 referrer breakout for partial AI traffic
// Apply as a custom segment in GA4 → Explore
const aiReferrerHosts = [
  "perplexity.ai",
  "chat.openai.com",
  "chatgpt.com",
  "gemini.google.com",
  "bing.com",         // Copilot users
  "you.com",
  "claude.ai"
];

// In GA4, create an Audience with conditions matching these hostnames
// for a directional view of AI-referred traffic. Will undercount.
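
If your GA4 audience builder exposes a regex operator on the Page referrer dimension (a reasonable assumption, but confirm in your property), one combined pattern can be derived from the same list rather than adding seven separate conditions:

// One combined pattern for a "matches regex" condition on Page referrer.
// Escapes the dots so "chatgpt.com" doesn't also match "chatgptXcom".
const aiReferrerPattern = aiReferrerHosts
  .map(host => host.replace(/\./g, "\\."))
  .join("|");

console.log(aiReferrerPattern);
// → perplexity\.ai|chat\.openai\.com|chatgpt\.com|gemini\.google\.com|bing\.com|you\.com|claude\.ai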

This wins because it starts with a free, accurate baseline, only adds tooling when there’s a clear justification, and treats GA4 referrer data as a directional supplement, not a primary signal.

Do this today

  1. Build a Google Sheet titled “AI Visibility Audit” with one tab per week. Columns: query | engine | cited (Y/N) | position | what they said | competitors cited | sentiment.
  2. List your top 25 priority queries — high commercial intent, head terms, brand+category, top customer questions. Prioritize where you’d most want to be cited.
  3. Every Monday morning, allocate 75 minutes. Run all 25 queries in ChatGPT (logged out, fresh window), Perplexity (no signed-in profile), Google AIO (clean Chrome profile), Claude with web (new conversation). Log everything in the sheet.
  4. After 4 weeks, calculate your citation share by engine and your top 3 lossiest queries. Those become the next sprint targets.
  5. In GA4 → Explore, create an Audience filter on referrer hostnames perplexity.ai, chatgpt.com, chat.openai.com, gemini.google.com. Track week-over-week as a directional signal.
  6. Try Profound’s free tier (or Otterly.ai’s 14-day trial). Plug in your 25 queries; compare their data to your manual audit. The deltas teach you what each tool sees and misses.
  7. If you already pay for Ahrefs, enable Brand Radar. If you pay for Semrush, enable AI Visibility Toolkit. Both are included; both are worth ~80% of a dedicated tool’s value.
  8. Run a sentiment audit: ask each engine “what are people’s complaints about [your brand]” and “is [your brand] trustworthy”. Log responses. If the engines surface inaccurate negative claims, those are gaps your content and PR need to address.
  9. Track branded search lift as an indirect signal in Google Search Console → Performance → Queries, filtered to your brand name. AI mentions drive branded searches downstream.
  10. Build a monthly AI Visibility report with three numbers: citation share by engine, sentiment summary, and gain/loss vs. last month. Distribute to leadership. Make it the metric that competes with organic positions on the dashboard. (A minimal report-assembly sketch follows this list.)
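
For step 10, here is a minimal sketch of the report as a data structure. The engine names, share values, and sentiment summary string are all placeholders; gain/loss is just a month-over-month delta in percentage points.

// Citation share by engine, this month vs. last month (placeholder values).
const lastMonth = { ChatGPT: 0.12, Perplexity: 0.20, "Google AIO": 0.08, Claude: 0.04 };
const thisMonth = { ChatGPT: 0.16, Perplexity: 0.24, "Google AIO": 0.08, Claude: 0.08 };

const report = {
  citationShareByEngine: thisMonth,
  sentimentSummary: "2 negative framings found on reputation queries (see audit tab)",
  gainLossVsLastMonth: Object.fromEntries(
    Object.keys(thisMonth).map(engine => [
      engine,
      // Delta in percentage points, rounded to one decimal.
      +((thisMonth[engine] - (lastMonth[engine] ?? 0)) * 100).toFixed(1),
    ])
  ),
};

console.log(report.gainLossVsLastMonth);
// e.g. { ChatGPT: 4, Perplexity: 4, "Google AIO": 0, Claude: 4 }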

