Module 010 Intermediate 12 min read

Keyword Research Methods

GSC mining, autosuggest, PAA, Reddit, Quora, YouTube, competitor sitemaps, AI ideation. Ahrefs, Semrush, Moz, Ubersuggest, Keyword Planner, AnswerThePublic, AlsoAsked.

By SEO Mastery Editorial

The best keyword research uses a dozen sources at once, because no single tool sees the full demand surface. Ahrefs misses the new queries that have not crossed its volume threshold. Google Search Console sees only what you already rank for. Reddit and Quora show queries phrased the way humans actually type. YouTube and TikTok show the queries Gen Z searches outside Google. The expert workflow is parallel — pull from every source, dedupe, and let the volume of evidence reveal the real opportunities.

TL;DR

  • No single source sees the full demand surface. Ahrefs and Semrush index queries with detectable volume; GSC sees only your impressions; Reddit/Quora see phrasing; YouTube/TikTok show non-Google demand. Use 6+ sources in parallel.
  • GSC mining is the single highest-fit signal you have. Queries you already get impressions for are pre-validated against your domain authority. Mine them before you touch any third-party tool.
  • AI-assisted ideation works in 2026 — bounded. Use ChatGPT, Claude, or Perplexity to generate query patterns, not query lists. Let the AI produce templates (“[X] in [Y]”, “best [X] for [persona]”), then validate volume and intent in a real keyword tool.

The mental model

Keyword research methods are like a fishing operation, not an aquarium visit. The aquarium (Ahrefs, Semrush) shows you the fish that have already been catalogued. The open ocean (Reddit, autosuggest, GSC, YouTube) is where the new fish are — including the ones with high commercial value that no competitor has noticed yet.

Each method has a different yield. Autosuggest harvests emerging queries (Google completes them because they are trending). PAA harvests adjacent intent (Google literally tells you what people ask next). Reddit harvests language patterns (the exact phrasing your buyer types). Competitor sitemaps harvest proven content opportunities (someone validated the demand and you can do it better).

The point of running multiple methods is not redundancy. It is triangulation. A keyword that appears in your GSC, in Ahrefs at decent volume, in autosuggest, and in a Reddit thread has overwhelming evidence of real demand. A keyword that appears only in Ahrefs at 8K volume but is missing from every other source is probably an estimation error.

Deep dive: the 2026 reality

The 12 methods that matter, in priority order:

1. Google Search Console mining. GSC > Performance > Queries shows every query that produced an impression. Filter to positions 4-20 — these are the queries you almost rank for and can win with on-page work. This is the single most actionable list you will ever build.
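If you would rather pull this list programmatically than through the UI export, the Search Console API exposes the same data via `searchanalytics.query`. A minimal sketch of the request body and the 4-20 filter — the helper names here are ours, and the commented-out service call assumes you have already set up `googleapiclient` credentials:

```python
from datetime import date, timedelta

def gsc_query_body(days: int = 365, row_limit: int = 25000) -> dict:
    """Build a searchanalytics.query request body covering the
    last `days` days, grouped by query."""
    end = date.today()
    start = end - timedelta(days=days)
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["query"],
        "rowLimit": row_limit,
    }

def near_wins(rows: list, lo: float = 4, hi: float = 20) -> list:
    """Keep rows whose average position sits in the 4-20
    'almost ranking' band — the quick-win list."""
    return [r for r in rows if lo <= r["position"] <= hi]

# With googleapiclient, the live call would look like:
# service.searchanalytics().query(
#     siteUrl="sc-domain:example.com", body=gsc_query_body()).execute()
```

The API returns up to 25,000 rows per request, far more than the 1,000-row UI export, which is why the programmatic route matters for larger sites.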

2. Autosuggest scraping. Type a seed phrase + space into Google. The 10 suggestions are queries Google has high confidence in. Keyword Tool (keywordtool.io), Ahrefs Keyword Generator, and Soovle scrape autosuggest at scale. Free seed → 100+ candidates in seconds.
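The tools above wrap the same unofficial suggest endpoint you can hit yourself. A minimal sketch — note this endpoint is undocumented and Google may change or rate-limit it at any time, so treat it as a convenience, not a contract:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SUGGEST = "https://suggestqueries.google.com/complete/search"

def suggest_url(seed: str, lang: str = "en") -> str:
    # client=firefox makes the endpoint return plain JSON
    return SUGGEST + "?" + urlencode({"client": "firefox", "hl": lang, "q": seed})

def parse_suggest(payload: str) -> list:
    # Response shape: ["seed", ["suggestion 1", "suggestion 2", ...]]
    data = json.loads(payload)
    return list(data[1])

# Live call (uncomment to run against the network):
# with urlopen(suggest_url("keyword research ")) as resp:
#     print(parse_suggest(resp.read().decode("utf-8")))
```

Append a space plus each letter a-z to the seed to harvest the full alphabet soup — that is essentially what Soovle and Keyword Tool do at scale.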

3. People Also Ask (PAA) extraction. Each query’s PAA box has 4-8 questions; clicking one expands more. Scrape with AlsoAsked (alsoasked.com) to get the tree of related questions. Particularly useful for AEO content design.

4. Reddit and Quora deep mining. Reddit’s /r/[your-vertical] shows the exact phrasing your buyers use. Top-voted posts that are questions are validated demand. GummySearch, Subreddit Stats, or manual sorting by “Top: All Time” filtered to “Question” posts.
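Reddit listings are also available as JSON (e.g. `/r/<sub>/top.json?t=all`, subject to Reddit's rate limits and API policy), which makes the manual sorting step scriptable. A sketch of the filtering logic — the question heuristic is deliberately rough and ours, not Reddit's:

```python
QUESTION_STARTS = ("how", "what", "why", "which", "when",
                   "is", "are", "can", "should", "does", "do")

def question_titles(listing: dict, min_score: int = 50) -> list:
    """Pull question-phrased, well-upvoted titles out of a Reddit
    listing payload (the JSON behind /r/<sub>/top.json?t=all)."""
    out = []
    for child in listing["data"]["children"]:
        post = child["data"]
        title = post["title"].strip()
        is_question = title.endswith("?") or title.lower().startswith(QUESTION_STARTS)
        if is_question and post["score"] >= min_score:
            out.append(title)
    return out
```

The titles that survive this filter are exactly the "validated demand" described above: phrased by a real buyer and upvoted by hundreds more.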

5. YouTube suggest. Type seed in YouTube search. Suggestions are video-intent queries. Critical for visual or how-to content. Use TubeBuddy or vidIQ for volume estimates.

6. Competitor sitemaps. Visit https://competitor.com/sitemap.xml. Their URLs are their published keyword strategy. Ahrefs Site Explorer > Top Pages shows which competitor URLs drive traffic and what queries earn it.
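Sitemaps follow a standard XML schema, so extracting a competitor's published keyword strategy is a few lines of stdlib Python. A sketch — the slug-to-keyword helper is our rough heuristic, not a guarantee of the page's actual target query:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Extract every <loc> from a standard sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def slug_keywords(url: str) -> str:
    """Turn /blog/best-crm-for-startups into 'best crm for startups' -
    a rough guess at the page's target keyword."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return slug.replace("-", " ").replace(".html", "")
```

Run `slug_keywords` over every sitemap URL and you have a de facto export of the competitor's content calendar, ready to merge into the candidate pool.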

7. Ahrefs / Semrush bulk pulls. The third-party tools fill the gap of “keywords I do not yet rank for and have not thought of.” Ahrefs Keywords Explorer, Semrush Keyword Magic Tool, Moz Keyword Explorer, and Ubersuggest (cheapest tier) all do similar work.

8. Google Keyword Planner. Free, requires a Google Ads account. Volume buckets are coarse but the source-of-truth integration with Ads makes it useful for paid-side calibration.

9. AnswerThePublic. Visualizes question variants of a seed keyword. Free with rate limits, paid for unlimited. Quick way to find micro-intents.

10. Surfer Keyword Surfer (browser extension). Shows volume estimates in the SERP itself. Useful during competitive research browsing.

11. Bing / DuckDuckGo / Brave autosuggest. Different index, different suggestions. Brave powers the index that Claude's web search uses, so suggestions unique to Brave matter for AISO.

12. AI-assisted ideation. ChatGPT, Claude, Perplexity. Use them to generate query templates and persona-modifier combinations, then validate in real tools.
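Once the AI hands you templates rather than lists, expanding them into concrete candidates is mechanical. A minimal sketch (the template and slot values below are illustrative placeholders — every expanded query still needs volume and intent validation in a real tool):

```python
from itertools import product

def expand_templates(templates: list, slots: dict) -> list:
    """Expand '{category} for {persona}'-style patterns into concrete
    candidate queries by filling each slot with every value."""
    out = []
    for tpl in templates:
        # Only expand the slots this template actually references
        fields = [f for f in slots if "{" + f + "}" in tpl]
        for combo in product(*(slots[f] for f in fields)):
            out.append(tpl.format(**dict(zip(fields, combo))))
    return out

templates = ["best {category} for {persona}", "{category} vs {rival}"]
slots = {
    "category": ["crm", "help desk"],
    "persona": ["startups", "agencies"],
    "rival": ["spreadsheets"],
}
candidates = expand_templates(templates, slots)
# 2x2 persona combos + 2 rival combos = 6 candidates to validate
```

Ten templates with four slots of five values each yield thousands of candidates in milliseconds — which is exactly why the validation step in a real keyword tool is non-negotiable.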

| Method | Cost | Yield | Best for |
| --- | --- | --- | --- |
| GSC mining | Free | High-fit, low-volume | Positions 4-20 quick wins |
| Autosuggest | Free / cheap | Broad seeds | Initial expansion |
| PAA / AlsoAsked | Free / $$ | Question variants | AEO content |
| Reddit / Quora | Free | Buyer language | Pain-point content |
| YouTube suggest | Free | Video intent | How-to, visual |
| Competitor sitemap | Free | Validated topics | Catch-up + gap analysis |
| Ahrefs / Semrush | $$$ | Volume + KD | Bulk discovery |
| GKP | Free w/ Ads | Paid intent | Commercial keywords |
| AnswerThePublic | Free / $$ | Question patterns | Topic depth |
| Surfer Keyword Surfer | Free | In-SERP intel | Competitive research |
| Bing / Brave suggest | Free | AISO-specific | Claude / ChatGPT visibility |
| AI ideation | Free / $ | Templates, not lists | Initial brainstorming |

The 2026-specific addition: AI surface query mining. Tools like Otterly.ai, Profound, and AthenaHQ track which queries trigger your brand’s mention in ChatGPT, Perplexity, and AI Overviews. They are early but they are the only way to know what your AI-search demand surface looks like.

Visualizing it

flowchart TD
  Seed["Seed keyword (vertical, persona)"] --> M1["GSC mining"]
  Seed --> M2["Autosuggest scrape"]
  Seed --> M3["PAA / AlsoAsked"]
  Seed --> M4["Reddit / Quora"]
  Seed --> M5["YouTube suggest"]
  Seed --> M6["Competitor sitemaps"]
  Seed --> M7["Ahrefs / Semrush"]
  Seed --> M8["AI ideation (templated)"]
  M1 --> Pool["Combined candidate pool (10K+ rows)"]
  M2 --> Pool
  M3 --> Pool
  M4 --> Pool
  M5 --> Pool
  M6 --> Pool
  M7 --> Pool
  M8 --> Pool
  Pool --> Dedupe["Dedupe + normalize"]
  Dedupe --> Score["Sweet Spot scoring (Module 9)"]
  Score --> Short["Shortlist"]

Bad vs. expert

The bad approach

Process:
1. Open Ahrefs.
2. Type "saas".
3. Sort by volume.
4. Export. Done.

Result:
- 80% of pulled keywords are head terms with KD 75+.
- 0% reflect the actual phrasing the buyer uses.
- 0% reflect emerging queries Ahrefs has not yet picked up.
- The list is identical to what every competitor's intern produced.

This fails because it relies on one source that all competitors also use. The output is a commodity keyword list with no differentiation. Worse, head-term bias means the list is unwinnable for any site below DR 70.

The expert approach

# multi_source_keyword_research.py
import pandas as pd

# Every export below is assumed to have (or be renamed to) a "keyword" column.

# Method 1: GSC export from Search Console > Performance > Queries (last 12 months).
# GSC names its query column "Top queries" - rename it to match the other sources.
gsc = pd.read_csv("gsc_queries_12mo.csv").rename(columns={"Top queries": "keyword"})
gsc = gsc[(gsc["Position"] >= 4) & (gsc["Position"] <= 20)]
gsc["source"] = "gsc_mining"

# Method 2: Ahrefs autosuggest export from Keyword Generator
auto = pd.read_csv("ahrefs_autosuggest.csv")
auto["source"] = "autosuggest"

# Method 3: AlsoAsked PAA tree export
paa = pd.read_csv("alsoasked_export.csv")
paa["source"] = "paa"

# Method 4: Reddit thread mining (manually curated questions)
reddit = pd.read_csv("reddit_questions.csv")
reddit["source"] = "reddit"

# Method 5: Competitor sitemap top pages
comp = pd.read_csv("ahrefs_competitor_top_pages.csv")
comp["source"] = "competitor"

# Method 6: Ahrefs broad keyword pull
broad = pd.read_csv("ahrefs_broad.csv")
broad["source"] = "ahrefs"

# Combine, dedupe by normalized keyword
all_sources = pd.concat([gsc, auto, paa, reddit, comp, broad], ignore_index=True)
all_sources["keyword_norm"] = all_sources["keyword"].str.lower().str.strip()

# Triangulation score: how many sources surfaced this keyword?
triangulation = all_sources.groupby("keyword_norm")["source"].nunique().reset_index()
triangulation.columns = ["keyword_norm", "source_count"]

merged = all_sources.merge(triangulation, on="keyword_norm")
merged = merged.drop_duplicates(subset=["keyword_norm"])

# Keywords surfaced by 3+ sources are highest-evidence demand
high_evidence = merged[merged["source_count"] >= 3]
high_evidence.to_csv("triangulated_shortlist.csv", index=False)

This works because triangulation across 6+ sources separates real demand from estimation noise. A keyword that GSC, Reddit, and Ahrefs all surface is overwhelmingly real. A keyword that only Ahrefs shows might be an estimation artifact. The output is a defensible shortlist that is impossible to replicate from any single tool.

Do this today

  1. Export the last 12 months of queries from Google Search Console > Performance > Queries. Filter rows where Position is between 4 and 20. Save as gsc_4_20.csv.
  2. Open Ahrefs Keyword Generator (free at ahrefs.com/keyword-generator) or Semrush Keyword Magic Tool. Enter 5 seed keywords. Export top 500 each. Save as tool_seeds.csv.
  3. Run AlsoAsked at alsoasked.com on your top 5 head terms. Export the question tree. Save as paa_tree.csv.
  4. Identify your top 3 competitors. Visit each at competitor.com/sitemap.xml. Or in Ahrefs > Site Explorer > Top Pages, sort by traffic. Save the URLs and target keywords as competitor_top.csv.
  5. Search Google for site:reddit.com [your-vertical]. Within the relevant subreddits, sort posts by “Top: All Time”. Note questions repeated 5+ times — these are validated phrasing. Save as reddit_questions.csv.
  6. Use ChatGPT or Claude to generate 30 query templates: “Give me 30 query patterns a [your-ICP] would type when researching [your-product-category]. Format: ‘[modifier] [your-product-category] [for | vs | with] [variable]’.” Validate each template in Ahrefs.
  7. Combine all CSVs into one master sheet. Add a source_count column showing how many of the 6 sources each keyword appeared in. Sort by source_count descending. Keywords with count ≥ 3 are your high-confidence shortlist.
