Keyword Research Methods
Methods: GSC mining, autosuggest, PAA, Reddit, Quora, YouTube, competitor sitemaps, AI ideation. Tools: Ahrefs, Semrush, Moz, Ubersuggest, Keyword Planner, AnswerThePublic, AlsoAsked.
The best keyword research uses a dozen sources at once, because no single tool sees the full demand surface. Ahrefs misses the new queries that have not crossed its volume threshold. Google Search Console sees only what you already rank for. Reddit and Quora show queries phrased the way humans actually type. YouTube and TikTok show the queries Gen Z searches outside Google. The expert workflow is parallel — pull from every source, dedupe, and let the volume of evidence reveal the real opportunities.
TL;DR
- No single source sees the full demand surface. Ahrefs and Semrush index queries with detectable volume; GSC sees only your impressions; Reddit/Quora see phrasing; YouTube/TikTok show non-Google demand. Use 6+ sources in parallel.
- GSC mining is the single highest-fit signal you have. Queries you already get impressions for are pre-validated against your domain authority. Mine them before you touch any third-party tool.
- AI-assisted ideation works in 2026 — bounded. Use ChatGPT, Claude, or Perplexity to generate query patterns, not query lists. Let the AI produce templates (“[X] in [Y]”, “best [X] for [persona]”), then validate volume and intent in a real keyword tool.
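The template-then-validate idea in that last bullet can be sketched as a small expansion step. All templates, categories, and personas below are hypothetical placeholders; the output is a candidate list only, and every candidate still needs volume and intent validation in a real keyword tool:

```python
from itertools import product
from string import Formatter

# Hypothetical templates and fill-ins -- swap in your own vertical and personas.
templates = ["best {cat} for {persona}", "{cat} vs {alt}", "{cat} pricing"]
fills = {
    "cat": ["crm software", "crm tool"],
    "persona": ["startups", "real estate agents"],
    "alt": ["spreadsheets"],
}

def expand(template, fills):
    """Expand one template into every combination of its placeholder values."""
    keys = [field for _, field, _, _ in Formatter().parse(template) if field]
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(fills[k] for k in keys))
    ]

# Deduplicate across templates; each candidate still needs tool validation.
candidates = sorted({q for t in templates for q in expand(t, fills)})
```

This is the "templates, not lists" discipline in code form: the AI (or you) supplies the patterns, and combinatorial expansion produces the raw candidates cheaply.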
The mental model
Keyword research methods are like a fishing operation, not an aquarium visit. The aquarium (Ahrefs, Semrush) shows you the fish that have already been catalogued. The open ocean (Reddit, autosuggest, GSC, YouTube) is where the new fish are — including the ones with high commercial value that no competitor has noticed yet.
Each method has a different yield. Autosuggest harvests emerging queries (Google completes them because they are trending). PAA harvests adjacent intent (Google literally tells you what people ask next). Reddit harvests language patterns (the exact phrasing your buyer types). Competitor sitemaps harvest proven content opportunities (someone validated the demand and you can do it better).
The point of running multiple methods is not redundancy. It is triangulation. A keyword that appears in your GSC, in Ahrefs at decent volume, in autosuggest, and in a Reddit thread has overwhelming evidence of real demand. A keyword that appears only in Ahrefs at 8K volume but is missing from every other source is probably an estimation error.
Deep dive: the 2026 reality
The 12 methods that matter, in priority order:
1. Google Search Console mining. GSC > Performance > Queries shows every query that produced an impression. Filter to positions 4-20 — these are the queries you almost rank for and can win with on-page work. This is the single most actionable list you will ever build.
2. Autosuggest scraping. Type a seed phrase + space into Google. The 10 suggestions are queries Google has high confidence in. Keyword Tool (keywordtool.io), Ahrefs Keyword Generator, and Soovle scrape autosuggest at scale. Free seed → 100+ candidates in seconds.
3. People Also Ask (PAA) extraction. Each query’s PAA box has 4-8 questions; clicking one expands more. Scrape with AlsoAsked (alsoasked.com) to get the tree of related questions. Particularly useful for AEO content design.
4. Reddit and Quora deep mining. Reddit’s /r/[your-vertical] shows the exact phrasing your buyers use. Top-voted posts that are questions are validated demand. GummySearch, Subreddit Stats, or manual sorting by “Top: All Time” filtered to “Question” posts.
5. YouTube suggest. Type seed in YouTube search. Suggestions are video-intent queries. Critical for visual or how-to content. Use TubeBuddy or vidIQ for volume estimates.
6. Competitor sitemaps. Visit https://competitor.com/sitemap.xml. Their URLs are their published keyword strategy. Ahrefs Site Explorer > Top Pages shows which competitor URLs drive traffic and what queries earn it.
7. Ahrefs / Semrush bulk pulls. The third-party tools fill the gap of “keywords I do not yet rank for and have not thought of.” Ahrefs Keywords Explorer, Semrush Keyword Magic Tool, Moz Keyword Explorer, and Ubersuggest (cheapest tier) all do similar work.
8. Google Keyword Planner. Free, requires a Google Ads account. Volume buckets are coarse but the source-of-truth integration with Ads makes it useful for paid-side calibration.
9. AnswerThePublic. Visualizes question variants of a seed keyword. Free with rate limits, paid for unlimited. Quick way to find micro-intents.
10. Surfer Keyword Surfer (browser extension). Shows volume estimates in the SERP itself. Useful during competitive research browsing.
11. Bing / DuckDuckGo / Brave autosuggest. Different index, different suggestions. Brave Search is the index Claude's web search uses, so suggestions unique to Brave matter for AISO.
12. AI-assisted ideation. ChatGPT, Claude, Perplexity. Use them to generate query templates and persona-modifier combinations, then validate in real tools.
| Method | Cost | Yield | Best for |
|---|---|---|---|
| GSC mining | Free | High-fit, low-volume | Positions 4-20 quick wins |
| Autosuggest | Free / cheap | Broad seeds | Initial expansion |
| PAA / AlsoAsked | Free / $$ | Question variants | AEO content |
| Reddit / Quora | Free | Buyer language | Pain-point content |
| YouTube suggest | Free | Video intent | How-to, visual |
| Competitor sitemap | Free | Validated topics | Catch-up + gap analysis |
| Ahrefs / Semrush | $$$ | Volume + KD | Bulk discovery |
| GKP | Free w/ Ads | Paid intent | Commercial keywords |
| AnswerThePublic | Free / $$ | Question patterns | Topic depth |
| Surfer Keyword Surfer | Free | In-SERP intel | Competitive research |
| Bing / Brave suggest | Free | AISO-specific | Claude / ChatGPT visibility |
| AI ideation | Free / $ | Templates, not lists | Initial brainstorming |
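Method 6 (competitor sitemaps) is easy to automate: the sitemap's `<loc>` URLs encode the competitor's published keyword strategy, and URL slugs convert directly into candidate phrases. A minimal sketch using the stdlib XML parser; the sitemap is inlined with invented URLs, where in practice you would fetch `https://competitor.com/sitemap.xml`:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Inline stand-in for a fetched sitemap; the URLs are invented examples.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/best-crm-for-startups</loc></url>
  <url><loc>https://example.com/blog/crm-vs-spreadsheet</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def slug_to_keyword(url: str) -> str:
    """Turn the last path segment of a URL into a space-separated keyword candidate."""
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return slug.replace("-", " ")

root = ET.fromstring(SITEMAP)
candidates = [slug_to_keyword(el.text) for el in root.findall(".//sm:loc", NS)]
```

The slugs are candidates, not keywords: "crm vs spreadsheet" still needs volume and intent checks before it earns a place in the pool.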
The 2026-specific addition: AI surface query mining. Tools like Otterly.ai, Profound, and AthenaHQ track which queries trigger your brand’s mention in ChatGPT, Perplexity, and AI Overviews. They are early but they are the only way to know what your AI-search demand surface looks like.
Visualizing it
```mermaid
flowchart TD
    Seed["Seed keyword (vertical, persona)"] --> M1["GSC mining"]
    Seed --> M2["Autosuggest scrape"]
    Seed --> M3["PAA / AlsoAsked"]
    Seed --> M4["Reddit / Quora"]
    Seed --> M5["YouTube suggest"]
    Seed --> M6["Competitor sitemaps"]
    Seed --> M7["Ahrefs / Semrush"]
    Seed --> M8["AI ideation (templated)"]
    M1 --> Pool["Combined candidate pool (10K+ rows)"]
    M2 --> Pool
    M3 --> Pool
    M4 --> Pool
    M5 --> Pool
    M6 --> Pool
    M7 --> Pool
    M8 --> Pool
    Pool --> Dedupe["Dedupe + normalize"]
    Dedupe --> Score["Sweet Spot scoring (Module 9)"]
    Score --> Short["Shortlist"]
```
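The "Dedupe + normalize" node does more work than it sounds: "best CRM tools", "Best crm tool?", and "best crm  tool" should collapse to one row, or the triangulation count will undercount real demand. A minimal normalizer sketch; the trailing-"s" singularization is a crude assumption, not a linguistics library:

```python
import re

def normalize(q: str) -> str:
    """Canonicalize a query: lowercase, strip punctuation, collapse whitespace,
    and crudely singularize words via a naive trailing-'s' strip."""
    q = q.lower()
    q = re.sub(r"[^\w\s]", " ", q)          # punctuation to spaces
    q = re.sub(r"\s+", " ", q).strip()      # collapse runs of whitespace
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in q.split()]
    return " ".join(words)

queries = ["best CRM tools", "Best crm tool?", "best crm  tool"]
canonical = {normalize(q) for q in queries}  # collapses to one form
```

For a production pool you would swap the naive singularizer for a stemmer or lemmatizer, but even this rough pass merges most near-duplicates before scoring.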
Bad vs. expert
The bad approach
Process:
1. Open Ahrefs.
2. Type "saas".
3. Sort by volume.
4. Export. Done.
Result:
- 80% of pulled keywords are head terms with KD 75+.
- 0% reflect the actual phrasing the buyer uses.
- 0% reflect emerging queries Ahrefs has not yet picked up.
- The list is identical to what every competitor's intern produced.
This fails because it relies on one source that all competitors also use. The output is a commodity keyword list with no differentiation. Worse, head-term bias means the list is unwinnable for any site below DR 70.
The expert approach
```python
# multi_source_keyword_research.py
import pandas as pd

# Method 1: GSC export from Search Console > Performance > Queries (last 12 months).
# The UI export names the query column "Top queries"; rename to match the rest.
# Adjust the rename if your export's column header differs.
gsc = pd.read_csv("gsc_queries_12mo.csv")
gsc = gsc.rename(columns={"Top queries": "keyword"})
gsc = gsc[(gsc["Position"] >= 4) & (gsc["Position"] <= 20)]
gsc["source"] = "gsc_mining"

# Methods 2-6: each export is assumed to have a "keyword" column; rename as needed.
auto = pd.read_csv("ahrefs_autosuggest.csv")           # Method 2: autosuggest
auto["source"] = "autosuggest"
paa = pd.read_csv("alsoasked_export.csv")              # Method 3: AlsoAsked PAA tree
paa["source"] = "paa"
reddit = pd.read_csv("reddit_questions.csv")           # Method 4: curated Reddit questions
reddit["source"] = "reddit"
comp = pd.read_csv("ahrefs_competitor_top_pages.csv")  # Method 5: competitor top pages
comp["source"] = "competitor"
broad = pd.read_csv("ahrefs_broad.csv")                # Method 6: Ahrefs broad pull
broad["source"] = "ahrefs"

# Combine, then dedupe by normalized keyword.
all_sources = pd.concat([gsc, auto, paa, reddit, comp, broad], ignore_index=True)
all_sources["keyword_norm"] = all_sources["keyword"].str.lower().str.strip()

# Triangulation score: how many distinct sources surfaced this keyword?
triangulation = all_sources.groupby("keyword_norm")["source"].nunique().reset_index()
triangulation.columns = ["keyword_norm", "source_count"]
merged = all_sources.merge(triangulation, on="keyword_norm")
merged = merged.drop_duplicates(subset=["keyword_norm"])

# Keywords surfaced by 3+ sources are the highest-evidence demand.
high_evidence = merged[merged["source_count"] >= 3]
high_evidence = high_evidence.sort_values("source_count", ascending=False)
high_evidence.to_csv("triangulated_shortlist.csv", index=False)
```
This works because triangulation across 6+ sources separates real demand from estimation noise. A keyword that GSC, Reddit, and Ahrefs all surface is overwhelmingly real. A keyword that only Ahrefs shows might be an estimation artifact. The output is a defensible shortlist that is impossible to replicate from any single tool.
Do this today
- Export the last 12 months of queries from Google Search Console > Performance > Queries. Filter rows where Position is between 4 and 20. Save as `gsc_4_20.csv`.
- Open Ahrefs Keyword Generator (free at ahrefs.com/keyword-generator) or Semrush Keyword Magic Tool. Enter 5 seed keywords. Export the top 500 for each. Save as `tool_seeds.csv`.
- Run AlsoAsked at alsoasked.com on your top 5 head terms. Export the question tree. Save as `paa_tree.csv`.
- Identify your top 3 competitors. Visit each at `competitor.com/sitemap.xml`, or in Ahrefs > Site Explorer > Top Pages, sort by traffic. Save the URLs and target keywords as `competitor_top.csv`.
- Search Reddit with `site:reddit.com [your-vertical]`. Sort posts by "Top: All Time". Note questions repeated 5+ times; these are validated phrasing. Save as `reddit_questions.csv`.
- Use ChatGPT or Claude to generate 30 query templates: "Give me 30 query patterns a [your-ICP] would type when researching [your-product-category]. Format: '[modifier] [your-product-category] [for | vs | with] [variable]'." Validate each template in Ahrefs.
- Combine all CSVs into one master sheet. Add a `source_count` column showing how many of the 6 sources each keyword appeared in. Sort by `source_count` descending. Keywords with count ≥ 3 are your high-confidence shortlist.