Blog

  • From Keywords to Clusters: A Practical Guide to Topic Modelling

    Keyword research has not fundamentally changed in 15 years: find terms, estimate volume, check difficulty, produce content. The problem is that this approach treats every keyword as an independent unit of work. It ignores the way search engines—and buyers—actually process information.

    Topic modelling shifts the unit of analysis from the keyword to the cluster: a set of semantically related queries that share underlying intent. When you map your content against clusters rather than individual terms, you start to see your content estate the way a language model does—and your optimisation decisions get considerably sharper.

    Step 1: Seed Keyword Extraction

    Start with 10–20 seed terms that describe your product or service category. Run them through a keyword research tool and export the full “related keywords” dataset—not just the top 50 by volume. You want breadth at this stage, not precision.

    Filter to terms with at least some measurable search volume (we use a floor of 50 monthly searches in the target market). Anything below that is addressed through content depth, not dedicated pages.
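The volume filter can be sketched in a few lines of Python. This assumes a CSV export with `keyword` and `volume` columns; the column names are assumptions about your tool's export format, so adjust them to match:

```python
# Minimal sketch: filter an exported keyword dataset by a monthly-volume floor.
# Column names ("keyword", "volume") are assumptions about the export format.
import csv

VOLUME_FLOOR = 50  # minimum monthly searches in the target market

def load_keywords(path, floor=VOLUME_FLOOR):
    """Return keywords at or above the volume floor."""
    with open(path, newline="") as f:
        rows = csv.DictReader(f)
        return [r["keyword"] for r in rows if int(r["volume"]) >= floor]
```

Everything that falls below the floor stays in a separate list for the content-depth pass rather than being discarded outright.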

    Step 2: SERP-Based Clustering

The most reliable clustering signal is SERP overlap. If two queries return more than 40% of the same URLs in the top 10 results, they belong to the same cluster. A single piece of content can—and should—target both.

    You can approximate this manually using a keyword tool’s “SERP similarity” feature, or automate it with a Python script that calls a search API, extracts top-10 URLs per query, and computes Jaccard similarity. We have used both approaches; the automated version pays for itself after the first 200-keyword dataset.
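The automated version can be sketched as follows, assuming your search API already returns a top-10 URL list per query (the `serps` dict below stands in for that API output). Note that Jaccard over two 10-URL sets is stricter than a raw shared-URL count—four shared URLs out of ten gives a Jaccard of 0.25, not 0.4—so calibrate the threshold against a labelled sample:

```python
# Sketch of SERP-overlap clustering: greedy single-link union-find over
# pairwise Jaccard similarity of top-10 URL sets.
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two URL sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_serp(serps, threshold=0.4):
    """serps: {query: [top-10 URLs]}. Queries whose URL sets overlap
    above the threshold end up in the same cluster."""
    queries = list(serps)
    parent = {q: q for q in queries}

    def find(q):  # union-find with path compression
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for q1, q2 in combinations(queries, 2):
        if jaccard(set(serps[q1]), set(serps[q2])) >= threshold:
            parent[find(q1)] = find(q2)

    clusters = {}
    for q in queries:
        clusters.setdefault(find(q), []).append(q)
    return list(clusters.values())
```

Single-link clustering is deliberately simple here; it matches the "shared SERP" intuition and avoids tuning anything beyond one threshold.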

    Step 3: Intent Mapping

    Once clusters are defined, label each one by dominant intent: informational, navigational, commercial, or transactional. This mapping determines content format and call-to-action strategy—not just page structure.

    • Informational clusters → long-form educational content, schema-eligible for featured snippets, internal links to commercial clusters
    • Commercial clusters → comparison pages, use-case landing pages, case studies
    • Transactional clusters → product/service pages with conversion-optimised copy and structured data
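The mapping above can live as a small lookup so that briefs are generated consistently across the team. The format and CTA strings below are illustrative defaults, not prescriptions:

```python
# Hedged sketch: default content plan per dominant intent.
# Mirrors the bullet list above; exact format/CTA names are illustrative.
INTENT_PLAYBOOK = {
    "informational": {"format": "long-form guide", "cta": "internal link to commercial cluster"},
    "commercial": {"format": "comparison or use-case page", "cta": "demo request"},
    "transactional": {"format": "product/service page", "cta": "buy / contact sales"},
    "navigational": {"format": "brand or docs page", "cta": "direct navigation"},
}

def plan_cluster(intent):
    """Return the default content plan for a labelled cluster."""
    return INTENT_PLAYBOOK[intent.lower()]
```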

    Step 4: Gap Analysis and Prioritisation

    Map your existing published URLs against the cluster list. Every cluster without a target page is a gap. Every cluster where your ranking URL targets a different cluster’s intent is a cannibalisation risk.

    Prioritise gaps by the product of cluster volume, your current domain authority versus the average ranking URL, and the commercial value of the intent. This gives you a ranked content backlog that connects SEO investment to business outcomes—which is what every strategy engagement should deliver.
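A hedged sketch of that scoring product follows. The intent weights and the authority ratio are assumptions to calibrate against your own win-rate data, not fixed constants:

```python
# Sketch of the gap-scoring product: volume x authority ratio x intent value.
# INTENT_VALUE weights are illustrative assumptions; calibrate them.
INTENT_VALUE = {"informational": 1.0, "commercial": 2.5, "transactional": 4.0}

def gap_score(cluster_volume, own_authority, avg_ranking_authority, intent):
    """Higher score = higher-priority gap."""
    authority_ratio = own_authority / max(avg_ranking_authority, 1)
    return cluster_volume * authority_ratio * INTENT_VALUE[intent]
```

Sorting the gap list by this score descending gives you the ranked content backlog directly.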

    Practical Starting Point

    If this sounds like a significant investment of time, it is—the first time. Once the cluster model exists, maintaining it is a quarterly refresh. And the compounding effect of producing content that deliberately reinforces topical authority means each new post lifts the performance of everything already published.

    That compounding dynamic is the core reason we run this lab: to test which cluster configurations, content formats, and internal-linking patterns deliver the fastest authority accumulation for B2B sites with constrained publishing budgets.

  • The B2B SEO Audit Checklist: What Most Teams Miss

    Most SEO audits are structured around tools, not outcomes. You run a site crawler, export a spreadsheet of technical warnings, and hand it to an engineering team that has twelve higher priorities. Six months later, a third of the issues are closed and organic traffic is flat. Sound familiar?

    The problem is sequencing. Not all SEO issues are equal, and the ones that generate the longest audit reports are rarely the ones that move rankings. Here is the checklist we use at SEO Lab—ordered by the ratio of implementation effort to ranking impact.

    Tier 1: Fix These Before Anything Else

    • Crawl budget leakage. Faceted navigation, infinite scroll, and session-ID parameters waste crawl budget on duplicate content. Block them in robots.txt or via canonical tags before investing in new content.
    • Indexation mismatch. Compare your sitemap count against Google Search Console’s indexed page count. A delta of more than 15% usually indicates canonicalisation or noindex issues worth investigating.
    • Core Web Vitals on top-traffic templates. LCP and CLS regressions on high-traffic page types have measurable CTR impact. Fix templates, not individual pages.
    • Broken internal links. Every 404 on an internal link wastes link equity and degrades user experience. A weekly automated check costs nothing.
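The indexation-mismatch check from the list above reduces to a one-line delta; the 15% threshold is the heuristic stated there, not a universal constant:

```python
# Minimal sketch of the indexation-mismatch check: compare sitemap URL count
# with GSC's indexed count and flag deltas over 15%.
def indexation_delta(sitemap_count, indexed_count):
    """Fractional gap between sitemap and indexed page counts."""
    return abs(sitemap_count - indexed_count) / sitemap_count

def needs_investigation(sitemap_count, indexed_count, threshold=0.15):
    return indexation_delta(sitemap_count, indexed_count) > threshold
```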

    Tier 2: High Leverage, Medium Effort

    • Title-tag and meta-description alignment with SERP intent. Rewrite title tags on pages ranking in positions 5–15 before creating new content; the lift is faster and cheaper than publishing something new.
    • Structured data coverage. Article, FAQ, and HowTo schema on relevant templates improves rich-result eligibility without requiring content changes.
    • Internal-link architecture audit. Pages with high authority but few internal links pointing to them are the easiest ranking wins most teams overlook. We covered the mechanics in our post on AI content strategy.
    • Log-file analysis. Googlebot’s crawl patterns reveal which pages it considers important. Mismatches between crawl frequency and your content priorities are actionable signals.
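The log-file tally can be sketched in a few lines, assuming combined-format access logs. Production use should also verify Googlebot via reverse DNS rather than trusting the user-agent string alone:

```python
# Hedged sketch of a crawl-frequency tally over access logs.
# Assumes common/combined log format; verify Googlebot by reverse DNS in production.
import re
from collections import Counter

LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')

def googlebot_hits(lines):
    """Count requests per path for lines that identify as Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" in line:
            m = LOG_LINE.search(line)
            if m:
                hits[m.group("path")] += 1
    return hits
```

Comparing this counter against your priority-page list surfaces the mismatches the bullet describes: high-priority pages Googlebot rarely visits, and low-value pages it hits constantly.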

    Tier 3: Important But Not Urgent

    • Image optimisation (next-gen formats, lazy loading)
    • Hreflang implementation for international sites
    • Breadcrumb schema on deep content hierarchies
    • Page-speed improvements beyond Core Web Vitals thresholds

    The One Thing Most Teams Miss

    Content decay. Posts that ranked well 18–24 months ago and have since slipped from page one are your lowest-cost ranking opportunity. A targeted refresh—updated statistics, expanded sections, improved internal linking—typically outperforms a new post targeting the same term. We run quarterly content-decay audits as part of our SEO strategy service; the ROI is consistently the highest in any engagement.
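One way to operationalise the decay audit, assuming a Search Console export joined across two periods. The field names (`position_then`, `position_now`) are hypothetical labels for that join:

```python
# Sketch of a content-decay filter: pages that held page one 18-24 months
# ago and have since slipped off it. Field names are assumptions about
# a two-period GSC export joined by URL.
def decayed_pages(history, then_max=10, now_min=11):
    """history: {url: {"position_then": float, "position_now": float}}.
    Returns decayed URLs, worst slippage last."""
    return sorted(
        (url for url, p in history.items()
         if p["position_then"] <= then_max and p["position_now"] >= now_min),
        key=lambda u: history[u]["position_now"],
    )
```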

    Bookmark this checklist, but treat it as a starting point. Every site has a unique technical debt profile. The goal is to build a prioritisation model that reflects your team’s capacity and your site’s specific bottlenecks—not to work through a generic list.

  • How AI Workflow Automation Is Changing B2B Content Strategy

    For most of the past decade, B2B content strategy meant keyword research, a brief, a writer, and a six-week production cycle. AI-assisted workflows are collapsing that timeline—but the teams winning in organic search right now are not simply moving faster. They are changing what they produce.

    The Shift From Volume to Architecture

    The first wave of AI content adoption chased volume: more posts, more pages, more coverage. The results were predictable. Search engines rewarded depth and authority, not density. The teams that outperformed were the ones that used AI to build topic clusters—interlinked content architectures where every asset reinforces every other.

    A well-designed cluster answers the full range of questions a buyer asks across their research journey. The pillar page targets a broad, high-intent term. Supporting posts target longer-tail variants. Internal links signal topical authority to crawlers and guide readers toward conversion. None of that changes because you used an LLM to write the first draft.

    Where AI Actually Helps

    • Semantic gap analysis. LLMs can compare your existing content against a target keyword and surface missing subtopics faster than any manual audit.
    • Outline generation. A structured H2/H3 skeleton based on SERP analysis takes minutes rather than hours.
    • First-draft acceleration. Writers edit at two to three times the speed at which they draft from scratch. Quality control stays human; drudgery does not.
    • Metadata and internal-link suggestions. Consistent, on-brand meta descriptions and contextually appropriate anchor text at scale.

    The Non-Negotiable Human Layer

    AI-generated content without editorial review is detectable. Search quality raters, increasingly trained on AI-output patterns, flag it. More importantly, B2B buyers are sophisticated. A case study that sounds plausible but lacks specificity destroys trust faster than a blank page.

    The winning workflow pairs LLM speed with subject-matter expertise: a domain expert reviews the outline, adds proprietary data or opinion, and the editor cuts anything that reads as generic. That loop takes roughly half the time of a traditional production cycle and produces content that earns links.

    We document every variation of this workflow in our services practice and run ongoing comparisons between fully human, AI-assisted, and AI-first approaches. The results, so far, are more nuanced than either side of the AI-content debate would have you believe.

    What to Measure

    If you adopt an AI-assisted workflow, track these metrics at 60, 90, and 180 days:

    1. Indexed page count vs. prior period
    2. Impressions per published post (normalised by word count)
    3. Time-to-first-ranking for new posts targeting sub-1,000 monthly search volume terms
    4. Internal-link click-through rate on supporting cluster posts
    5. Assisted conversion rate from organic entry points
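Metric 2 is the fiddliest to compute consistently. A minimal sketch, assuming per-1,000-word normalisation (the normalisation unit is an assumption, not a standard):

```python
# Sketch of metric 2 above: impressions per published post, normalised by
# word count so long pillar pages don't skew the comparison.
def impressions_per_kword(posts):
    """posts: list of {"impressions": int, "word_count": int}.
    Returns average impressions per 1,000 words across the batch."""
    rates = [p["impressions"] / p["word_count"] * 1000 for p in posts]
    return sum(rates) / len(rates)
```

Tracking the same batch at 60, 90, and 180 days turns this into a trend line rather than a snapshot.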

    The last metric is the one most teams ignore—and the one that justifies the investment to a CFO. Organic is not a traffic channel; it is a pipeline channel. Measure it that way.