From Keywords to Clusters: A Practical Guide to Topic Modelling

Keyword research has not fundamentally changed in 15 years: find terms, estimate volume, check difficulty, produce content. The problem is that this approach treats every keyword as an independent unit of work. It ignores the way search engines—and buyers—actually process information.

Topic modelling shifts the unit of analysis from the keyword to the cluster: a set of semantically related queries that share an underlying intent. When you map your content against clusters rather than individual terms, you start to see your content estate closer to the way a search engine's language models do, and your optimisation decisions get considerably sharper.

Step 1: Seed Keyword Extraction

Start with 10–20 seed terms that describe your product or service category. Run them through a keyword research tool and export the full “related keywords” dataset—not just the top 50 by volume. You want breadth at this stage, not precision.

Filter to terms with meaningful search volume (we use a floor of 50 monthly searches in the target market). Terms below that floor are covered through depth within existing content, not with dedicated pages.
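A minimal sketch of this filter in Python, assuming your tool exports a CSV with Keyword and Volume columns (column names vary by tool, and the sample rows are illustrative, not real data). It normalises case and whitespace so duplicate rows from different seed exports collapse into one, keeps the highest reported volume, and splits terms at the 50-search floor.

```python
import csv
import io

# Illustrative export; real tools name their columns differently.
SAMPLE_EXPORT = """Keyword,Volume
topic modelling,880
Topic Modelling ,880
topic clusters seo,320
keyword clustering tool,210
cluster keywords by serp,40
semantic keyword grouping,0
"""

VOLUME_FLOOR = 50  # monthly searches in the target market


def split_by_volume(csv_text, floor=VOLUME_FLOOR):
    """Deduplicate keywords, then split them at the volume floor.

    Returns (page_candidates, depth_terms): terms at or above the floor
    go forward to clustering; terms below it are folded into existing
    content as supporting depth.
    """
    seen = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        keyword = " ".join(row["Keyword"].lower().split())  # normalise case and spacing
        volume = int(row["Volume"] or 0)
        # Keep the highest volume reported for a duplicated keyword
        seen[keyword] = max(volume, seen.get(keyword, 0))

    page_candidates = {k: v for k, v in seen.items() if v >= floor}
    depth_terms = {k: v for k, v in seen.items() if v < floor}
    return page_candidates, depth_terms


candidates, depth = split_by_volume(SAMPLE_EXPORT)
print(sorted(candidates))  # ['keyword clustering tool', 'topic clusters seo', 'topic modelling']
print(sorted(depth))       # ['cluster keywords by serp', 'semantic keyword grouping']
```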

Step 2: SERP-Based Clustering

The most reliable clustering signal is SERP overlap. If two queries share more than 40% of their top-10 URLs (five or more of the ten), they belong to the same cluster, and a single piece of content can, and should, target both.

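You can approximate this manually using a keyword tool’s “SERP similarity” feature, or automate it with a Python script that calls a search API, extracts the top-10 URLs per query, and computes pairwise similarity. If you use Jaccard similarity (shared URLs divided by the union of both result sets), recalibrate the threshold: five shared URLs out of ten give a Jaccard score of 5/15, about 0.33, not 0.5. We have used both approaches; the automated version pays for itself after the first 200-keyword dataset.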
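The automated comparison could look something like the Python sketch below. It assumes you have already fetched the top-10 organic URLs for each query from a search API (that step is not shown, and the placeholder URLs are hypothetical). It applies the 40% top-10 overlap rule, computes Jaccard similarity alongside it for comparison, and groups matching queries with a simple union-find.

```python
from itertools import combinations


def urls(prefix, start, stop):
    """Build placeholder URLs for the illustrative SERPs below."""
    return [f"https://{prefix}{i}.example/" for i in range(start, stop)]


# Hypothetical top-10 organic URLs per query, already fetched from a search API.
SERPS = {
    "topic clusters seo": urls("a", 0, 10),
    "seo topic clusters": urls("a", 0, 6) + urls("b", 0, 4),         # 6 of 10 shared
    "keyword clustering tool": urls("c", 0, 10),
    "keyword grouping software": urls("c", 0, 5) + urls("d", 0, 5),  # 5 of 10 shared
}


def overlap_share(a, b, depth=10):
    """Shared URLs as a fraction of the top `depth` results (the 40% rule)."""
    return len(set(a[:depth]) & set(b[:depth])) / depth


def jaccard(a, b):
    """Shared URLs divided by all distinct URLs across both result sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def cluster_queries(serps, threshold=0.4):
    """Group queries whose top-10 overlap share exceeds threshold (union-find)."""
    parent = {q: q for q in serps}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving keeps trees shallow
            q = parent[q]
        return q

    for a, b in combinations(serps, 2):
        if overlap_share(serps[a], serps[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for q in serps:
        groups.setdefault(find(q), []).append(q)
    return sorted(sorted(g) for g in groups.values())


a, b = SERPS["keyword clustering tool"], SERPS["keyword grouping software"]
print(overlap_share(a, b), round(jaccard(a, b), 2))  # 0.5 0.33
print(cluster_queries(SERPS))
```

One design caveat: union-find chains matches, so if query A overlaps B and B overlaps C, all three land in one cluster even when A and C share few URLs. It is worth reviewing unusually large clusters by hand.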

Step 3: Intent Mapping

Once clusters are defined, label each one by dominant intent: informational, navigational, commercial, or transactional. This mapping determines content format and call-to-action strategy—not just page structure.

  • Informational clusters → long-form educational content, formatted to compete for featured snippets, with internal links to commercial clusters
  • Commercial clusters → comparison pages, use-case landing pages, case studies
  • Transactional clusters → product/service pages with conversion-optimised copy and structured data
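A rough first pass at intent labelling can be automated with query modifiers and then checked against the live SERPs. In this Python sketch the modifier lists, the brand-term check for navigational intent, and the default of informational for unmodified queries are all assumptions, not a standard taxonomy. A cluster's dominant intent is taken as its most common per-query label, then mapped to the formats in the list above.

```python
from collections import Counter

# Hypothetical modifier lists, checked in this order; tune them for your market.
INTENT_MODIFIERS = {
    "transactional": ("buy", "pricing", "price", "demo", "trial", "quote"),
    "commercial": ("best", "vs", "compare", "comparison", "alternatives", "review"),
    "informational": ("what", "how", "why", "guide", "examples", "meaning"),
}

# Formats from the list above; navigational clusters are left for manual review.
CONTENT_FORMAT = {
    "informational": "long-form educational content",
    "commercial": "comparison / use-case page",
    "transactional": "product or service page",
}


def query_intent(query, brand_terms=()):
    """Label one query; brand_terms are assumed to be lowercase single words."""
    words = query.lower().split()
    if any(term in words for term in brand_terms):
        return "navigational"
    for intent, modifiers in INTENT_MODIFIERS.items():
        if any(m in words for m in modifiers):
            return intent
    return "informational"  # assumed default for unmodified queries


def cluster_intent(queries, brand_terms=()):
    """Dominant intent = the most common per-query label in the cluster."""
    counts = Counter(query_intent(q, brand_terms) for q in queries)
    return counts.most_common(1)[0][0]


cluster = [
    "best keyword clustering tools",
    "keyword clustering tool comparison",
    "what is keyword clustering",
]
intent = cluster_intent(cluster)
print(intent, "->", CONTENT_FORMAT.get(intent, "review manually"))
# commercial -> comparison / use-case page
```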

Step 4: Gap Analysis and Prioritisation

Map your existing published URLs against the cluster list. Every cluster without a target page is a gap. Every cluster where your best-ranking URL was written for a different cluster is a cannibalisation risk: the page you eventually build for that cluster will compete with it.
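Both checks reduce to simple lookups. This Python sketch assumes three hypothetical inputs: the cluster IDs from Step 2, a mapping from each of your URLs to the cluster it was written for, and your best-ranking URL per cluster (for example, from rank-tracking data, with None where nothing ranks). Pages that rank but were never assigned a target cluster are skipped here; you may want to flag those as well.

```python
def audit_clusters(cluster_ids, page_targets, ranking_url):
    """Return (gaps, risks) for a list of cluster IDs.

    gaps:  clusters that no page was written for.
    risks: (cluster, ranking URL, cluster that URL was written for)
           wherever the URL ranking for a cluster targets a different one.
    """
    targeted = set(page_targets.values())
    gaps = [c for c in cluster_ids if c not in targeted]
    risks = []
    for c in cluster_ids:
        url = ranking_url.get(c)
        written_for = page_targets.get(url)  # None if nothing ranks or the page has no target
        if written_for is not None and written_for != c:
            risks.append((c, url, written_for))
    return gaps, risks


# Hypothetical site data
cluster_ids = ["topic-clusters", "keyword-clustering-tools", "serp-overlap"]
page_targets = {
    "/blog/topic-clusters-guide": "topic-clusters",
    "/blog/clustering-tools": "keyword-clustering-tools",
}
ranking_url = {
    "topic-clusters": "/blog/topic-clusters-guide",
    "keyword-clustering-tools": None,                # targeted, but not ranking yet
    "serp-overlap": "/blog/topic-clusters-guide",    # ranking with the wrong page
}

gaps, risks = audit_clusters(cluster_ids, page_targets, ranking_url)
print(gaps)   # ['serp-overlap']
print(risks)  # [('serp-overlap', '/blog/topic-clusters-guide', 'topic-clusters')]
```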

Prioritise gaps by the product of three factors: cluster search volume, your domain authority relative to the average authority of the currently ranking URLs, and the commercial value of the cluster's intent. This gives you a ranked content backlog that connects SEO investment to business outcomes, which is what every strategy engagement should deliver.
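The prioritisation can be sketched in Python as score = volume × (your authority ÷ average ranking-URL authority) × intent value. The intent weights below are hypothetical placeholders (set them from your own conversion data), and authority can be any 0 to 100 metric, provided your domain and the competing URLs are scored on the same one. The sample clusters and numbers are illustrative.

```python
# Hypothetical commercial-value weights per intent
INTENT_VALUE = {"informational": 1.0, "commercial": 2.0, "transactional": 3.0}

YOUR_AUTHORITY = 30  # your domain, on the same scale as the competitors


def priority(volume, your_authority, avg_ranking_authority, intent):
    """Volume x relative authority x intent value."""
    return volume * your_authority / avg_ranking_authority * INTENT_VALUE[intent]


gap_clusters = [
    # (cluster, monthly volume, avg authority of ranking URLs, dominant intent)
    ("serp-overlap", 900, 45, "informational"),
    ("keyword-clustering-tools", 400, 60, "commercial"),
    ("clustering-software-pricing", 150, 30, "transactional"),
]

backlog = sorted(
    ((name, priority(vol, YOUR_AUTHORITY, avg, intent))
     for name, vol, avg, intent in gap_clusters),
    key=lambda item: item[1],
    reverse=True,  # highest score first
)
for name, score in backlog:
    print(f"{score:7.1f}  {name}")
```

With these inputs the scores are 600, 450 and 400: the small transactional cluster outranks the larger commercial one because its competition is weaker and its intent weight is higher.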

Practical Starting Point

If this sounds like a significant investment of time, it is, at least the first time. Once the cluster model exists, maintaining it is a quarterly refresh. And the compounding effect of producing content that deliberately reinforces topical authority means each new post can lift the performance of related content already published.

That compounding dynamic is the core reason we run this lab: to test which cluster configurations, content formats, and internal-linking patterns deliver the fastest authority accumulation for B2B sites with constrained publishing budgets.
