Most AI visibility advice starts at the wrong end. It tells you to write better content, structure your answers more clearly, use natural language — as if the problem is how you communicate. It isn't. The problem is whether AI can classify your organisation at all. Most complex B2B sites fail that test before a single word of output is generated. This article explains the five-stage mechanism behind that failure.
Indexing and interpretation both happen. The same crawl that feeds traditional search indexing also feeds AI interpretation — and understanding the distinction between those two operations is what makes the five stages below useful.
Indexing is the first operation: the crawler fetches your pages, extracts their content, and stores a retrievable record. This is well understood, and most B2B sites pass it without issue. Interpretation is the second operation, running on the same indexed content (see the AI Search Architecture Deep Dive): AI systems classify your entities, score confidence in those classifications, and build the probability model that determines whether you get cited. Passing indexing but failing interpretation is the most common B2B pattern — and it's why generation-stage tests ("does AI describe us correctly?") are unreliable diagnostics. They reveal the symptom without locating the stage that caused it.
The mechanism has five stages. Each stage either builds or breaks the model that AI uses to answer questions about your organisation.
| Stage | What it determines |
|---|---|
| 1 — Extraction | Whether the page produces usable plain text after HTML is stripped; if not, nothing downstream has content to work with |
| 2 — Recognition | Whether AI can identify specific named entities on your site; generic positioning language produces no classifiable entity |
| 3 — Mapping | Whether entity signals are consistent across your top pages; contradictions between pages collapse the model |
| 4 — Weighting | Whether reinforcing signals are dense enough to build citation confidence; one page is noise |
| 5 — Generation | Whether the final output reflects an accurate entity model; symptoms visible here, causes upstream |
Most B2B sites fail at Stages 2 and 3. The sections below walk through each stage in turn.
Stage 1 — Extraction: stripping HTML to plain text
Extraction is the first stage of AI interpretation — the crawler strips all HTML from your page to produce plain text. What survives the strip is all any downstream stage has to work with.
The failure pattern: The page's meaning lives in its design.
Logo alt text says "logo". Navigation labels say "Solutions" and "Services". The hero section carries the company's core positioning claim as text embedded in an image — no HTML text equivalent, no alt attribute that describes it. After the crawler strips HTML to plain text, what remains is generic and uninformative. The design communicates clearly to a human eye. The extracted residue communicates almost nothing.
Why it breaks: AI interpretation runs on plain text, not rendered layout. CSS-hidden text doesn't survive the strip. Text embedded in images doesn't survive unless the alt attribute carries the same meaning. JavaScript-rendered content that isn't in the DOM when the crawler visits doesn't survive either. What a human reads on a rendered page and what the extraction stage produces can be radically different documents.
What good looks like: The plain text of the page — read as a stranger, stripped of every visual element — clearly states what the company does, for whom, and in what sector. Hero text is real HTML text. Navigation labels name capabilities, not categories. The extracted residue is a usable document.
Checkpoint: Strip every visual element from your homepage and read only the raw text. No images, no layout, no CSS. Can a stranger name your company's specific capability and sector within ten seconds?
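If you want to run that checkpoint mechanically rather than by eye, a few lines of standard-library Python approximate it. This is a minimal sketch, not any crawler's actual pipeline, and the sample HTML is invented to mirror the failure pattern above; the principle it demonstrates holds regardless: only text nodes and alt attributes survive the strip.

```python
# Minimal HTML-to-text strip using only the standard library. Real
# crawler pipelines differ; the principle holds: only text nodes and
# alt attributes survive.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style", "noscript"}  # non-content elements

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside skipped elements
        self.chunks = []  # the surviving plain-text residue

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1
        elif tag == "img":
            # Alt text is all an image contributes to the residue.
            alt = (dict(attrs).get("alt") or "").strip()
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())

# An invented homepage matching the failure pattern above: the hero
# claim lives in an image, the nav labels are generic categories.
sample = """
<nav><a>Solutions</a><a>Services</a></nav>
<img src="hero.png" alt="logo">
<script>var trackingOnly = true;</script>
<footer>Contact us</footer>
"""
extractor = TextExtractor()
extractor.feed(sample)
print(extractor.chunks)  # ['Solutions', 'Services', 'logo', 'Contact us']
```

Feed it your own homepage's HTML instead of the sample and read what comes out as a stranger would: if the residue cannot name your capability and sector, Stage 1 is where the problem starts.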
Most B2B sites pass Stage 1. The serious failure rates are at the next two stages.
Stage 2 — Recognition: identifying entities
Entity recognition is the stage where AI pattern-matches your plain text against known named entities — sector terms, capability terms, specific product names. This is the stage where most complex B2B sites fail.
The failure pattern: The homepage passes extraction with a reasonable block of text — but that text is positioning language. "We help businesses grow." "Innovative solutions for complex challenges." "Your partner in transformation." No named entities. No specific sector. No named capabilities. The AI classifies the company as "general digital agency" or "general manufacturer" because that is all the evidence supports.
Why it breaks: Entity recognition at this stage is not intelligent inference — it is pattern-matching against what is named and specific. If your homepage doesn't name specific capabilities, specific sectors, or specific outcomes, there is no entity anchor to classify against. "Complex B2B challenges" is not a named entity. "PEEK polymer components for aerospace applications" is.
The contrast is operational, not stylistic. These are two descriptions of the same company:
Generic: "We deliver transformation programmes that help industrial businesses improve operational efficiency and drive growth."
Entity-rich: "We design and implement operational improvement programmes for polymer and composite component manufacturers — reducing cycle time, reducing rework rates, and enabling capacity scaling."
The first produces no classifiable entity at Stage 2. The second produces four: the sector, the transformation type, the capability, and the outcome terms.
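To see why, here is a toy illustration of that contrast. The naive term-matching below is an assumption standing in for real entity recognition, which matches against large learned vocabularies rather than a hand-written list; the point is only that the generic sentence gives a matcher nothing to anchor on.

```python
# Toy entity-anchor scan. The term lists are hypothetical stand-ins
# for a real entity vocabulary.
SECTOR_TERMS = ["aerospace", "polymer", "composite", "manufacturer"]
CAPABILITY_TERMS = ["operational improvement", "cycle time", "rework rate",
                    "capacity scaling", "PEEK"]

def entity_anchors(text: str) -> list[str]:
    lowered = text.lower()
    return [t for t in SECTOR_TERMS + CAPABILITY_TERMS
            if t.lower() in lowered]

generic = ("We deliver transformation programmes that help industrial "
           "businesses improve operational efficiency and drive growth.")
rich = ("We design and implement operational improvement programmes for "
        "polymer and composite component manufacturers - reducing cycle "
        "time, reducing rework rates, and enabling capacity scaling.")

print(entity_anchors(generic))  # [] -- nothing to classify against
print(entity_anchors(rich))     # sector, capability and outcome anchors
```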
What good looks like: The plain text of the homepage and top-level landing pages contains specific named entities — sector terms, capability terms, named products or methodologies — that unambiguously place the company in a recognisable classification.
Checkpoint: Read your homepage as a complete stranger. Can you name three specific things this company makes or does — not categories, not promises, but named capabilities or sector terms? If not, Stage 2 is failing.
Schema.org markup: the explicit disambiguation channel
Schema.org markup is the deterministic shortcut for Stage 2. Plain-text entity recognition requires AI to infer your classification from prose — a probabilistic process. Schema.org markup makes it explicit: @type: Organization, hasOfferCatalog, and service schema declarations let AI identify your entities directly without inferring them from context. A homepage that uses generic positioning language in visible text but declares its specific entities in structured markup closes most of the Stage 2 recognition gap without rewriting a word of copy.
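As a sketch of what that declaration can look like, the snippet below assembles a plausible Organization block with Python's json module and prints it ready to embed in the page head. The company name, description, and services are invented placeholders; treat the shape, not the values, as the point, and validate your real markup with the Rich Results Test in the checkpoint below.

```python
# A sketch of an Organization declaration in JSON-LD. All names and
# services below are invented placeholders.
import json

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Components Ltd",  # placeholder
    "description": ("Designer and manufacturer of PEEK polymer components "
                    "for aerospace applications."),
    "hasOfferCatalog": {
        "@type": "OfferCatalog",
        "name": "Capabilities",
        "itemListElement": [
            {"@type": "Offer",
             "itemOffered": {"@type": "Service",
                             "name": "Precision polymer machining"}},
            {"@type": "Offer",
             "itemOffered": {"@type": "Service",
                             "name": "Composite component design"}},
        ],
    },
}

# Embed the output inside <script type="application/ld+json"> in <head>.
print(json.dumps(org, indent=2))
```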
Checkpoint: Does your homepage HTML include structured data that explicitly names your primary capability and sector? Paste your source into Google's Rich Results Test and check for an @type: Organization declaration with a specific description.
Graph's AI Visibility Report 2026 found that 60%+ of B2B sites lack correct entity associations at this stage — not at Generation (Stage 5), but here.
Passing Stage 2 is not enough. Even a homepage rich in named entities can fail at the next stage — if the rest of the site tells a different story.
Stage 3 — Mapping: connecting entities across pages
Entity mapping is the stage where AI builds a consistent model by aggregating entity signals across your top pages. This is where accumulated history becomes a structural liability.
The failure pattern: The homepage correctly names the company as a specialist polymer manufacturer. The services page, written three years ago, leads with "digital transformation consulting for manufacturers". A case study from 2021 describes the company as a "process optimisation firm". The AI sees three competing classifications and has no grounds to prefer one — it hedges, omits, or produces a blended description that satisfies nobody.
Why it breaks: Mapping doesn't read one page — it aggregates entity signals across the top pages it has crawled and tries to build a consistent model. When those pages carry contradictory positioning — product-era language alongside brand-refresh language alongside a service-era case study — the model collapses to the broadest defensible generalisation. The company changed. The content archive didn't.
What good looks like: The top ten pages of the site describe the same primary business — the same sector, the same capability framing, the same named entities. Older pages that contradict the current positioning have been updated or removed. An AI reading across those pages builds a single, consistent, high-confidence entity model.
Checkpoint: Pull your top ten most-crawled pages. Do they describe the same primary business? If a legacy services page or an old case study describes a different company than your current homepage, Stage 3 is failing.
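A crude way to run that checkpoint in code: take the entity anchors found per page at Stage 2 and flag any page that shares none with the homepage. This is an assumed, simplified model of cross-page consistency, not how any production system aggregates signals, and the page list and anchors are invented.

```python
# Hypothetical Stage 2 output per page: URL -> entity anchors found there.
page_entities = {
    "/":                ["polymer", "composite", "manufacturer"],
    "/services":        ["digital transformation", "consulting"],
    "/case-study-2021": ["process optimisation"],
    "/capabilities":    ["polymer", "composite", "cycle time"],
}

# Flag pages whose anchors share nothing with the homepage's.
home = set(page_entities["/"])
for url, anchors in page_entities.items():
    shared = home & set(anchors)
    status = "consistent" if shared else "CONTRADICTS homepage"
    print(f"{url:18} {status}  shared: {sorted(shared)}")
```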
Graph's AI Visibility Report 2026 found that ~30% of complex industrial B2B sites fail at Mapping. Combined with the 60%+ that fail at Recognition, these two stages account for most of the AI visibility failures we see.
Synthesising question: if you described your company using only the plain text visible across your top ten pages — would every page produce the same description?
Stage 4 — Weighting: scoring confidence
Entity weighting is a signal-density calculation — AI scores how confidently it can cite a capability based on how many pages consistently reinforce the same entity claim. Correct classification is necessary but insufficient; this stage determines whether the AI is confident enough to act on it.
The failure pattern: The company is a specialist in advanced composite materials. The homepage names it. One services page confirms it. There are no supporting pages — no depth content on composites, no case studies referencing composite applications, no sector-specific guides. One page saying something is weak signal. The AI scores confidence low and hedges ("a company that may offer composite materials solutions") or omits entirely when a more signal-dense competitor appears.
Why it breaks: Weighting is a signal-density calculation — it counts the pages that reinforce a specific entity claim with consistent, specific language. A single page is noise. Three pages with consistent entity language and specific proof starts to register. Five or more pages with consistent reinforcement builds the confidence score that makes a company citable without qualification.
What good looks like: Pick the company's strongest capability. There are at least three distinct pages — homepage, service page, at least one depth page — that reinforce that capability with consistent, specific named language. A reader landing on any of those pages can identify the same capability without reading the others.
Checkpoint: Pick your single strongest capability. Count the distinct pages on your site that explicitly reinforce it with specific, named language — not implied or adjacent. If fewer than three, Weighting is failing for that capability.
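The same checkpoint as a tally. The thresholds below are taken from the prose above (one page is noise, three starts to register, five or more builds confidence); they are this article's heuristic, not a published scoring formula, and the capability-to-page map is invented.

```python
# Bucket a capability's reinforcing-page count into the article's tiers.
def weighting_confidence(pages: int) -> str:
    if pages <= 1:
        return "noise: expect hedged or missing citations"
    if pages < 3:
        return "weak: below the point where the claim registers"
    if pages < 5:
        return "registering: citable, often with qualification"
    return "high confidence: citable without qualification"

# Hypothetical audit result: capability -> pages that explicitly name it.
capability_pages = {
    "advanced composite materials": ["/", "/services/composites"],
    "polymer machining": ["/", "/services/machining", "/capabilities",
                          "/guides/peek-aerospace",
                          "/case-studies/cycle-time"],
}

for capability, pages in capability_pages.items():
    print(f"{capability}: {len(pages)} pages -> "
          f"{weighting_confidence(len(pages))}")
```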
Treating "AI doesn't describe us well" as a Generation problem is the diagnostic error that keeps most B2B sites stuck. The symptom is visible at Stage 5; the cause is almost always in the three stages above. The fix is structural: editorial polish at Generation never closes a Recognition gap.
Stage 5 — Generation: composing the answer
Generation is where AI writes the output the user sees — the citation, the summary, the direct answer. It draws on the entity model built across Stages 1 to 4. If that model is weak, vague, or absent, Generation produces weak, vague, or absent output.
"AI doesn't describe us well." "We get a vague answer when someone asks what we do." "We get cited less than our competitors, even though our site is better." These are Generation-stage observations. They are not Generation-stage problems.
The cause is upstream: a Recognition failure at Stage 2, a Mapping contradiction at Stage 3, a Weighting gap at Stage 4. Generation-stage fixes — writing more AI-friendly content, answering questions clearly, using natural language in your copy — address the symptom. They cannot close a Recognition gap. They do not increase Weighting signal density. They cannot resolve contradictions across your top pages. The mechanism doesn't run backwards.
Generation is downstream. Most B2B sites lose before they get there.
The implication: if your AI visibility diagnosis lands at Generation, the diagnosis is incomplete. The cause is earlier in the chain. The next section maps three specific tests — one for each of the stages where B2B sites actually fail.
What this means for your site
You now have the mechanism. The question is where your site stands against it.
The three stages where B2B sites lose — Recognition (Stage 2), Mapping (Stage 3), Weighting (Stage 4) — each have a concrete diagnostic question. None requires specialist tooling. Each produces a clear pass-or-fail answer.
Stage 2 — Recognition: Read your homepage as a complete stranger. Can you name three specific things this company makes or does — not categories, not promises, but named capabilities or sector terms? If not, Stage 2 is failing.
Stage 3 — Mapping: Pull your top ten most-crawled pages. Do they describe the same primary business? If a legacy services page or an old case study describes a different company than your current homepage, Stage 3 is failing.
Stage 4 — Weighting: Pick your single strongest capability. Count the distinct pages on your site that explicitly reinforce it with specific, named language — not implied or adjacent. If fewer than three, Weighting is failing for that capability.
Graph's AI Visibility Report 2026 found that the entity-recognition failure pattern (Stage 2) accounts for 60%+ of AI visibility failures — and cross-page contradiction (Stage 3) accounts for a further ~30%. Both patterns are detected at scale by Katelyn, Graph's AI agentic marketing platform.
If you'd like to know which stages are failing on your site — and what to fix first — our free AI Visibility Audit maps exactly this.
Key takeaways
- AI doesn't read your website as a human does — it runs a five-stage mechanism that converts pages to entity classifications before any answer is generated.
- Most B2B sites fail at Stage 2 (Recognition) or Stage 3 (Mapping), not at Stage 5 (Generation) where the problem first becomes visible.
- Positioning language without named entities produces no classifiable entity at Stage 2; the AI cannot cite what it cannot classify.
- Cross-page contradictions collapse the entity model at Stage 3 — accumulated content history is the structural cause, not poor writing.
- Signal density across multiple pages determines citation confidence at Stage 4; a single page claiming a capability is weak signal.
- Generation-stage optimisation addresses symptoms. The cause is upstream.
Frequently asked questions
Why does AI give vague descriptions of what my company does?
The most common cause is a Stage 2 failure: your homepage passes extraction — the text is readable as plain text — but it contains positioning language rather than named entities. Phrases like "innovative solutions" or "transformation partner" produce no classifiable entity. AI pattern-matches against what is named and specific. Without named sector terms, capability terms, or specific outcome language, it defaults to a broad generalisation. The fix is to make your entity claims explicit, not more eloquent.
How many pages does my website need for AI to cite it confidently?
There is no absolute threshold, but the Stage 4 (Weighting) signal-density pattern is consistent: a single page claiming a capability is weak signal that AI hedges or ignores. Three distinct pages — homepage, a service page, and at least one depth page — that each reinforce the same specific capability with consistent named language starts to register. Five or more pages with coherent reinforcement builds the confidence score that makes a company citable without qualification.
What causes AI to contradict itself when describing the same company?
Cross-page contradictions at Stage 3 (Mapping). AI aggregates entity signals across the top pages it has crawled and attempts to build a consistent model. If your homepage describes a specialist manufacturer, your services page describes a consulting practice, and a three-year-old case study describes a process optimisation firm — AI sees three competing classifications. It hedges, omits, or blends them into a description that matches none of your actual positioning. Content history is the structural cause; the company evolved but the archive did not.
Is AI visibility different from SEO, and do the same pages matter?
The mechanism is different even when it uses the same pages. Traditional SEO indexes pages and ranks them by relevance signals. AI interpretation uses the same indexed pages but builds an entity model — a probability-weighted classification of what your organisation is and what it does. A page that ranks well for a keyword may still fail at entity recognition if it uses positioning language rather than named entity terms. The ranking signal and the citation confidence signal are not the same signal.
Related guides
- AI visibility tools 2026 — first-hand testing, independent review
- Measuring AI visibility success — metrics that reflect AI citation, not just traffic
- AEO vs SEO — understanding the distinction
- PDF invisibility — why technical documents are excluded from AI retrieval
- LLM parsability — how AI interprets content structure
- Semantic density — topical authority for AI systems
Related reading
- How to get cited by Google AI Overviews
- Why isn't our brand showing up in ChatGPT even though we rank on Google?
- Search visibility audit
- AI Visibility Audit
Graph Digital's free AI Visibility Audit maps which of the five stages are failing on your site and tells you what to fix first. The audit runs against your live site — no specialist tool access and no data export required from you. You come away with structured, per-stage findings you can act on.
