
How to measure AI visibility: the metrics that actually matter

Three AI visibility signals — brand citation frequency, AI Overview presence, and share of voice — confirm exclusion but cannot surface its structural cause. That requires diagnostic assessment.

Stefan Finch
Founder, Head of AI
Apr 19, 2026

AI visibility is measured across three signal types: brand citation frequency, AI Overview presence, and share of voice in AI answers versus competitors. These metrics confirm whether you have an AI visibility problem and track whether structural fixes are working. They cannot tell you why you are excluded or what to fix first. That requires structural diagnosis.

Why AI visibility measurement matters - and what most teams get wrong

Forrester's research confirms the structural shift is real. B2B buyers are adopting AI search at roughly three times the rate of consumers, and generative AI has now become the top information source in the buyer research phase - outpacing vendor websites and sales reps (Forrester, State of Business Buying, 2026). AI-mediated shortlisting is not a future risk. It is the current buying pattern in complex B2B.

At Graph Digital, we work with B2B marketing directors who already know something is wrong. Traffic is softer. Lead quality is weaker. Someone at board level has asked why competitors appear in AI-generated answers - and you do not. The instinct to start measuring is right. The error most teams make is expecting measurement to tell them what to fix.

""The error most teams make is expecting measurement to tell them what to fix." - Stefan Finch, Founder, Graph Digital"

The scale of the exclusion is significant. The 2X AI Visibility Index for 2026 found that 96% of B2B brands are invisible in early AI discovery - appearing in fewer than 5% of early-stage buyer prompts. Most organisations are not measuring this gap because they do not have a framework for what to measure.

The measurement gap is not a tool problem. It is a conceptual one. Before you can use AI visibility metrics usefully, you need to understand what they measure, what they can confirm, and - critically - what they cannot tell you at all.

Start with the three AI visibility metrics that, taken together, give you a complete picture.

The three AI visibility metrics that matter

AI visibility measurement is organised around three signal types. Each captures a different dimension of how AI systems represent your organisation. Together they give you a complete picture of whether you have an AI visibility problem and whether structural fixes are producing results. None of them will tell you what to fix.

Metric | What it measures | What it confirms
Brand citation frequency | How often AI systems name your organisation in relevant responses | Whether you have a baseline AI presence
AI Overview presence | Whether you appear in Google's AI-generated summaries for target keywords | Whether your content is recognised as authoritative across the AI stack
Share of voice in AI answers | Your citation rate versus competitors across the same query set | Whether you are gaining or losing ground competitively

Brand citation frequency

Brand citation frequency measures how often AI systems name your organisation when responding to queries relevant to your business. It is the foundational metric - the first number you need before anything else makes sense.

The common failure is tracking raw mention counts without a defined query set. Teams report that "we appeared in ChatGPT three times this week" without knowing the baseline, the query universe, or the competitive rate. That is not measurement. It is anecdote.

What good looks like: a defined set of 20-30 commercial queries run monthly across ChatGPT, Gemini, and Perplexity. Track citation rate per query, not aggregate count. The trend direction over 90 days matters more than the absolute number.

Benchmark reality: research from Digital Bloom's 2025 LLM visibility report found that only 30% of brands maintain their AI visibility from one query run to the next. Consistency of citation - not frequency on any single run - is the reliable signal.
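To make that cadence and consistency concrete, here is a minimal sketch of how a monthly run could be logged and rolled up - per-query citation rate, the monthly trend, and citation consistency between two runs. The record shape, field names, and platform labels are illustrative assumptions, not a prescribed schema or any particular tool's output.

```python
from collections import defaultdict
from dataclasses import dataclass


# One row per (month, platform, query) in the defined 20-30 query set.
@dataclass
class RunResult:
    month: str         # e.g. "2026-01"
    platform: str      # e.g. "chatgpt", "gemini", "perplexity"
    query: str         # one of the defined commercial queries
    brand_cited: bool  # did the response name the brand?


def citation_rate_per_query(results: list[RunResult]) -> dict[str, float]:
    """Share of runs in which each query's response cited the brand."""
    cited, total = defaultdict(int), defaultdict(int)
    for r in results:
        total[r.query] += 1
        cited[r.query] += int(r.brand_cited)
    return {q: cited[q] / total[q] for q in total}


def monthly_trend(results: list[RunResult]) -> dict[str, float]:
    """Overall citation rate per month; the 90-day trend is read off three of these."""
    cited, total = defaultdict(int), defaultdict(int)
    for r in results:
        total[r.month] += 1
        cited[r.month] += int(r.brand_cited)
    return {m: cited[m] / total[m] for m in sorted(total)}


def consistency(results: list[RunResult], month_a: str, month_b: str) -> float:
    """Share of queries cited in month_a that are still cited in month_b."""
    def cited_in(month: str) -> set[str]:
        return {r.query for r in results if r.month == month and r.brand_cited}
    earlier = cited_in(month_a)
    return len(earlier & cited_in(month_b)) / len(earlier) if earlier else 0.0
```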

Checkpoint: do you have a defined query set? If not, you are not yet measuring brand citation frequency. You are sampling.

AI Overview presence

AI Overview presence measures whether your content appears in Google's AI-generated summaries for the keywords that matter commercially to your business. Track it separately from LLM citation rates - the signal it carries is different.

The common failure is treating AI Overview presence as an SEO metric - optimising title tags and schema to appear in traditional results while ignoring whether the AI Overview is drawing from your content at all. These are different systems with different selection criteria.

What good looks like: a monthly audit of your 10-15 highest-priority keywords. For each: does an AI Overview appear? Does it cite your content? Does it cite a competitor? Track the trend. AI Overview presence signals that your content is being recognised as authoritative across the AI stack - not just by one platform.
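A monthly AI Overview audit along those lines can be kept as a simple structured log. The sketch below assumes a hypothetical per-keyword record; the fields are illustrative, not a standard format.

```python
from dataclasses import dataclass
from typing import Optional


# One row per priority keyword per monthly audit.
@dataclass
class OverviewAudit:
    month: str                              # e.g. "2026-01"
    keyword: str                            # one of the 10-15 priority keywords
    overview_shown: bool                    # does an AI Overview appear at all?
    cites_us: bool                          # does it draw on our content?
    cited_competitor: Optional[str] = None  # competitor cited instead, if any


def overview_presence(audits: list[OverviewAudit], month: str) -> float:
    """Share of priority keywords whose AI Overview cites our content that month."""
    rows = [a for a in audits if a.month == month]
    return sum(a.cites_us for a in rows) / len(rows) if rows else 0.0
```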

Checkpoint: do you know which of your target keywords trigger AI Overviews? If not, you are missing one of the three core signals.

Share of voice in AI answers

Share of voice measures your citation rate versus competitors across the same query set. It is the competitive dimension - the only metric that tells you whether you are gaining or losing ground relative to the organisations buyers are comparing you against.

The common failure is measuring share of voice only against your own historical numbers. Without a competitive baseline, an improving citation rate can mask significant relative decline - if competitors are growing faster, you are losing ground even when your own numbers look better.

What good looks like: benchmark 3-5 direct competitors against the same query set. Track all simultaneously. The target is directional improvement in your relative position, not a fixed absolute score. Analysis from Yext (2025) found that top brands in a category typically achieve 15-25% share of voice in AI responses, with enterprise leaders reaching 25-30%.

One reality to account for: only 11% of domains are cited by both ChatGPT and Perplexity, according to Digital Bloom's 2025 LLM visibility report. Platform divergence is real - aggregate share of voice across platforms masks critical differences. Measure each platform separately.
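Here is a minimal sketch of per-platform share of voice, assuming each query run records which tracked brands the response cited. The record shape and brand handling are illustrative assumptions, and platforms are deliberately never aggregated:

```python
from collections import defaultdict
from dataclasses import dataclass, field


# One row per (platform, query) run; brands_cited lists which tracked brands
# (us plus the 3-5 named competitors) the response named.
@dataclass
class QueryRun:
    platform: str                              # measured separately - never aggregated
    query: str                                 # same defined query set for every brand
    brands_cited: set[str] = field(default_factory=set)


def share_of_voice(runs: list[QueryRun], tracked: list[str]) -> dict[str, dict[str, float]]:
    """Per platform: share of responses in which each tracked brand is cited."""
    by_platform = defaultdict(list)
    for run in runs:
        by_platform[run.platform].append(run)
    return {
        platform: {
            brand: sum(brand in run.brands_cited for run in rows) / len(rows)
            for brand in tracked
        }
        for platform, rows in by_platform.items()
    }
```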

Checkpoint: are you tracking share of voice against named competitors, on the same query set, on the same cadence? If not, your share of voice number is directionally meaningless.

For a comparison of which platforms deliver reliable measurement data, see the AI visibility measurement tools guide.

"How often should I run AI visibility measurement queries? Run your defined query set monthly across ChatGPT, Gemini, and Perplexity. Audit your top 10-15 keywords for AI Overview presence monthly. Use the 90-day trend as your primary signal - platform variability and LLM response inconsistency make single-run snapshots unreliable. Consistency of citation across runs matters more than any one result."

What measurement tells you - and what it cannot

Here is the honest limit of AI visibility measurement.

Measurement tools confirm whether you have an AI visibility problem. They track whether structural fixes are working after you have made them. They cannot identify entity conflicts, semantic gaps, or the structural causes of exclusion - because those causes are invisible to tools that measure outputs, not inputs.

This is not a tool failure. It is the nature of measurement.

Why measurement tools cannot see structural causes

Measurement tools track outputs: citation counts, AI Overview appearances, share of voice scores. Structural causes of AI exclusion live at the input level - entity conflicts in how AI systems understand your organisation, content formats that block interpretation, semantic gaps in how your expertise is represented. Output tools cannot read input failures. That gap is what structural diagnostic assessment closes.

The pattern is consistent across B2B marketing teams: those with full measurement stacks - brand mention tracking, AI Overview monitoring, competitive share of voice dashboards - can confirm they are invisible in AI answers but have no path to action. The metrics confirm the problem. They do not illuminate the cause. The two jobs are different.

Consider what happened with one global B2B client. They had all three signals instrumented. They could see their citation rate declining relative to competitors. They knew their AI Overview presence was zero for four of their five core keywords. What they did not know - what measurement could not reveal - was that their product documentation, including the technical specifications buyers rely on when researching the category, was locked in PDF format. AI systems could not read, interpret, or cite that content. No measurement tool surfaces that as a finding. A structural diagnostic does.

After the diagnostic identified the issue, the structural fix took less than four weeks. Within 30 days, the client's AI visibility had increased by 52% (a Graph Digital client engagement).

Measurement confirmed the problem. Measurement tracked the fix. But diagnosis revealed what to do.

""Measurement confirmed the problem. Measurement tracked the fix. But diagnosis revealed what to do." - Stefan Finch, Founder, Graph Digital"

The measurement you need comes before structural work and after it. Between those two moments - the gap where you need to understand why you are excluded and what to prioritise - you need something measurement cannot give you.

Measurement confirms the problem exists. Measurement does not explain the problem.

Quick check - are you measuring or diagnosing?

If you cannot answer these clearly, your measurement framework is confirming a problem you cannot yet act on:

  1. Do you know which structural causes are creating your AI visibility exclusion - not just that exclusion exists?
  2. Do you know which fixes to prioritise by commercial impact - or are you working from a tool's generic recommendations?
  3. Are your measurement scores changing - and if not, do you know why?

If the answer to any of these is no, you have measurement without diagnosis. The two are not the same thing.

How to report AI visibility to the board

The board does not want a visibility score. It wants to know three things: where you stand relative to competitors, whether the trend is improving, and what is being done about the root cause. Board-ready AI visibility reporting translates the three metrics into those three answers.

The three-number protocol:

  1. Current share of voice versus named competitors - not an absolute visibility score, a relative position. "We are currently cited in 12% of tracked queries in our category. Our two primary competitors are at 18% and 22%." This frames the gap as a competitive problem, not a technical one.

  2. Trend direction over 30 and 90 days - is the gap narrowing or widening? A small share of voice improving consistently is a better story than a high share of voice that is flat. Boards respond to trajectory.

  3. One structural root cause with a fix timeline - not a list of optimisation tasks, not a tool recommendation. One specific structural gap with an estimated time to visible metric improvement.
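As a rough illustration of the protocol, the three numbers can be assembled from the same tracking data into a single board-ready summary. The record fields and wording below are hypothetical, not a reporting standard:

```python
from dataclasses import dataclass


@dataclass
class BoardReport:
    our_sov: float                    # e.g. 0.12 = cited in 12% of tracked queries
    competitor_sov: dict[str, float]  # named competitors, e.g. {"Competitor A": 0.18}
    trend_30d: float                  # change in our share of voice over 30 days
    trend_90d: float                  # change over 90 days
    root_cause: str                   # one structural gap, e.g. "spec sheets locked in PDF"
    fix_weeks: int                    # estimated time to visible metric improvement


def board_summary(r: BoardReport) -> str:
    """Three numbers, one sentence each: relative position, trajectory, root cause."""
    leader = max(r.competitor_sov, key=r.competitor_sov.get)
    gap = r.competitor_sov[leader] - r.our_sov
    return (
        f"Cited in {r.our_sov:.0%} of tracked queries, {gap:.0%} behind {leader}. "
        f"Trend: {r.trend_30d:+.0%} over 30 days, {r.trend_90d:+.0%} over 90. "
        f"Root cause: {r.root_cause}; fix estimated at {r.fix_weeks} weeks to visible impact."
    )
```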

What you should not report to the board: raw citation counts, AI Overview appearance rates without context, or tool-generated visibility scores without competitive benchmarking. These are measurement artefacts. They are not commercial evidence.

The vocabulary shift:

The terminology that translates to board level is not the same as measurement-platform language. "Share-of-trust in AI-mediated buyer research" lands better than "brand citation frequency." "Citation velocity trend" is more legible than "monthly LLM mention rate." Board language emphasises competitive position and commercial consequence - not platform mechanics.

Connecting measurement to pipeline:

The bridge from visibility metrics to commercial reporting requires a correlation layer. Research from the AI Implementation Handbook for B2B GTM (Infuse, 2026) found that buyers who have researched a vendor through AI systems show 18-25% shorter sales cycles. If you can track AI-attributed traffic and map it to pipeline entry, you have a commercially legible story.
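One simple form of that correlation layer is tagging sessions whose referrer is an AI platform and joining them to pipeline records. The referrer domains and record fields in this sketch are assumptions to adjust against what your analytics and CRM actually capture:

```python
from dataclasses import dataclass

# Referrer domains commonly associated with AI assistants - treat this list as an
# assumption and align it with what your analytics platform actually records.
AI_REFERRERS = {"chatgpt.com", "perplexity.ai", "gemini.google.com", "copilot.microsoft.com"}


@dataclass
class Session:
    visitor_id: str
    referrer_domain: str


@dataclass
class Opportunity:
    visitor_id: str
    pipeline_value: float


def ai_attributed_pipeline(sessions: list[Session], opps: list[Opportunity]) -> float:
    """Total pipeline value from visitors who arrived via an AI-platform referrer."""
    ai_visitors = {s.visitor_id for s in sessions if s.referrer_domain in AI_REFERRERS}
    return sum(o.pipeline_value for o in opps if o.visitor_id in ai_visitors)
```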

What measurement can and cannot tell the board:

Measurement tells the board whether your AI visibility position is improving. It does not tell the board why you are being excluded or what the structural fix is. That distinction matters for reporting. When boards ask "what are we doing about this?", the honest answer is: measurement has told us we have a problem and whether fixes are working. Diagnosis has told us what to fix.

That two-part answer is more credible than a dashboard alone.

"What should B2B marketing directors report to the board on AI visibility? Three numbers: your current share of voice versus named competitors, trend direction over 30 and 90 days, and one structural root cause with a fix timeline. Avoid raw citation counts or tool-generated visibility scores without competitive context. The board needs competitive position and trajectory - not platform mechanics or abstract visibility percentages."

When measurement becomes actionable

Measurement is most useful at two specific moments.

The first is before structural work begins. Before you fix anything, you need to know where you stand - your baseline citation rate, your AI Overview presence, your share of voice versus the competitors your buyers compare you against. Without that baseline, you cannot demonstrate that the structural fixes you make are actually working. The measurement is the 'before' number.

The second is after structural fixes are in place. Once the diagnostic has identified the causes of exclusion and those causes have been addressed - entity inconsistencies corrected, content structure updated, key signals added to the surfaces AI systems draw from - measurement tracks whether the changes are registering. It is the 'after' number.

The gap between those two moments is where measurement cannot help you. When you know you are excluded from AI answers but do not know why - that is not a measurement problem. No amount of additional tracking will close it. You need a structural diagnostic.

Diagnose first. Measure to validate.

The AI Visibility Snapshot is the structural diagnostic that works alongside this measurement framework. It analyses the full digital surface to identify the specific structural causes of AI exclusion - entity conflicts, semantic gaps, content formats AI systems cannot interpret, missing authority signals in the sources AI draws from. The output is a prioritised action plan, specific to your organisation, ranked by commercial impact.

It does not replace measurement. It makes measurement actionable. Once the diagnostic identifies the structural causes of exclusion, how to improve AI visibility explains the structural fixes that measurement will then track.

Booking a Snapshot requires a URL and 48-72 hours. No preparation is needed on your side. The findings are walked through on a call.

Key takeaways

  • AI visibility measurement uses three signal types: brand citation frequency, AI Overview presence, and share of voice in AI answers - together they confirm whether you have an AI visibility problem and track whether structural fixes are working.
  • None of the three metrics diagnose structural causes of exclusion; they confirm the problem exists and validate that fixes are working once made.
  • Board-ready AI visibility reporting requires three numbers: current share of voice versus named competitors, trend direction over 30 and 90 days, and one structural root cause with a fix timeline - not a raw visibility score.
  • Measurement is most useful before structural work begins (to establish a baseline) and after fixes are in place (to validate they are working); the gap between those two moments requires structural diagnosis, not more measurement.
  • Only 11% of domains are cited by both ChatGPT and Perplexity - platform divergence is real, and aggregate share of voice across tools masks critical differences.