AI Agents

How the LLM orchestrator decides what to do — and why this is not intelligence

LLM orchestrators were designed for token prediction, not decision-making. Because the tool-use loop is deterministic wiring around a probabilistic model, teams routinely over-apply orchestration to workflows that should be deterministic pipelines. The cost: systems fail silently, loop without exit, and cannot recover when tool descriptions degrade.

Stefan Finch
Founder, Head of AI
Apr 18, 2026


What is an LLM orchestrator?

Most organisations evaluating AI agent builds ask the wrong question: which framework, which model, which provider. At Graph Digital, working with complex B2B organisations on AI agent development, the question that determines whether a production system holds together is architectural. LLM orchestration and deterministic pipelines are not competing choices. They are complementary tools for different classes of problem. Choosing correctly determines whether an agent survives production or becomes an expensive demonstration.

An LLM orchestrator is a pattern. Not a framework, not a special piece of code. It is a system in which a language model receives tool definitions and decides, repeatedly, what to call next. The wiring is deterministic. The apparent reasoning is the model's training applied to the text of tool descriptions the developer writes.

In production-grade AI agent systems for complex B2B organisations, the tool-use loop operates identically across all major LLM APIs. The mechanism is a five-step sequence:

  1. The calling application sends messages and tool definitions: a JSON schema per tool
  2. The LLM returns stop_reason: "tool_use" with a named tool and structured input
  3. The calling code executes the tool in the application layer
  4. The calling application sends the tool result back
  5. The LLM returns another tool_use instruction, or stop_reason: "end_turn"

The LLM never calls the codebase directly. It returns a JSON instruction. The calling code executes the tool. The calling code returns the result. The loop continues until end_turn. That is the entire mechanism.
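The loop above can be sketched in a few lines. This is a minimal illustration, not any vendor's SDK: `run_agent`, the scripted stand-in for the model, and the tool names are all hypothetical, and a real implementation would make an HTTP call to an LLM API at step 1.

```python
# Minimal sketch of the tool-use loop. The LLM never executes anything:
# it returns a JSON instruction, and the calling code runs the tool.
def run_agent(llm, tools, messages, max_iterations=10):
    for _ in range(max_iterations):                 # explicit turn limit
        response = llm(messages)                    # steps 1-2: send, receive
        if response["stop_reason"] == "end_turn":
            return response["content"]              # step 5: loop exits
        tool = tools[response["tool"]]              # step 3: application layer
        result = tool(**response["input"])          # executes the named tool
        messages.append({"role": "tool", "content": result})  # step 4
    raise RuntimeError("max_iterations reached without end_turn")

# Scripted stand-in for the model: asks for one tool, then finishes.
script = iter([
    {"stop_reason": "tool_use", "tool": "get_gsc_performance_data",
     "input": {"days": 7}},
    {"stop_reason": "end_turn", "content": "Clicks are up 12% week on week."},
])
llm = lambda messages: next(script)
tools = {"get_gsc_performance_data": lambda days: f"{days}-day report: ..."}

print(run_agent(llm, tools, messages=[]))  # → Clicks are up 12% week on week.
```

Note that even this sketch carries a turn limit: without one, a model that never returns end_turn would loop until the context window fills.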

The routing variable is tool description quality, and only that. get_data routes nowhere. get_gsc_performance_data with a full description of what it returns routes correctly, every time. The adaptability of an LLM-orchestrated agent comes from the model's training applied to description text, not from any independent judgment.
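To make the contrast concrete, here are two tool definitions side by side. The dictionary shape mirrors the common JSON-schema tool-definition format; the field contents are illustrative.

```python
# Two tool definitions with the same schema shape. The description field is
# the only routing signal the model sees.
vague = {
    "name": "get_data",
    "description": "Gets data.",          # routes nowhere
    "input_schema": {"type": "object", "properties": {}},
}

specific = {
    "name": "get_gsc_performance_data",
    "description": (
        "Returns Google Search Console performance metrics (clicks, "
        "impressions, CTR, average position) for the site, aggregated "
        "by day over the requested lookback window."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "days": {"type": "integer",
                     "description": "Lookback window in days"},
        },
        "required": ["days"],
    },
}
```

The schemas are structurally identical; only the description text gives the model anything to route on.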

Gartner forecasts that AI agents will intermediate more than $15 trillion in B2B purchases by 2028. The architectural decisions made in 2026 will determine whether those systems survive production or become expensive demonstrations.

This is The Routing Reality. The LLM picks tools based on what a developer wrote in the description field, and nothing else. Before evaluating any AI agent architecture, it helps to understand what LLM orchestration is being compared against.

"The LLM picks tools based on what a developer wrote in the description field, and nothing else."

Stefan Finch, Founder, Graph Digital

What is a deterministic pipeline?

A deterministic pipeline is a fixed sequence: A to B to C to gate to D. Every step is pre-defined before the pipeline runs. Every exception path is written in advance. The pipeline does not adapt. It executes.

This is the correct architectural choice when the sequence is fixed and fully known, when every exception is enumerable, when the workflow must be auditable for compliance, or when volume makes per-step prompt cost prohibitive. A deterministic pipeline is not an inferior alternative to LLM orchestration. It is the appropriate pattern for a different class of problem.

The constraint is brittleness: when real-world load produces cases the pipeline author did not anticipate, recovery requires bespoke error-handling code written and deployed in advance. No runtime adaptation is possible.

One principle governs both patterns: hard gates and non-negotiable sequences belong in code, enforced by the pipeline controller. An LLM orchestrator reasons within a gate. It does not decide whether to skip one. That boundary, between what the LLM chooses and what the pipeline enforces, is the architectural line that separates production-grade systems from brittle demonstrations.
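That boundary can be shown in a few lines. In this sketch, `run_llm_stage` is a placeholder for an LLM-orchestrated stage, and `compliance_gate` is a hypothetical gate; the point is that the gate check lives in plain code the model cannot skip.

```python
# A hard gate enforced by the pipeline controller, not negotiated by the LLM.
def run_llm_stage(name, payload):
    # Placeholder: in a real build this would run a full tool-use loop.
    return {**payload, "stage": name}

def compliance_gate(payload):
    # Deterministic check, written in advance, enforced in code.
    return payload.get("approved") is True

def pipeline(payload):
    payload = run_llm_stage("draft", payload)   # LLM reasons inside the stage
    if not compliance_gate(payload):            # code decides, not the model
        raise PermissionError("compliance gate failed; escalate to a human")
    return run_llm_stage("publish", payload)

print(pipeline({"approved": True})["stage"])    # → publish
```

The LLM can do whatever it likes inside the draft stage; it has no mechanism for reaching the publish stage except through the gate.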

Katelyn, Graph Digital's production-deployed multi-agent AI platform, uses both patterns: LLM orchestration within defined stages, deterministic sequencing between them. The decision between the two is the first architectural choice any production build must resolve.

LLM orchestration vs deterministic pipelines

The distinction is not sophistication. It is fitness for problem type.

LLM orchestration handles sequences where the correct next step cannot be determined until a previous step returns a result. The model reads the output of one tool, decides what to call next, and routes accordingly. The developer designs the system's capabilities: the tool definitions, the error contracts, the turn limits. Not the execution path. A useful analogy is the pinball machine. The developer built the bumpers. The ball finds its own route through them every time it runs.

A deterministic pipeline is a production line. Every step is defined. Every output is predictable. No runtime decision-making is required.

Most production systems require both. The correct architecture is a fixed outer sequence with LLM orchestration operating within defined stages. Non-negotiable steps are enforced in code. Adaptive sub-tasks are delegated to the LLM orchestrator within those boundaries.

The decision table:

Use LLM orchestration when:

  • The next step depends on a previous result
  • The recovery path is open-ended
  • Multi-step reasoning is required

Use a deterministic pipeline when:

  • The sequence is fixed and fully known
  • Every exception is enumerable
  • Full auditability is required
  • Volume makes per-step prompt cost prohibitive

For a Finance Director evaluating an AI agent build: each iteration of the tool-use loop requires one LLM API call, adding token cost and latency. At production scale that accumulates quickly; ZenML's analysis of 1,200 production deployments found tasks requiring approximately 50 tool calls on average. This calculus shapes the entire architecture. Applying LLM orchestration to workflows that should be deterministic pipelines is the most common cost error in AI agent builds.

What LLM orchestration costs in production

Every iteration of the tool-use loop is one LLM API call. Prompt tokens, completion tokens, and latency per turn accumulate. ZenML's analysis of 1,200 production deployments found that production tasks require approximately 50 tool calls on average. At that scale, token cost and cumulative latency are architecture inputs, not incidentals.

Three additional cost categories apply in production:

Observability infrastructure. Prompts, tool definitions, and policies must be versioned. Execution traces must be stored for replay and debugging. Without this infrastructure, debugging a failed multi-step agent run is impractical.

State management complexity. Session context, volatile and specific to a run, must be kept separate from task state (durable, checkpointed) and system state (policies, limits). Conflating them creates coherence failures.

Fault tolerance. A pipeline executing 30-second tool calls that fails at step 47 of 50 cannot be retried from the start without significant cost. Checkpoint and resume functionality is a production requirement, not an optimisation.
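A minimal checkpoint-and-resume sketch, under simplifying assumptions: a file stands in for a durable store, and `run_with_checkpoints` and the demo steps are illustrative names, not any framework's API.

```python
import json
import pathlib
import tempfile

def run_with_checkpoints(steps, state_path):
    """Run steps in order, recording progress so a failed run resumes
    from the last completed step rather than from step 1."""
    path = pathlib.Path(state_path)
    done = json.loads(path.read_text()) if path.exists() else 0
    for i in range(done, len(steps)):
        steps[i]()                          # may raise; checkpoint survives
        path.write_text(json.dumps(i + 1))  # record completed step
    return len(steps)

# Demo: the middle step fails on its first attempt. The second run resumes
# at the failed step; completed steps do not re-execute.
log = []
attempts = {"n": 0}

def flaky_step():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient failure")
    log.append("flaky")

steps = [lambda: log.append("a"), flaky_step, lambda: log.append("b")]

with tempfile.TemporaryDirectory() as d:
    state = f"{d}/progress.json"
    try:
        run_with_checkpoints(steps, state)  # first run fails at step 2
    except RuntimeError:
        pass
    run_with_checkpoints(steps, state)      # resumes at step 2, not step 1

print(log)  # → ['a', 'flaky', 'b']: the first step ran exactly once
```

The same principle applies at any scale: a failure at step 47 of 50 costs three steps on retry, not fifty.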

LLM orchestration earns its cost only where adaptive sequencing is required: where the correct next step cannot be known until the previous step returns. Where sequences are fixed, deterministic pipelines deliver the same outcome at lower cost and lower complexity. The architectural decision is not a preference. It is a cost calculus.

Why LLM-orchestrated agents fail in production

Most production failures in LLM-orchestrated agents are engineering problems, not model problems. Upgrading to a better model does not fix them. They are structural. Each of the four patterns below recurs across production agent builds.

Vague tool descriptions: the LLM cannot route to a tool it cannot distinguish

get_data tells the model nothing. The LLM selects tools based on the text of their descriptions. That is the entire routing mechanism. A full description of what a tool returns routes correctly, every time. The Routing Reality applies: description quality is the only routing variable. Vague descriptions produce misdirected tool calls that no model upgrade resolves.

Prose error returns: unstructured errors cannot be routed on

A tool returning "something went wrong" leaves the LLM with no actionable information. A typed error code such as { "status": "fail", "code": "RATE_LIMIT", "retry_after": 30 } tells the model exactly what happened and enables automated recovery: re-run, escalation, or alternative routing. Prose errors enable none of these.
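A sketch of what routing on a typed error looks like in the calling layer. The function name, codes, and handlers are illustrative; the point is that a structured code gives the caller something to branch on, where prose gives it nothing.

```python
def handle_tool_result(result, retry, escalate):
    """Route on a typed error code; codes and handlers are illustrative."""
    if result.get("status") != "fail":
        return result                                     # success passes through
    if result["code"] == "RATE_LIMIT":
        return retry(after=result.get("retry_after", 0))  # automated re-run
    return escalate(result["code"])                       # typed escalation path

out = handle_tool_result(
    {"status": "fail", "code": "RATE_LIMIT", "retry_after": 30},
    retry=lambda after: {"status": "ok", "retried_after": after},
    escalate=lambda code: {"status": "escalated", "code": code},
)
print(out)  # → {'status': 'ok', 'retried_after': 30}
```

A prose string like "something went wrong" would fall through to escalation every time, because there is no code to match on.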

No explicit turn limits: the only exit is context window exhaustion

Without a max_iterations guard, an agent encountering an unresolvable error loops indefinitely. Context window exhaustion is the termination condition. Expensive and uncontrolled. Explicit turn limits and a typed escalation path for unresolved errors are basic production requirements.
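A guard of this kind is a few lines of code. In this sketch, which uses hypothetical names, hitting the limit produces a typed escalation record rather than an exception, so the caller can route the unresolved task to a human in the same way it routes any other typed error.

```python
def guarded_loop(step, max_iterations=5):
    """Run a step repeatedly; if it never resolves, return a typed
    escalation record instead of looping until the context window fills."""
    for turn in range(1, max_iterations + 1):
        result = step(turn)
        if result.get("status") == "ok":
            return result
    return {"status": "escalate", "code": "MAX_ITERATIONS",
            "turns": max_iterations}

# A step that never resolves: without the guard, the only exit would be
# context window exhaustion.
stuck = lambda turn: {"status": "fail", "code": "SPEC_COLLISION"}
print(guarded_loop(stuck))
```

The limit and the escalation path belong together: a limit that simply raises still leaves the recovery route undefined.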

Fat orchestrators: complexity in the orchestrator compounds at scale

An orchestrator that pre-processes data, transforms results, or maintains session state accumulates complexity that creates coherence failures at scale. The correct pattern is a thin orchestrator: it decides what to call and in what order. Nothing else.

In February 2026, Graph Digital mandated typed error codes across all agent builds after a prose-error loop failure required manual intervention. Following the change, with the model unchanged, uncontrolled loops dropped to zero. The failure was architectural. The fix was architectural.

Five production principles for LLM-orchestrated agents

These are not aspirational design goals. They are the threshold between a demo-grade agent and one that operates reliably in production.

Thin orchestrators: the orchestrator decides what to call and in what order

Data transformation, result processing, and session state do not belong in the orchestrator. Complexity there compounds. Coherence failures at scale follow.

Typed error codes: every tool returns structured JSON error codes, not prose

{ "status": "fail", "code": "SPEC_COLLISION" } tells the orchestrator what happened. It enables re-run, escalation, or alternative routing. Prose errors enable none of these. Typed error codes are an architectural requirement, not a documentation preference.

Stateless workers: every tool receives all context it needs in the call

Nothing persists between tool calls. Workers that carry implicit state between invocations break in multi-agent systems and cannot be reliably replayed.
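Statelessness reduces to a simple property: the worker is a pure function of its call payload. The worker below is a hypothetical illustration of that shape, not a real Katelyn component.

```python
def summarise_worker(context):
    """A stateless worker: everything it needs arrives in the call.
    No module-level, session, or instance state is read or written,
    so any call can be replayed in isolation and always agrees."""
    docs = context["documents"]
    return {"task_id": context["task_id"], "documents_seen": len(docs)}

call = {"task_id": "t-1", "documents": ["a", "b", "c"]}
assert summarise_worker(call) == summarise_worker(call)  # replayable
```

A worker that cached the document list between invocations would pass once and then silently diverge when a second agent interleaved its own calls.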

Explicit turn limits: max_iterations guard with a typed escalation path

Without a maximum iteration count and a defined escalation route for unresolved errors, the only exit from an error loop is context window exhaustion. Both controls are production prerequisites.

Hard gates in code: non-negotiable sequences enforced by the pipeline controller

The LLM orchestrator reasons within a gate. It does not decide whether to skip one. Sequences that must execute without exception are enforced in code, not negotiated by the model.

Katelyn, Graph Digital's production-deployed multi-agent AI platform, applies all five principles. Operating continuously since January 2026, running multiple pipeline workers daily: 50% increase in new users to high-intent pages, 440% conversion increase. The principles are not theoretical. They are the architecture of a system in production. A partner building LLM-orchestrated agents should be able to demonstrate each of them.

What to look for in a partner building LLM-orchestrated agents

Capability claims and technology logos do not distinguish demo-grade from production-grade. Five questions do.

  1. Can they show a production-deployed agent: not a demo, not a proof-of-concept, but a system operating under real load?
  2. Do their tool contracts use typed error codes or prose errors? A partner who returns prose errors is building to demo standard.
  3. Can they explain when not to use LLM orchestration for a given workflow? The ability to recommend a deterministic pipeline over an LLM orchestrator is a mark of architectural discipline.
  4. Are their workers stateless, or do they carry session context between calls? Stateful workers break in multi-agent systems.
  5. Can they describe how they enforce hard gates: in code, or via the LLM? The answer should always be: in code.

A partner who cannot answer these questions confidently is building to demo standard, not production standard.

"A partner who cannot answer these questions confidently is building to demo standard, not production standard."

Stefan Finch, Founder, Graph Digital

An AI Build Readiness Assessment (Phase 1) surfaces the architectural gaps before they become production failures. Graph Digital's AI product and agent development practice works with complex UK mid-market B2B organisations designing and building LLM-orchestrated agents that operate reliably under real load.

Before you build:

  • Audit every tool description against the Routing Reality: vague names route to nothing, specific descriptions route correctly every time
  • Mandate typed JSON error codes from every tool before production deployment; prose errors produce uncontrolled loops that a better model cannot fix
  • Map which stages of your workflow need LLM orchestration and which should be deterministic pipelines: most production systems need both

Frequently asked questions

What is actually happening when an AI agent "decides" to call a tool?

The LLM does not decide in any meaningful sense. It receives tool descriptions, each a JSON schema, reads them, and returns a structured JSON instruction naming which tool to call and what arguments to pass. The calling code executes the tool. The result goes back to the LLM. The loop continues until end_turn. The entire routing mechanism is pattern-matching against description text. Tool description quality is the only variable that determines whether the right tool gets called.

Is an LLM orchestrator the same as a framework like LangChain or LangGraph?

No. LangChain, LangGraph, and similar tools are frameworks that implement the LLM orchestrator pattern. The pattern operates identically whether the calling code uses a framework or is written from scratch. The framework provides structure; it does not change the underlying mechanism. Evaluating agent architecture by framework choice rather than tool description quality and error contract design is where most production builds go wrong.

What is the difference between a thin orchestrator and a fat orchestrator?

A thin orchestrator decides what tool to call and in what order. Nothing else. It does not pre-process data, transform results, or maintain session state. A fat orchestrator accumulates all of these responsibilities, creating a single point of compounding complexity that produces coherence failures at scale. The correction is not a refactor. It is an architectural decision that must be made at the start of a build, not retrofitted after the first production failure.


Stefan Finch — Founder, Graph Digital

Stefan Finch is the founder of Graph Digital, advising leaders on AI strategy, commercial systems, and agentic execution. He works with digital and commercial leaders in complex B2B organisations on AI visibility, buyer journeys, growth systems, and AI-enabled execution.

Connect with Stefan: LinkedIn

Graph Digital is an AI-powered B2B marketing and growth consultancy that specialises in AI visibility and answer engine optimisation (AEO) for complex B2B companies. AI strategy and advisory →