AI Agents

How the LLM orchestrator decides what to do — and why this is not intelligence

LLM orchestrators are deterministic routing loops, not reasoning engines. Because the only variable governing tool selection is description quality, agents built without this understanding fail silently in production. No model upgrade compensates for a tool description that routes incorrectly.

Stefan Finch
Founder, Head of AI
Apr 16, 2026 · 11 min read


By Stefan Finch, Graph Digital | Last reviewed: April 2026

LLM-orchestrated agents and deterministic pipelines: why the architecture choice determines how you build

The LLM orchestrator pattern and the deterministic pipeline pattern are not interchangeable. LLM orchestration enables adaptive, multi-step reasoning where the model decides what to do based on what it gets back, with no pre-written recovery paths required. Deterministic pipelines are cheaper, more predictable, and correct when the required sequence is known. They become brittle the moment an unplanned exception arises.

At Graph Digital, we design and build AI systems for complex B2B: not prompting tools, but engineered multi-agent architectures that operate continuously. The gap we see consistently at production stage is organisations applying LLM orchestration to workflows that would be faster, cheaper, and more reliable as deterministic pipelines. The architecture choice is the first decision, and most teams make it without the information they need.

Gartner forecasts that by 2028, AI agents will intermediate more than $15 trillion in B2B purchasing. The architectural decisions being made in 2026 will determine whether those agents survive production or become expensive demonstrations.

Think of the distinction as pinball versus pipeline. A deterministic pipeline is a production line: A to B to C to gate to D, every step defined, every output predictable. An LLM-orchestrated agent is a pinball machine: the ball enters, bounces off whatever it encounters, and the machine routes it based on what happens in real time. You designed the bumpers. You did not pre-define the path.

This is part of our AI product and agent systems practice, specifically agent orchestration and build architecture.

LLM orchestration handles the unknown. Deterministic pipelines handle the known.

What is an LLM orchestrator? The tool-use loop that drives every modern AI agent

An LLM orchestrator is not a special piece of code. It is a pattern: the pattern by which a language model is given tool definitions and asked to decide, repeatedly, what to call next. The reasoning appears to come from the model. The wiring is deterministic.

Here is the exact mechanism, as implemented in the Anthropic tool_use API and its equivalents (OpenAI function calling, Google Gemini function declarations):

1. You send:    messages + tool definitions (JSON schema per tool)
2. LLM returns: stop_reason: "tool_use"
                tool_use: { name: "get_gsc_data", input: { site: "..." } }
3. You execute: your service runs (C#, Python, .NET — any language)
4. You send:    same messages
                + assistant turn (the tool_use block)
                + user turn (the tool_result block)
5. LLM returns: another tool_use — repeat from step 3
                OR stop_reason: "end_turn" — stream the answer

The LLM never calls your code directly. It returns a JSON instruction. Your code executes the tool. Your code returns the result. You loop until end_turn. That is the entire mechanism.
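The five steps above can be sketched as a short loop. This is a minimal illustration of the pattern, not any vendor's SDK verbatim: call_llm, run_tool, and the exact dict shapes are hypothetical stand-ins.

```python
# Minimal sketch of the tool-use loop. call_llm and run_tool are
# hypothetical stand-ins, not a real SDK -- the shape of the loop is
# the point, not the API names.

def run_agent(messages, tools, call_llm, run_tool, max_iterations=10):
    for _ in range(max_iterations):
        reply = call_llm(messages=messages, tools=tools)  # step 1: send messages + tool schemas
        if reply["stop_reason"] == "end_turn":            # step 5: model is done, stream the answer
            return reply["text"]
        call = reply["tool_use"]                          # step 2: a JSON instruction, not a call
        result = run_tool(call["name"], call["input"])    # step 3: YOUR code executes the tool
        messages.append({"role": "assistant", "content": reply})              # step 4: echo tool_use
        messages.append({"role": "user", "content": {"tool_result": result}})
    raise RuntimeError("max_iterations reached without end_turn")
```

Note the guard: even this minimal sketch carries a max_iterations limit, because the loop has no other natural exit if the model never returns end_turn.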

If you need a foundation-level explanation before the mechanism, see what AI agents are and how they differ from chatbots.

Tool description quality is the only routing variable

The LLM picks tools based on your description text, and nothing else. A tool named get_data with no meaningful description routes nowhere. One described as "returns trailing 90-day Google Search Console performance data including clicks, impressions, and top queries for a given domain" routes correctly, every time.

This is The Routing Reality in practice. The adaptability of an LLM-orchestrated agent comes from the model's training applied to tool descriptions you write. Poor descriptions produce wrong calls. Good descriptions produce correct routing. The model contributes no judgment about which tool to use. It reads what you wrote. Tool description quality is the highest-leverage engineering decision in any agent build, ahead of model selection, framework choice, or prompt engineering.
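A tool definition in the JSON-schema shape these APIs accept makes the point concrete. The description text is the routing surface; the schema only validates arguments once routing has already happened. The exact dict keys shown here are illustrative of the common shape, not a specific vendor's contract.

```python
# A tool definition in the JSON-schema shape tool-use APIs expect.
# The "description" field is the only thing the model routes on;
# the input_schema just validates arguments after routing.

GOOD_TOOL = {
    "name": "get_gsc_performance_data",
    "description": (
        "Returns trailing 90-day Google Search Console performance data "
        "including clicks, impressions, and top queries for a given domain."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "site": {"type": "string", "description": "Domain to query, e.g. example.com"},
        },
        "required": ["site"],
    },
}

# The same tool with a vague description routes nowhere:
BAD_TOOL = {**GOOD_TOOL, "name": "get_data", "description": "Gets data."}
```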

The agentic loop: what multi-step tool calling looks like in practice

Consider a request: "Find opportunities from my GSC data, then draft a content plan for the best one." The LLM reads the request, calls get_gsc_data, reads the result, calls analyse_opportunities, reads those results, calls content_plan with the highest-scoring opportunity as input, reaches end_turn, and streams the plan.

You did not define that sequence in advance. You gave the agent tools and a clear description of each one. The LLM decided what to call, in what order, based on what it received at each step.
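The resulting turn sequence looks like this as the message history grows. The tool names follow the example above; the transcript shape is illustrative.

```python
# The turn sequence for the request above, as the message history grows.
# Each assistant turn was chosen only after reading the previous tool_result.

transcript = [
    ("user",      "Find opportunities from my GSC data, then draft a content plan for the best one."),
    ("assistant", {"tool_use": "get_gsc_data"}),
    ("user",      {"tool_result": "...90 days of GSC rows..."}),
    ("assistant", {"tool_use": "analyse_opportunities"}),
    ("user",      {"tool_result": "...scored opportunities..."}),
    ("assistant", {"tool_use": "content_plan"}),  # input: the highest-scoring opportunity
    ("user",      {"tool_result": "...draft plan..."}),
    ("assistant", "end_turn: streams the finished plan"),
]

# Three tool calls, zero pre-written sequence:
tool_calls = [t[1]["tool_use"] for t in transcript
              if isinstance(t[1], dict) and "tool_use" in t[1]]
```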

Katelyn, Graph Digital's production-deployed multi-agent AI platform, runs on exactly this pattern. It orchestrates agents to continuously run content operations, mapping commercial surface quality and executing targeted improvements. It has operated continuously since January 2026, running multiple pipeline workers daily. The tool-use loop described above is the architecture underneath it.

What is a deterministic pipeline? The A-to-B-to-C model and where it breaks

A deterministic pipeline is a sequence where every step, every branch, and every error path is written in code before the system runs. It is predictable, auditable, and cost-efficient. It fails silently when reality does not match the author's assumptions.

The pattern is A to B to C to gate to D. Each step has defined inputs, outputs, and error handling. The pipeline author must anticipate every exception. If step B produces a result that conflicts with what step D needs, you do not find out until step D. Recovery requires bespoke error-handling code written in advance: code that only works for the exceptions you imagined.

This is not a weakness in every context. For many production workflows, it is exactly the right design.
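The A-to-B-to-C-to-gate-to-D shape reduces to function composition. A minimal sketch, with illustrative step names; real pipelines add logging, retries, and per-step error handlers, all written in advance.

```python
# A deterministic pipeline in its simplest form: every step, branch, and
# error path is written before the system runs. Step bodies are illustrative.

def step_a(raw):
    return {"parsed": raw.strip()}

def step_b(data):
    return {**data, "enriched": True}

def gate(data):
    # Hard gate: written in advance, enforced in code, never negotiated.
    if not data.get("enriched"):
        raise ValueError("GATE_FAILED: step B did not enrich the record")
    return data

def step_d(data):
    return {**data, "done": True}

def pipeline(raw):
    # A -> B -> gate -> D. If B returns something D cannot use and the
    # gate does not check for it, you find out at D -- or never.
    return step_d(gate(step_b(step_a(raw))))
```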

When deterministic is the right choice, and when it becomes brittle

Deterministic pipelines are correct when the sequence is fixed and fully known in advance, when every exception path is enumerable, when steps must be auditable for regulatory or compliance reasons, or when volume is high enough that an LLM call per step would be cost-prohibitive.

They become brittle when the required next step depends on what a previous step returned, when edge cases are not enumerable, or when the system needs to recover from unexpected results without a human rewriting the error handler.

What LLM orchestration enables, and what deterministic pipelines still do better

LLM orchestration enables AI systems to handle the unknown: adaptive sequencing, recovery without pre-written error paths, and multi-step tasks where the required sequence is not fully knowable in advance. Deterministic pipelines remain superior for batch processing, compliance workflows, and any sequence that must be fully auditable.

What LLM orchestration makes possible in practice:

Adaptive sequencing. The agent decides what to fetch based on what it receives. A content research agent that discovers a competitor has published a new article can pivot its plan without a human rewriting the workflow.

Recovery without pre-written error paths. When a tool returns a typed error code, the LLM can reason about what to do next: re-run with different parameters, call a different tool, or escalate to a human gate. None of this needs to be pre-coded.

Multi-step tasks where the sequence cannot be specified in advance. An AI-assisted investigation into a compliance question, where the next question depends on what the evidence shows, cannot be expressed as a deterministic pipeline.

Consider a UK financial services firm processing two thousand claims per day. Standard triage is a deterministic pipeline: fast, cheap, auditable, every exception path written. Complex edge cases go to an LLM-orchestrated agent that gathers additional context, reasons about it, and escalates appropriately. The same production system uses both patterns, each in the conditions it was designed for.

LLM orchestration, deterministic pipeline, or hybrid: the decision table

Neither pattern is universally correct. The choice depends on whether the required sequence is knowable in advance, how expensive prompt calls are relative to your volume, and what happens when a step returns an unexpected result.

Condition → Use
Sequence is fixed and known in advance → Deterministic pipeline
Steps must be fully auditable or compliant → Deterministic pipeline
Volume is high, prompt cost matters at scale → Deterministic pipeline
Sequence depends on what earlier steps return → LLM orchestration
Recovery from unexpected results is required → LLM orchestration
Task is multi-step with open-ended reasoning → LLM orchestration
Fixed outer sequence with exception handling → Hybrid: deterministic outer, LLM inner
Compliance requirement plus adaptive edge cases → Hybrid

Three decision signals cut through most situations. If your workflow has a fixed sequence and all exceptions are enumerable, use a deterministic pipeline. If the required next step depends on the result of the previous step, use LLM orchestration. If you need both auditability and adaptive recovery, use the hybrid pattern: hard gates enforced in code, LLM orchestration within each gate.

The most common architecture mistake is over-applying LLM orchestration to workflows that would be faster, cheaper, and more reliable as deterministic pipelines. Sophistication of architecture is not the measure of its quality. Whether it works reliably in production is.

Cost, complexity, and latency: what LLM orchestration actually adds

LLM-orchestrated agents carry higher per-run cost than deterministic pipelines. Each iteration of the tool-use loop requires an LLM call, adding prompt token cost and latency per iteration. At low volume, this is rarely the deciding factor. At production scale, it shapes the entire architecture.

LLM orchestration has a higher marginal cost per run. Deterministic pipelines have a higher fixed cost to build and maintain complete error path coverage: every exception must be anticipated and coded. Volume and exception frequency determine which is cheaper in practice. Low volume with unpredictable exceptions favours LLM orchestration. High volume with enumerable exceptions favours a deterministic pipeline.

Latency is a separate constraint. Each LLM decision in the tool-use loop adds time. Context window management becomes an engineering concern at high turn counts, as the conversation history grows with each tool call. For latency-sensitive user-facing applications, this matters from the first sprint.

The bear traps: why LLM-orchestrated agents fail in production

The most common failures in LLM-orchestrated agents are not model quality problems. They are engineering problems: poor tool descriptions, untyped errors, and orchestrators doing work that should live in their tools.

Bear trap 1: poor tool descriptions mean the LLM routes on what you tell it

The LLM cannot call the right tool if the description is vague. get_data tells the model nothing useful. get_gsc_performance_data, described as "returns trailing 90-day Google Search Console performance data including clicks, impressions, and top queries for a given domain", routes correctly, every time.

Every hour spent refining tool descriptions is more valuable than any time spent on model selection or framework choice. This is the highest-leverage engineering decision in an LLM-orchestrated agent build.

Bear trap 2: prose errors mean the pinball machine has no bumpers

If a tool returns "Something went wrong, please try again," the LLM cannot route on that. It will either hallucinate a recovery path or stop entirely.

Typed error codes are the bumpers the pinball bounces off. { "status": "fail", "code": "SPEC_COLLISION" } tells the LLM what happened and what it can do next. { "status": "fail", "code": "PERSONA_DRIFT" } signals that output has strayed from the intended framing and needs correction. The LLM can route on a typed code. It cannot route on a paragraph.

This holds for the Anthropic tool_use API and its equivalents as of April 2026: a team that ships typed JSON error codes can build automated recovery loops that are architecturally impossible for a team using prose errors. Not slower or harder. Architecturally impossible by design. The LLM cannot route on "something went wrong." It can route on { "status": "fail", "code": "RATE_LIMIT", "retry_after": 30 }.

Typed error codes are not documentation. They are the prerequisite for any recovery loop.
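The recovery routing this enables can be sketched as a dispatch function. The code values follow the article (RATE_LIMIT, SPEC_COLLISION, PERSONA_DRIFT); the handler mapping is an illustrative assumption, not a fixed standard.

```python
# Routing on typed error codes. The orchestrator (or a guard around it)
# can branch on a code; it cannot branch on a paragraph of prose.

def route_on_error(tool_result: dict) -> str:
    if tool_result.get("status") != "fail":
        return "continue"
    code = tool_result.get("code")
    if code == "RATE_LIMIT":
        return f"retry_after_{tool_result.get('retry_after', 60)}s"
    if code == "SPEC_COLLISION":
        return "re_run_with_revised_spec"
    if code == "PERSONA_DRIFT":
        return "regenerate_with_corrected_framing"
    # Even an unknown typed code still routes somewhere deterministic:
    return "escalate_to_human_gate"

# A prose error -- "Something went wrong, please try again" -- offers
# nothing to branch on: the result is a hallucinated recovery or a stall.
```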

Bear trap 3: no turn limits mean the agent calls the same tool repeatedly

Without explicit turn limits and clear end conditions, an LLM-orchestrated agent can loop. A model with no typed error code to route on and no explicit max_iterations guard will call the same tool again, and again, until context window exhaustion.

The fix is explicit: set a max_iterations guard on every agent, define end_turn criteria, and build a typed escalation path for errors that cannot be resolved within the turn limit.
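That fix can be expressed as a guard around the loop that ends in a typed escalation rather than an exhausted context window. step() here is a hypothetical stand-in for one iteration of the tool-use loop.

```python
# Turn-limit guard: max_iterations plus a typed escalation result
# instead of an unbounded loop. step() stands in for one iteration
# of the tool-use loop (one LLM call plus any tool execution).

def run_with_guard(step, max_iterations: int = 8) -> dict:
    for turn in range(1, max_iterations + 1):
        result = step(turn)
        if result.get("stop_reason") == "end_turn":
            return {"status": "ok", "turns": turn, "answer": result.get("text")}
    # Typed escalation path, not an exception swallowed in a log file:
    return {"status": "fail", "code": "TURN_LIMIT_EXCEEDED", "turns": max_iterations}
```

The escalation result is itself a typed error, so an outer controller or human gate can route on it the same way the orchestrator routes on tool errors.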

Bear trap 4: fat orchestrators do work that should live in the tools

An orchestrator that pre-processes data before passing it to tools, transforms results after receiving them, or maintains state across tool calls is doing work that should live in stateless workers. This creates context coherence failures at scale. The orchestrator accumulates complexity that compounds across sessions and makes debugging near-impossible.

The correct pattern is a thin orchestrator: it decides what to call and in what order. It does not transform data. It does not maintain state.

Five production principles that separate deployed agents from expensive demos

The production pattern that survives is not the most sophisticated. It is the most disciplined. Five principles separate production-grade LLM-orchestrated agents from impressive prototypes.

1. Thin orchestrators. The orchestrator decides what to call and in what order. It does not transform data. It does not maintain state. It routes. Complexity that lives in the orchestrator is complexity that compounds.

2. Typed error codes. Every tool returns a typed result or a typed error. No prose. { "status": "fail", "code": "TOOL_TIMEOUT" }, not "something went wrong". Typed codes are the prerequisite for recovery loops.

3. Stateless workers. Every tool receives everything it needs in the call: explicit typed parameters, nothing implicit. No reliance on earlier turns. Context coherence is the orchestrator's responsibility; the tool's job is to do one thing well.

4. Explicit end conditions. Define max_iterations. Define end_turn criteria. Build a typed escalation path for errors that cannot be resolved within the turn limit.

5. Hard gates stay as code. Non-negotiable sequences are enforced by the pipeline controller, not negotiated by the LLM orchestrator. The orchestrator reasons within a gate. It does not decide whether to skip one.
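Principles 2 and 3 meet in the shape of a single worker: stateless, with everything it needs arriving in the call, returning a typed result or a typed error. The worker body and field names below are illustrative, not Katelyn's actual contract.

```python
# A stateless worker with a typed contract: validate explicit inputs,
# return structured success or a structured error code, never prose.
# Field names and limits are illustrative.

def draft_section_worker(spec: dict) -> dict:
    # Stateless: no globals, no memory of earlier turns. Missing inputs
    # become a typed error the orchestrator can route on.
    missing = [k for k in ("topic", "persona", "word_count") if k not in spec]
    if missing:
        return {"status": "fail", "code": "SPEC_INCOMPLETE", "missing": missing}
    if spec["word_count"] > 2000:
        return {"status": "fail", "code": "SPEC_COLLISION",
                "detail": "word_count exceeds section budget"}
    draft = f"[{spec['word_count']}-word draft on {spec['topic']} for {spec['persona']}]"
    return {"status": "ok", "draft": draft}
```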

Stefan Finch mandated typed error codes across all Katelyn workers after diagnosing a prose-error loop failure in February 2026. A worker returning "something went wrong" caused the orchestrator to loop without exit until manual intervention. The fix: replacing all prose error returns with structured JSON error codes. After that change, zero uncontrolled loops have occurred. This is not a metric of model improvement. The model did not change. It is a metric of architectural correction: typed codes enabled the recovery loop the orchestrator needed.

These five principles are not aspirational. They are the design decisions that distinguish production-grade agents from impressive demos.

What to look for in a partner building LLM-orchestrated agents for complex B2B

The most important signal when evaluating an AI development partner is not their model expertise. It is whether they have built and maintained production systems under real load, and whether they can articulate precisely the difference between a well-constructed agent architecture and an impressive prototype.

Five questions worth asking before you engage:

Can they explain the difference between LLM orchestration and a deterministic pipeline? A partner who uses the terms interchangeably has not thought carefully about what they are building. The distinction determines architecture, cost, failure modes, and governance.

Do they use typed error contracts in their agent tools? If not, their agents will fail silently in production. Typed codes enable recovery loops. Prose errors stop them.

Have they built under real volume and latency constraints? Lab performance and production performance diverge significantly. Context window management and prompt cost at scale are constraints that only become visible under load.

Can they describe their orchestrator design in one sentence, and is it thin? A partner who can do this has thought about it. One who needs a paragraph of capabilities probably has not.

Do they have an evaluation harness? Intent classification accuracy, tool call precision, multi-turn coherence across five or more turns, graceful degradation. If there are no evals, there is no way to know whether the system is working.

These five questions are the starting point of our AI Build Readiness Assessment — a structured diagnostic we run before any agent build to confirm the architecture, team, and problem are matched correctly.

Find out whether your build is ready with our AI Build Readiness Assessment (Phase 1) — architecture review, problem fit, and production readiness in a single structured session.

Key takeaways

An LLM orchestrator is a deterministic loop, not a reasoning breakthrough. The LLM reads tool definitions, decides what to call, receives results, and loops until end_turn. The wiring is deterministic. The adaptability comes from the model's training and the quality of the tool descriptions you provide.

Tool description quality is the only routing variable. The LLM picks tools based on description text, and nothing else. Poor descriptions produce wrong tool calls. This is the highest-leverage engineering decision in any agent build, ahead of model selection, framework choice, or prompt engineering.

Typed error codes are a prerequisite for recovery loops, not documentation. An agent that receives a prose error cannot route on it. An agent with typed error codes can re-run, escalate, or request clarification. Typed codes are the mechanism that enables agentic recovery: the bumpers the pinball bounces off.

Not every workflow needs LLM orchestration. Fixed-sequence, high-volume, auditable workflows are faster, cheaper, and more reliable as deterministic pipelines. Over-applying LLM orchestration is an architecture mistake, not a sophistication signal.

The production pattern is thin orchestrators, stateless workers, explicit turn limits. Stability under real load is the goal, not the most complex reasoning chain. The agents that survive production are the ones built with discipline.

The gap between a working prototype and a production agent is architectural. Most demo-stage agents have fat orchestrators, prose errors, and no turn limits. Applying the five production principles in this article is the difference between a demo and a deployed system.

FAQ

What is an LLM orchestrator in AI agents? An LLM orchestrator is the component in an AI agent that receives tool definitions, decides which tool to call based on the current task, executes the tool-use loop, and continues until the task is complete. The LLM does not execute tools: your code does. The LLM reads results and decides what to call next. The intelligence is in the model's training; the wiring is a deterministic loop around that probabilistic model.

What is the difference between LLM orchestration and a deterministic pipeline? LLM orchestration allows an AI agent to adapt its sequence based on what each tool returns. It can re-run steps, call tools in a different order, or handle unexpected results without pre-written recovery code. A deterministic pipeline executes a fixed A-to-B-to-C sequence where every error path must be written in advance. LLM orchestration handles the unknown; deterministic pipelines handle the known. Most production systems use both.

When should I use LLM orchestration vs a deterministic pipeline? Use LLM orchestration when the required next step depends on what a previous step returns, when recovery from unexpected results is needed, or when the task requires open-ended multi-step reasoning. Use a deterministic pipeline when the sequence is fixed, every exception path is enumerable, the workflow must be fully auditable, or volume makes per-step prompt cost prohibitive. Most production systems need both patterns.

Why do LLM-orchestrated agents fail in production? The most common production failures are engineering problems, not model problems: poor tool descriptions that route incorrectly; prose errors that the LLM cannot route on; no explicit turn limits leading to infinite loops; and fat orchestrators that carry state and transform data, creating context coherence failures at scale. These are architectural decisions. A better model will not fix them.

What are typed error codes in AI agent architecture? Typed error codes are structured, machine-readable error responses from tool workers: for example, { "status": "fail", "code": "SPEC_COLLISION" } rather than "something went wrong." The LLM orchestrator can route on a typed code: retry with different parameters, escalate to a human gate, or call a different tool. It cannot route on a prose error. Typed error codes are the prerequisite for any automated recovery loop.

What does "The Routing Reality" mean in LLM orchestration? The Routing Reality is the architectural fact that an LLM orchestrator routes tool calls based on tool description quality, and nothing else. When an AI agent "decides" to call a tool, the LLM reads your tool descriptions and returns a JSON instruction. It applies no judgment about which tool is correct beyond what you wrote in the description. Teams that understand The Routing Reality build production-grade agents. Teams that treat orchestration as model intelligence build systems that fail when tool descriptions degrade under real load.


Stefan Finch — Founder, Graph Digital

Stefan Finch is the founder of Graph Digital, advising leaders on AI strategy, commercial systems, and agentic execution. He works with digital and commercial leaders in complex B2B organisations on AI visibility, buyer journeys, growth systems, and AI-enabled execution.

Connect with Stefan: LinkedIn

Graph Digital is an AI-powered B2B marketing and growth consultancy that specialises in AI visibility and answer engine optimisation (AEO) for complex B2B companies. AI strategy and advisory →