Context engineering is the practice of deciding exactly what information an AI agent receives per task: what goes into its working window, in what form, and at what scope. It is not prompt engineering. It is not model selection. It is a data and architecture design problem that sits upstream of both, and it determines output quality before the model runs. At Graph Digital, context engineering is treated as the primary design layer in every AI product and agent development engagement. Katelyn, our production multi-agent AI platform, was built on context slice architecture as its foundational design principle, and the failures that informed that design are the same failures described here. Gartner identified context engineering as the breakout AI capability of 2026 - the capability most organisations building agent systems have not yet named as a design practice.
Most production agent failures are not model failures. Anthropic's engineering team found that the majority of production failures stem from poor context management rather than inferior language models. The model is reasoning correctly. The information going in is wrong.
Context engineering defines what an AI agent knows before it reasons
Context engineering is the design practice that governs what information enters an AI agent's context window for each task.
An AI agent does not decide what information it needs. It receives a context window - a finite working space - and it reasons over whatever is in it. The quality of that reasoning is bounded by the quality of what was passed in. Pass in too much, and the agent loses focus. Pass in too little, and the agent acts on an incomplete picture. Pass in contradictory or stale information, and the agent produces outputs that are internally coherent but wrong relative to the actual situation.
Context engineering determines agent output quality at the layer upstream of the model and the prompt.
The context window is not infinitely large, and size alone does not solve the problem. Research consistently shows that what matters is what is in the window, not how big the window is. A curated context of 128,000 tokens with fresh, relevant information outperforms an uncurated context of 200,000 tokens in which critical information is buried in noise. Context engineering is the work of making that curation a deliberate design decision, not an accident of implementation.
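The curation step can be sketched in code. The example below is illustrative, not any specific framework's API: the item fields, the scoring weights, and the rough token estimate are all assumptions. The point it demonstrates is the design decision itself: select for relevance and freshness under a budget, rather than passing in everything available.

```python
# Hypothetical sketch of context curation under a token budget.
# Field names, scoring weights, and the token estimate are illustrative
# assumptions, not a specific framework's API.
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    relevance: float   # 0.0-1.0, e.g. a retrieval similarity score
    age_hours: float   # how stale the information is

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def curate(items: list[ContextItem], budget_tokens: int) -> list[ContextItem]:
    """Prefer fresh, relevant items; stop at the budget rather than
    padding the window with everything available."""
    def score(item: ContextItem) -> float:
        freshness = 1.0 / (1.0 + item.age_hours / 24.0)
        return item.relevance * freshness

    selected, used = [], 0
    for item in sorted(items, key=score, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            selected.append(item)
            used += cost
    return selected
```

The key property is that the budget is enforced at assembly time: a low-relevance or stale record never reaches the window, instead of being buried mid-context where the model underweights it.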
This practice connects to what AI agents are and how they work, because context engineering only makes sense as a design layer once the mechanics of agent reasoning are clear. Understanding why agents reason over what they receive, rather than deciding what they need, explains why undesigned context is not a minor inefficiency. It is a quality floor that no subsequent refinement can raise.
Is "contextual engineering" the same as context engineering?
The terms are used interchangeably in practice. "Context engineering" is the dominant label, used by Anthropic, Gartner, and LangChain, and refers to the practice of designing what information an AI agent receives per task. "Contextual engineering" is a variant spelling that appears in search queries but describes the same practice.
Context engineering prevents the silent failures that appear after agent deployment
Most teams building AI agents are iterating on prompts when the problem is upstream of the prompt.
The agent produces a confident, well-structured response. It follows the instruction. The logic is sound. But the output does not match the real situation. It matches the information it was given, which was incomplete, stale, or overloaded. From the outside, this looks like a model error or a prompt error. It is neither. The agent reasoned correctly over the wrong inputs.
This is the diagnostic gap that makes undesigned context expensive. A visible error - the agent crashes, refuses to respond, or produces garbled output - is debuggable. An invisible error, where the agent produces a confident response based on the wrong data, is not. The organisation continues acting on outputs it trusts, because the output looks right.
This is not a speculative failure mode. It is the pattern Anthropic identified as the primary cause of production AI agent failures: context rot, where model accuracy degrades as context length increases and critical information gets buried in noise. The agent has the right model. It has a well-written prompt. The context is wrong.
The two failure modes: context overload degrades reasoning, context underload produces incomplete outputs
Context failures take two forms. Both produce confident wrong outputs. Neither is detectable as a system error.
Context overload
Context overload occurs when the agent receives more information than it can weight correctly. The transformer architecture creates pairwise relationships between every token in the context. As context length grows, maintaining focused attention across that volume becomes mechanically harder, and information at the edges of the context window receives more attention weight than information in the middle.
This is the academic phenomenon known as the lost-in-the-middle effect: research shows accuracy drops of up to 30% for information positioned mid-context compared to information at the start or end. The practical consequence: if an agent receives a large, unstructured context pack and the critical fact is buried in the middle, the model is statistically likely to underweight it. GPT-4o's measured accuracy drops from 98.1% to 64.1% based solely on where relevant information is positioned within the context window. Model capability and prompt quality are unchanged.
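One common mitigation, consistent with the positional effect described above, is to order context items so the highest-priority content sits at the edges of the window rather than the middle. The helper below is an illustrative sketch of that idea, not a method taken from the cited research:

```python
# Illustrative lost-in-the-middle mitigation: given items sorted
# most-important-first, interleave them so top items land at the start
# and end of the context, and the least important end up in the middle.
# The function name and approach are assumptions for illustration.
def edge_order(items_by_priority: list[str]) -> list[str]:
    front, back = [], []
    for i, item in enumerate(items_by_priority):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]
```

With five items ranked 1 (most important) to 5, the result places item 1 first, item 2 last, and item 5 in the middle, matching where attention weight is strongest.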
Context overload also compounds through contradiction. When accumulated information contains conflicting signals - a customer record updated in one system but not another, a policy document in one version alongside a superseded version - the model must arbitrate between them. Research from Galileo AI found a 39% average performance drop across tested models when contradictory information accumulates in context across conversation turns. The model does not flag the contradiction. It produces a response.
Context underload
Context underload occurs when the agent acts on an incomplete information set. Unlike overload, the context is not too large. It is too sparse. The agent is asked to make a decision about a customer, a process, or a system state without access to the data that would make that decision correct. The output is internally consistent. It is wrong relative to reality.
Both failure modes share one characteristic: the output looks correct. The agent did not fail. It reasoned well from what it was given. The failure is upstream.
How do I know if my AI agent has a context problem?
The diagnostic signature is consistent: the agent produces outputs that are internally coherent but wrong relative to the actual situation, and the error is not flagged as an error. Common indicators include the agent acting on stale data rather than current state, and confident responses that ignore material facts present in the system but absent from the context pack. Output quality that varies by task without any pattern tied to prompt or model changes is a further signal. In each case, the agent performed its task correctly. The information it received was not.
Why prompt engineering cannot resolve a context design failure
This is the diagnostic correction most teams building agents need to make.
Prompt engineering governs how an agent is instructed. It defines what the agent should do, in what order, with what constraints. It operates inside the context window, over whatever information is already in it.
Context engineering governs what information is in the window before the prompt runs.
These are not the same problem. A precisely written prompt cannot compensate for a context that is overloaded, incomplete, or incorrectly scoped. The instruction is correct. The information environment it operates over is wrong. Switching models does not fix this either: a more capable model reasons more effectively over the wrong information and still produces a wrong output.
No amount of prompt refinement or model upgrading resolves a context design failure.
This distinction matters commercially. Teams that attribute production failures to prompt quality spend weeks on prompt iteration. Teams that attribute failures to model quality spend budget on model evaluation and switching. Both are solving the wrong problem. The failure is at the context layer, which was never explicitly designed.
Context engineering operates at a different layer than prompt engineering
The distinction is architectural, not a matter of degree.
Prompt engineering works inside the context window. A system prompt, a chain-of-thought instruction, a role definition: all of these are information that goes into the context window. They govern agent behaviour. They do not govern what other information is in the window alongside them.
Context engineering works on the window itself. It is the design decision about what structured data, retrieved documents, session state, and instruction content goes into the window for a specific task, assembled at the moment the task runs.
Consider a lawyer preparing for a case. The closing argument is the prompt: precise, sequenced, persuasive. But the lawyer can only argue from the case file they were given. If the case file contains the wrong documents, the argument is irrelevant. Context engineering is the work of assembling the correct case file before the argument begins.
Most agent builds treat context as an implementation detail. Pass in the system prompt, the conversation history, and whatever data the task requires, and let the model sort it out. Context engineering treats context as a design surface: a set of explicit decisions about what goes in, what stays out, and in what form.
| | Prompt Engineering | Context Engineering |
|---|---|---|
| What it operates on | Instructions inside the context window | The information that enters the context window |
| When it runs | At agent execution, over existing context | At context assembly, before the agent executes |
| What it controls | Agent behaviour and reasoning approach | The information environment the agent reasons over |
| Who designs it | AI engineers, prompt designers | AI architects, data and platform engineers |
| Primary failure mode | Ambiguous or poorly scoped instructions | Wrong, incomplete, or overloaded information going in |
The distinction in practice: prompt engineering works within a fixed context window; context engineering controls what enters that window before the task runs.
Context slices define the information boundary for each agent task
Think of it like sitting at the boardroom table and chopping all the relevant data into one-page sheets, then passing them to the AI in an envelope. The only question that matters is what pieces went into the envelope, and whether the AI actually used them.
That envelope is what Graph Digital calls a context slice: a bounded, task-scoped information pack assembled at runtime for a specific agent task. Not a general-purpose context that serves all tasks. Not a growing conversation history that accumulates across the session. One envelope. One task. Assembled from exactly what that task requires, and nothing else.
Four design questions govern each context slice:
- What does this task require to reason correctly? Not what data is available - what data is necessary. The distinction forces specificity.
- What would degrade reasoning if included? Adjacent data, historical records, or conflicting versions that are irrelevant to this task but present in the system.
- What is the right recency scope? Some tasks require real-time state. Others require session context. Others require historical records from a defined window. The time boundary is a design decision, not a default.
- What format does the model process most reliably for this information type? Structured tables, prose summaries, key-value pairs, and retrieved document chunks each carry information differently. The format affects how the model weights and uses the content.
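The four questions above can be made concrete in code. The sketch below is a minimal illustration of slice assembly under assumed names (`SliceSpec`, the source labels, the record shape); a production system would draw on its own retrieval and state layers:

```python
# Minimal sketch of context slice assembly, mapping the four design
# questions onto code. All names and structures here are illustrative
# assumptions, not a particular platform's implementation.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class SliceSpec:
    required_sources: list[str]   # Q1: what is necessary, not what is available
    excluded_sources: list[str]   # Q2: what would degrade reasoning if included
    recency_window: timedelta     # Q3: the time boundary, chosen per task
    render: str = "key_value"     # Q4: the format the model handles best here

def assemble_slice(spec: SliceSpec, records: dict[str, list[dict]]) -> str:
    """Build a bounded, task-scoped context pack from raw records.
    `records` maps source name -> list of {"text", "timestamp"} dicts."""
    cutoff = datetime.now(timezone.utc) - spec.recency_window
    lines = []
    for source in spec.required_sources:
        if source in spec.excluded_sources:
            continue                          # Q2: exclusion is explicit
        for rec in records.get(source, []):
            if rec["timestamp"] >= cutoff:    # Q3: enforce the recency scope
                lines.append(f"{source}: {rec['text']}")
    return "\n".join(lines)
```

The design choice worth noting: exclusion and recency are enforced in the assembly function, so a stale record or an out-of-scope source can never reach the model by accident.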
Context slices are analogous to what domain-driven design calls bounded contexts: information boundaries scoped to a specific responsibility, applied here at runtime to agent task execution. In retrieval-augmented generation (RAG) architectures, context slice assembly extends beyond document retrieval to include structured state, session context, and instruction content.
Each context slice is assembled at runtime, not pre-built at system design time. The agent task triggers the assembly; the slice is constructed from the relevant sources, scoped to the task, and passed to the model. After the task completes, the slice is not retained in the next task's window unless that task explicitly requires it.
Katelyn uses context slices to eliminate output degradation without model changes
Katelyn, Graph Digital's production multi-agent AI platform, implements context slices as its core information architecture pattern. Stefan Finch, Graph Digital's founder and the architect of Katelyn's context engineering layer, observed the failure pattern described above directly during Katelyn's early prototype development.
Katelyn operates as a system of specialist workers, each handling a specific task type within a larger workflow. Every worker receives a bounded context pack scoped to its exact task: the files, state, knowledge, and instruction content that task requires, assembled at runtime. Workers do not share a single global context. Each operates in its own bounded information environment.
In early prototypes, workers received full-system context: the complete knowledge base, all session state, and the full instruction set for the platform. Output quality degraded consistently. Workers weighted irrelevant content, missed task-specific signals buried in the larger context, and produced responses that were technically coherent but operationally wrong. Model capability and prompt quality were identical between the degraded and corrected versions. The variable was context scope.
Reducing each worker's context pack to its task-specific slice restored output quality without model changes. The production Katelyn system now assembles per-worker context packs as a deliberate architectural decision: not an optimisation, but the primary design principle. Context engineering is built into the platform from the first worker outwards.
This is the pattern Anthropic's multi-agent architecture research identifies as producing the highest-quality outputs for long-horizon tasks: isolated context per sub-agent, assembled per task, rather than a shared context that grows across the workflow.
If you are still mapping your agent architecture, the context architecture review in our AI Build Readiness Assessment starts with exactly this question: what information is each agent task actually receiving, and is that the right information?
Context engineering and memory: where one design layer ends and the other begins
Context engineering and memory are often conflated. They are different design problems at different layers.
Context engineering governs what an agent receives per run: the information assembled for a specific task, used during that task, and not automatically carried forward. Memory governs what persists between runs: the information retained from previous sessions and made available to future tasks.
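The boundary between the two layers can be sketched as follows. The class and function names are illustrative assumptions, not Katelyn's implementation: the context slice is assembled fresh for each run and discarded, while the memory store is the only thing that persists.

```python
# Hedged sketch of the layer boundary: per-run context vs. persistent
# memory. Names and structure are illustrative assumptions.
class MemoryStore:
    """The memory layer: persists between runs."""
    def __init__(self):
        self._facts: list[str] = []

    def remember(self, fact: str) -> None:
        self._facts.append(fact)

    def recall(self) -> list[str]:
        return list(self._facts)

def run_task(task: str, memory: MemoryStore) -> list[str]:
    """The context layer: the slice exists only for this run and is not
    carried into the next task's window automatically."""
    slice_ = [f"task: {task}"] + memory.recall()  # assembled per run
    # ... a model call would reason over `slice_` here ...
    memory.remember(f"completed: {task}")         # persistence is explicit
    return slice_
```

Nothing from the first run's slice survives except what was explicitly written to memory, which is exactly the boundary the two design layers draw.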
The boundary matters. An agent with well-designed context slices but no memory architecture will reason correctly on each task but will not accumulate learning across tasks. An agent with strong memory architecture but undesigned context will accumulate information but reason over it poorly. Both layers require explicit design.
Context engineering is the subject of this article. The memory layer is a separate design problem, covered at how AI agents handle memory between runs.
When to involve a specialist
Context engineering failures have a specific diagnostic signature: the agent produces outputs that look right but are wrong relative to the actual situation. The organisation cannot easily tell whether the model is failing or the data is failing. Both look the same from the outside.
If that description fits a current production system - whether a mid-market B2B workflow tool, an internal automation, or a customer-facing agent - the problem is almost certainly not the model. It is the context layer, which was never explicitly addressed.
The commercially rational step is not to continue iterating on prompts or evaluating alternative models. It is to audit what context each agent task is actually receiving, identify whether the right information is going in at the right scope, and design context slices that bound each task correctly.
Getting this right at the build stage costs days. Diagnosing it in production, once the system is embedded in workflows and decision-making, costs weeks, and the cost of decisions made on wrong outputs is harder to quantify.
Graph Digital's AI Build Readiness Assessment includes context architecture review as a standard component: auditing what information each agent task receives, whether the context design matches the task requirements, and where context overload or underload is producing the failures that look like model problems.
Key takeaways
- Context engineering is the practice of deciding what information an AI agent receives per task. It operates upstream of the prompt and the model.
- Most production AI agent failures are context problems: the agent reasoned correctly over wrong or poorly scoped information.
- Context overload (too much information, attention degrades) and context underload (too little information, incomplete picture) both produce confident wrong outputs that are not detectable as system errors.
- No amount of prompt engineering or model switching resolves a context design failure. The problem is upstream of both.
- Context slices are bounded, task-scoped information packs assembled at runtime: one envelope per task, containing exactly what that task requires.
- Context engineering governs what goes in per run. Memory governs what persists between runs. Both require explicit design. They are not the same problem.
What is context engineering?
Context engineering is the design practice of deciding exactly what information an AI agent receives per task: what enters the context window, in what form, and scoped to what boundary. It is distinct from prompt engineering, which governs how the agent is instructed, and from memory architecture, which governs what persists between runs. Context engineering determines agent output quality before the model runs.
What is the difference between prompt engineering and context engineering?
Prompt engineering defines how an agent behaves: the instructions, constraints, and reasoning guidance that govern its response. Context engineering defines what information the agent has available when it executes those instructions. Both affect output quality. Only context engineering can fix a context problem. Prompt refinement cannot compensate for missing, overloaded, or incorrectly scoped information in the context window.
What are context slices?
A context slice is a bounded, task-scoped information pack assembled at runtime for a specific agent task. Rather than passing a general-purpose context that serves all tasks, a context slice contains exactly the data, state, and retrieved content that one task requires and excludes everything else. The design question for each slice is: what does this task require to reason correctly, and what would degrade reasoning if included?
