Why your AI agents need memory — and what we learned designing multi-agent memory architecture in production
Memory in a multi-agent AI system requires four typed stores with distinct lifetimes and write permissions. A shared blob collapses all four into one store, producing stale state, conflicting writes, and silent doctrine drift. The result is production behaviour that cannot be debugged.
By Stefan Finch, Graph Digital | Last reviewed: April 2026
Why memory in a multi-agent system is architecture, not a detail
Gartner forecasts that by 2028, AI agents will intermediate more than $15 trillion in B2B purchasing, and that within three years of late 2025 they will handle 90% of all B2B purchases (Digital Commerce 360, November 2025). The implication for technical leaders is not that agents are worth considering. It is that production reliability in multi-agent systems is commercially significant now.
AI agents were designed to execute tasks, not to remember across runs. A single agent with a well-structured prompt and a clear task scope performs reliably. The architecture question changes when agents multiply. In a multi-agent system, each agent reads from a shared context, each makes decisions that affect subsequent agents, and each run accumulates state that the next run inherits.
The system that works in a two-agent demo starts to fail in a five-agent production deployment. Agents contradict each other. State from previous runs bleeds into new ones. When something goes wrong, there is no traceable record of what any agent knew or decided. The system is not broken. It cannot be debugged. A system that cannot be debugged cannot be trusted at scale.
The four layers of memory in a production multi-agent system
AI agent memory architecture requires four typed stores with distinct lifetimes and write permissions. These are not abstract categories. They map to concrete design decisions about what agents read, what agents write, and when data expires.
Working memory is ephemeral. It exists for the duration of a single run and expires when the run ends. A single agent's working context (task inputs, retrieved facts, intermediate reasoning steps) lives here. Nothing in working memory should be assumed to persist.
Structured state is durable machine-readable truth: workflow stage, asset status, approval flags, scores, decision outcomes. Structured state belongs in typed JSON, not prose. It is read by multiple agents and writable only by agents with authority to update it.
Shared knowledge is system doctrine: skills, schemas, rubrics, policies. Versioned, read-only for workers, and writable only through governed flows. The permission asymmetry here is not a convenience. It is the mechanism that prevents silent doctrine drift.
Event history is the append-only record of what happened: which agent ran, what inputs it used, what decisions were taken, what failed and why. Event history is what makes production debugging tractable. Without it, there is no governed record of what any agent knew when it acted.
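These four layers can be made concrete. The sketch below encodes each store's lifetime and writer set as data; the names and fields are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Lifetime(Enum):
    RUN = auto()          # expires when the run ends
    DURABLE = auto()      # persists until explicitly updated
    VERSIONED = auto()    # persists; every change creates a new version
    APPEND_ONLY = auto()  # persists; records are never mutated or deleted

@dataclass
class Store:
    name: str
    lifetime: Lifetime
    writers: set[str]  # roles permitted to write to this layer

# The four typed stores, each with a distinct lifetime and writer set.
MEMORY_LAYERS = {
    "working_memory": Store("working_memory", Lifetime.RUN, {"worker"}),
    "structured_state": Store("structured_state", Lifetime.DURABLE, {"orchestrator"}),
    "shared_knowledge": Store("shared_knowledge", Lifetime.VERSIONED, {"governance"}),
    "event_history": Store("event_history", Lifetime.APPEND_ONLY, {"worker", "orchestrator"}),
}
```

The point of the table-as-data shape is that the lifetime and permission rules become something the runtime can check, rather than a convention agents are trusted to follow.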
Typed Memory Stores is not a new invention. It is the disciplined application of established architectural principles (separation of concerns, lifecycle management, write permissions) to the specific failure modes of multi-agent AI systems.
Why the shared blob fails in production — five failure modes
A shared blob collapses all four layers into one undifferentiated store. The consequences are predictable.
Context bloat. Every agent loads the full blob, because there is no mechanism to bound what each agent reads. As the system grows, every agent reads more than it needs. Token consumption increases. Agents make decisions on context that is not relevant to their task.
Stale facts. State from previous runs persists in the blob without expiry. A workflow stage that was "in progress" three runs ago appears as current fact. Agents act on outdated information, not because the data was wrong when written, but because the system has no lifecycle mechanism to retire it.
Conflicting updates. Two agents write to the same location with different assumptions. Neither knows the other has written. The blob holds both values, or the later write overwrites the earlier one, with no record of the conflict.
Silent doctrine drift. Shared rules (the policies, schemas, and rubrics that govern agent behaviour) are stored in the same blob as working context. Any agent can overwrite them. When doctrine drifts, agent behaviour changes silently, without versioning and without audit.
Hard-to-debug behaviour. There is no governed record of what each agent knew when it acted. Reproducing a failure requires reconstructing the context from scattered logs or running the pipeline again and hoping the failure recurs.
These failure modes do not degrade linearly. Each additional agent and each accumulated run multiplies the probability that two or more failure modes interact. A two-agent system with a shared blob may appear stable. A ten-agent system with the same blob will not.
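The conflicting-updates failure is easy to reproduce. In this minimal sketch (agent names are invented for illustration), two agents write the same key of a shared blob under different assumptions; the later write silently wins and no record of the conflict survives:

```python
# A shared blob: one dict, no permissions, no lifecycle, no history.
blob: dict = {"workflow_stage": "draft"}

def agent_a(memory: dict) -> None:
    # Agent A believes the draft passed review.
    memory["workflow_stage"] = "approved"

def agent_b(memory: dict) -> None:
    # Agent B, unaware of A's write, believes review is still pending.
    memory["workflow_stage"] = "in_review"

agent_a(blob)
agent_b(blob)

# Last write wins: A's update is gone, and nothing recorded the conflict.
print(blob["workflow_stage"])  # in_review
```

With an append-only event history, both writes would survive as events even when the state itself holds only the latest value; with a shared blob, the earlier write is unrecoverable.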
Shared blob vs. typed memory stores — architecture comparison
| Memory dimension | Shared blob | Typed memory stores | Consequence of getting this wrong |
|---|---|---|---|
| Working context | Mixed with persistent state — no expiry | Isolated per-run working memory — expires at run end | Stale context bleeds into every subsequent run |
| Structured state | Overwritable by any agent — no authority check | Typed JSON, writable only by authorised agents | Conflicting updates corrupt workflow state silently |
| System doctrine | Shared with working context — agent can overwrite | Versioned, read-only for workers | Rules drift silently — behaviour changes without audit |
| Event history | Absent or reconstructed from logs | Append-only — no agent can delete | Failures cannot be reproduced or debugged at scale |
The permissions model: who reads what, who writes where
The five failure modes above share a common cause: no permission layer. The blob model grants every agent equal read and write access to all memory. This is not a simplification that works early and fails later; it makes the failures above structurally inevitable.
The typed-store model enforces permissions by layer.
Workers read from structured state and shared knowledge. They write to working memory and to run artifacts. They do not write to shared knowledge. They do not update structured state directly.
Orchestrators update structured state based on worker outputs. They read event history. They do not modify doctrine.
Governance flows, not individual worker agents, update shared knowledge. Doctrine changes are versioned. An audit trail exists.
Event history is append-only. Nothing deletes it.
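A minimal sketch of this permission model, assuming invented role and layer names rather than any specific framework's API:

```python
# Write permissions by layer; reads are open to all roles in this sketch.
WRITERS = {
    "working_memory": {"worker"},
    "structured_state": {"orchestrator"},
    "shared_knowledge": {"governance"},
    "event_history": {"worker", "orchestrator"},  # append-only, see below
}

class GovernedMemory:
    def __init__(self) -> None:
        self.layers: dict[str, dict] = {name: {} for name in WRITERS}
        self.events: list[dict] = []

    def write(self, role: str, layer: str, key: str, value) -> None:
        if role not in WRITERS[layer]:
            raise PermissionError(f"{role} may not write {layer}")
        if layer == "event_history":
            raise PermissionError("event_history accepts appends only")
        self.layers[layer][key] = value

    def append_event(self, role: str, event: dict) -> None:
        if role not in WRITERS["event_history"]:
            raise PermissionError(f"{role} may not append events")
        self.events.append(event)  # nothing ever deletes from this list

mem = GovernedMemory()
mem.write("orchestrator", "structured_state", "workflow_stage", "in_review")
try:
    mem.write("worker", "shared_knowledge", "rubric", "looser rules")
except PermissionError as exc:
    print(exc)  # worker may not write shared_knowledge
```

The asymmetry is the point: a worker that cannot write doctrine cannot drift it, no matter how its prompt is worded.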
This permission model does not require a purpose-built memory management platform. In the Katelyn Skills OS, it is enforced through directory structure, file permissions, and worker contracts that specify what each agent is permitted to read and write.
What we learned building the Katelyn Skills OS — before and after
The Katelyn Skills OS is Graph Digital's live production multi-agent system. Before Stefan Finch redesigned the memory architecture in March 2026, the system operated three independent memory stores that could not communicate with each other.
Before. Claude auto-memory (session-scoped, prose blobs, bound to a specific machine), Katelyn workspace memory (stale, untyped markdown files updated inconsistently), and pipeline state (scattered JSON across individual run folders). Agents contradicted each other because they read from different stores. Sessions lost context because auto-memory was not portable. Decisions were not traceable because there was no event history. Across 300+ production pipeline runs between January and April 2026, cross-session coherence failures required manual state correction in approximately 1 in 8 sessions.
After. Four typed stores replaced the three independent systems. doctrine/ holds shared knowledge, versioned and read-only for workers. project/ holds durable strategic facts, updated only when decisions change. state/ holds current operational truth, governed by Katelyn and not written directly by workers. Run artifacts hold ephemeral working memory, discarded at run end unless promoted. Post-redesign, across 150+ subsequent runs, zero cross-session coherence failures required manual intervention. Debugging session time for state-related issues fell from multi-session investigation to single-session resolution, using the event history trail.
The redesign did not require a platform change. The four typed stores map to directory structures and typed JSON files. The permission model is enforced through worker contracts.
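One way to express such a worker contract in code. The store names follow the redesign described above, but the contract fields and the helper are illustrative assumptions, not the actual Katelyn Skills OS format:

```python
# Illustrative worker contract: field names are assumptions, not the
# actual Katelyn Skills OS schema.
contract = {
    "agent": "qa_reviewer",
    "reads": ["doctrine/", "state/"],        # shared knowledge + current truth
    "writes": ["runs/<run_id>/artifacts/"],  # ephemeral run artifacts only
    "forbidden": ["doctrine/", "project/", "state/"],
}

def may_write(contract: dict, path: str) -> bool:
    """A worker may write only under its declared write prefixes.

    Placeholders like <run_id> are truncated to their static prefix
    for this sketch; a real implementation would resolve them.
    """
    if any(path.startswith(p.split("<")[0]) for p in contract["forbidden"]):
        return False
    return any(path.startswith(p.split("<")[0]) for p in contract["writes"])

print(may_write(contract, "runs/2026-04-01/artifacts/report.json"))  # True
print(may_write(contract, "doctrine/rubric.json"))                   # False
```

Checked at dispatch time, a contract like this makes "worker overwrote doctrine" a rejected write rather than a silent incident discovered three runs later.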
[First-party — Graph Digital internal system, 2025–2026]
The three things that must not be conflated: facts, judgments, recommendations
Most systems that degrade under the shared blob model do so because they store three distinct categories as if they are the same.
Facts are durable, machine-verifiable truths: a workflow stage, an asset approval status, a client decision recorded at a specific date. They belong in structured state, with a lifecycle tied to the state they represent.
Judgments are agent reasoning outputs: a quality score, a strategic assessment, a diagnosis. Situational and time-bounded, they should carry context: which agent produced them, with what inputs, at what pipeline stage. A judgment stored as a fact will be acted on as if it were objectively true.
Recommendations are proposed next actions, ephemeral until accepted. A recommendation that is not acted on must not persist as if it were. Storing unaccepted recommendations as facts produces stale state that the next run inherits.
The diagnostic: if the current memory model cannot distinguish a fact from a judgment from a recommendation in schema, in lifecycle, and in write permissions, the system will degrade.
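Under these definitions, the three categories can be kept apart at the type level. A hedged sketch, with field names that are assumptions rather than a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Fact:
    """Durable, machine-verifiable truth; lives in structured state."""
    key: str
    value: str
    recorded_at: datetime

@dataclass
class Judgment:
    """Agent reasoning output: time-bounded and attributed."""
    claim: str
    produced_by: str     # which agent produced it
    inputs_digest: str   # what inputs it saw
    pipeline_stage: str  # at which stage of the run
    produced_at: datetime

@dataclass
class Recommendation:
    """Proposed next action; ephemeral until accepted."""
    action: str
    proposed_by: str
    accepted: bool = False  # if never accepted, discard at run end

rec = Recommendation(action="rewrite section 3", proposed_by="editor_agent")
# rec.accepted is False: this must not persist into the next run as a fact.
fact = Fact("workflow_stage", "in_review",
            datetime.now(timezone.utc))
```

Three dataclasses instead of one blob entry means the lifecycle question ("does this survive the run?") is answered by the type, not by whichever agent reads it next.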
How to apply the four-layer model — the handoff artifact pattern
The four-layer model is applied through the handoff artifact pattern: each agent receives a bounded context pack at run start and produces a structured output artifact at run end.
The bounded context pack contains only what the agent needs: relevant structured state, applicable doctrine from shared knowledge, and any event history relevant to its task. It does not contain the full blob.
The structured output artifact is typed JSON, not prose. It specifies what changed in structured state (if the agent has write authority), what the agent decided, and what events to append to the history log. Working memory is discarded unless promoted.
This pattern enforces the permission model in practice. A worker receiving a bounded context pack cannot accidentally overwrite doctrine it was never given. An orchestrator reading typed output artifacts has a machine-readable record of what each agent decided.
The pattern is compatible with any multi-agent framework. It does not require a specific platform, only that agents receive typed inputs and produce typed outputs, and that the system distinguishes the lifetime of each artifact.
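A minimal sketch of the handoff artifact pattern, with invented function and key names: a bounded context pack in, a typed output artifact out.

```python
def build_context_pack(agent: str, needs: list[str],
                       structured_state: dict, doctrine: dict) -> dict:
    """Bounded context pack: only the state keys this agent declared it needs."""
    return {
        "agent": agent,
        "state": {k: structured_state[k] for k in needs if k in structured_state},
        "doctrine_version": doctrine["version"],  # a reference, not a mutable copy
    }

def run_worker(pack: dict) -> dict:
    """A worker returns a typed output artifact, not prose. state_changes is
    empty here because this worker has no write authority over structured state."""
    return {
        "agent": pack["agent"],
        "state_changes": {},
        "decision": "approve",
        "events": [{"agent": pack["agent"], "action": "reviewed",
                    "inputs": sorted(pack["state"])}],
    }

state = {"workflow_stage": "in_review", "budget_remaining": 1200}
pack = build_context_pack("qa_reviewer", ["workflow_stage"], state,
                          {"version": "3.2"})
artifact = run_worker(pack)
# "budget_remaining" was never loaded: the pack is bounded, not the full blob.
```

The orchestrator applies `state_changes` and appends `events` to the history log; everything else in the worker's run is working memory and is discarded.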
What the four-layer model means for production-grade AI agents
- AI agent memory architecture is a first-class design decision, not a detail to address after core logic works. Systems built on a shared blob accumulate failure modes before they reach scale.
- Typed Memory Stores partitions memory into four layers (working memory, structured state, shared knowledge, and event history), each with distinct lifetimes and write permissions. This is the responsible baseline for any production multi-agent system where agents must coordinate across runs.
- A shared blob produces five compounding failure modes: context bloat, stale facts, conflicting updates, silent doctrine drift, and hard-to-debug behaviour. These compound combinatorially as agent count and run count increase, not linearly.
- The permission model (workers read but do not write doctrine, orchestrators update state, event history is append-only) is what prevents the five failure modes. A blob has no permission layer by design.
- The Katelyn Skills OS (Graph Digital, 2025–2026) moved from three independent memory systems to four typed stores in March 2026. Cross-session coherence failures dropped from approximately 1 in 8 sessions across 300+ runs to zero across 150+ subsequent runs.
- Facts, judgments, and recommendations require different schemas, different lifetimes, and different write permissions. A system that conflates them will accumulate stale state and produce undebuggable behaviour as it scales.
- For simpler architectures — single-agent systems, short-lived task pipelines, or early-stage prototypes — a full four-store implementation may be unnecessary overhead. The typed-store model is designed for production multi-agent systems where agents share state across runs.
Frequently asked questions
What are Typed Memory Stores in a multi-agent AI system?
Typed Memory Stores is a governed memory architecture for multi-agent AI systems that partitions agent memory into four bounded layers: working memory (ephemeral, per-run), structured state (durable machine-readable truth), shared knowledge (versioned doctrine, read-only for workers), and event history (append-only decision record). Each layer has a defined lifetime and write permission set. The architecture prevents the compounding failure modes (context bloat, stale facts, conflicting updates, silent doctrine drift, and hard-to-debug behaviour) that emerge when all four layers are collapsed into a single shared blob.
Why does a shared blob fail in production multi-agent systems?
A shared blob collapses four distinct memory functions into one undifferentiated store with no permission layer and no lifecycle enforcement. Because every agent reads and writes to the same location, conflicting updates overwrite each other, stale facts persist without expiry, and shared rules drift silently. These failures do not degrade linearly: each additional agent and each accumulated run multiplies the probability that two or more failure modes interact simultaneously. A system that appears stable with two agents will not remain stable with ten.
What are the four layers of memory in a production AI agent system?
The four layers are: (1) Working memory, ephemeral, exists for a single run, expires when the run ends; (2) Structured state, durable, machine-readable, typed JSON, holds current operational truth such as workflow stage and approval status; (3) Shared knowledge, system doctrine including schemas, policies, and rubrics, versioned and read-only for most agents; (4) Event history, append-only record of what each agent did, what inputs it used, and what decisions it reached. Each layer has a distinct write permission: workers write to working memory and run artifacts; orchestrators update structured state; governance flows update shared knowledge; no agent deletes event history.
How did Graph Digital implement the four-layer model in the Katelyn Skills OS?
Stefan Finch redesigned the Katelyn Skills OS memory architecture in March 2026. Before the redesign, three independent memory systems operated without communication, producing cross-session coherence failures in approximately 1 in 8 sessions across 300+ production pipeline runs. After the redesign, four typed stores (doctrine/, project/, state/, and run artifacts) replaced the three independent systems. Across 150+ subsequent runs, zero cross-session coherence failures required manual intervention. The implementation used directory structures, typed JSON files, and worker contracts. No platform change was required.
What is the permissions model for AI agent memory?
The permissions model assigns read and write authority by memory layer. Workers read from structured state and shared knowledge; they write only to working memory and run artifacts. Orchestrators update structured state based on worker outputs. Governance flows, not individual worker agents, update shared knowledge and versioned doctrine. Event history is append-only: no agent or orchestrator can delete it. This asymmetry prevents silent doctrine drift (workers cannot overwrite rules they read) and enables production debugging (event history provides a traceable record of every agent decision).
How do you separate facts, judgments, and recommendations in agent memory?
Facts are durable, machine-verifiable truths that belong in structured state with a lifecycle tied to the state they represent. Judgments are agent reasoning outputs, situational, time-bounded, carrying metadata about which agent produced them and with what inputs. Recommendations are proposed next actions, ephemeral until accepted, and must not persist as facts. The diagnostic question: does the current memory schema distinguish these three types with different schemas, different lifetimes, and different write permissions? If not, the system will accumulate stale state as agent count and run count increase.
Ready to build a production-grade agent memory architecture?
A multi-agent AI system without a governed memory model will not scale reliably to production. Graph Digital's AI Readiness Assessment includes a memory architecture review covering the four typed stores, the permissions model, and the handoff artifact pattern, so the system is debuggable and coherent before scale.
