SaaS tools were designed for teams sharing dashboards, not for a single operator who needs one decision made. The platform licences paid for features a scoped AI skill could replace entirely, so Graph Digital built the finance reconciliation agent in under a day. It has run continuously for six months. Zero failures. Five-figure annual cost eliminated.
By Stefan Finch, Graph Digital | Last reviewed: April 2026
I am Stefan Finch, Founder of Graph Digital. Graph Digital builds purpose-built AI agents and skills for UK mid-market operations teams: this is an account of one we built and operated ourselves. This is not a theoretical argument about whether AI can replace SaaS tools. It is an account of one that did. The specific decision, the specific build, the specific architecture, and what six months of continuous production operation looks like.
Most writing about AI replacing SaaS is written by people who built a demonstration. This is about what happens after the demonstration.
For context on how AI agents work in general, see What is an AI Agent? The business leader's guide to the new execution layer.
The process: what the finance reconciliation skill does on every run
Finance reconciliation is exactly the kind of process a purpose-built AI skill is designed for.
Three source systems that do not communicate with each other (ERP, bank feed, expense management) previously required someone to export data, cross-reference it manually, identify discrepancies, and build a report. The manual version was time-consuming, error-prone, and produced a result that was already out of date before the person finished.
The Graph Digital finance reconciliation agent performs four operations on each run:
- Reads three source systems (ERP, bank feed, and expense management) via purpose-built CLI tool interfaces that return clean, structured data
- Cross-references the data against defined rules: matching transactions, identifying discrepancies, categorising exceptions by type and value
- Applies confidence scoring to each exception: items above the threshold are auto-categorised, items below are flagged for human review
- Produces a structured exception report and escalates flagged items via Slack
No human is in the execution loop. The agent runs, produces its output, and delivers it. Humans review only the exceptions that fall below the confidence threshold: the genuine edge cases that require judgment.
Run time: minutes. Manual equivalent: hours. The reduction is structural, not incremental: the skill eliminates the manual export-and-reconcile cycle entirely rather than merely speeding it up.
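The four operations above can be sketched in code. This is a minimal, illustrative Python sketch of the run loop, not the actual Graph Digital implementation: the field names, matching rule, confidence values, and threshold are all assumptions made for the example.

```python
# Illustrative sketch of one reconciliation run. All names, rules, and
# thresholds are hypothetical; the production skill's logic is not public.
CONFIDENCE_THRESHOLD = 0.9  # assumed value for illustration

def reconcile(erp_rows, bank_rows, expense_rows):
    """Match ERP transactions against bank and expense data,
    score each exception, and split auto-categorised from flagged."""
    bank_index = {r["ref"]: r for r in bank_rows}
    expense_index = {r["ref"]: r for r in expense_rows}

    auto_categorised, flagged = [], []
    for row in erp_rows:
        bank = bank_index.get(row["ref"])
        expense = expense_index.get(row["ref"])
        if bank and bank["amount"] == row["amount"]:
            continue  # clean match: nothing to report
        exception = {
            "ref": row["ref"],
            "type": "missing_in_bank" if bank is None else "amount_mismatch",
            "value": row["amount"],
            # Stand-in for whatever confidence model the real skill applies:
            "confidence": 0.95 if expense else 0.4,
        }
        if exception["confidence"] >= CONFIDENCE_THRESHOLD:
            auto_categorised.append(exception)
        else:
            flagged.append(exception)  # escalated for human review
    return {"auto": auto_categorised, "flagged": flagged}
```

The structure is the point, not the toy matching rule: every run partitions exceptions into auto-categorised and flagged, and only the flagged items ever reach a human.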
The build: what was used and how long it took
The reconciliation skill was built using Claude Code Skills, a framework for building purpose-built AI capabilities as discrete, composable skills rather than monolithic agent systems.
The build took under a day of senior engineering time. Not a sprint. Not a sprint followed by a debugging phase followed by a stabilisation phase. Under a day.
The reason the build was fast is the same reason the system holds up in production: scope clarity before build began. Before a line of code was written, the process was defined: inputs, rules, outputs, and the exceptions that would require human review. A scoped process with clear inputs and outputs is straightforwardly buildable. A vague process with unclear exception handling is not, regardless of how capable the underlying AI is.
What was not used is as important as what was. The skill does not use MCP (Model Context Protocol) connections. It does not use a visual workflow builder. It does not depend on a third-party integration platform. Every tool interface is a purpose-built CLI wrapper that does exactly one thing and returns clean, structured data.
That choice — CLI over MCP — is the single most consequential architecture decision in the production performance of this skill.
The SaaS cost case: what was replaced and what it cost
The replaced stack:
- PowerBI reporting licence: monthly SaaS cost for dashboards and reporting, now fully replaced by the reconciliation agent's exception report output
- Finance exception management SaaS: the tool managing the exception workflow and flagging process that the agent now handles directly
- Data bridging and export tooling: the connectors and scheduled jobs that moved data between systems before the agent could read them directly
Combined annual cost: five figures. Replacement cost: under one day of senior engineering time at Graph Digital's own internal rate.
The saving is not primarily the licence cost, though that is material. The saving is the permanent elimination of a recurring obligation to vendors whose pricing, terms, and product roadmaps Graph Digital did not control. Every time one of those vendors raised prices, changed licensing terms, or deprecated a feature, Graph Digital needed to respond. That obligation is gone.
A SaaS tool is rented capability. A purpose-built AI skill is owned capability. This is what Gartner and Deloitte forecast as point-product SaaS displacement: the replacement of general-purpose tools by scoped, purpose-built AI agents at the specific-process level. The distinction matters beyond cost.
Research from Zylo (2025) analysing $40 billion across 40 million licences found that 46 to 51 percent of SaaS licences are unused, costing enterprises $18 to $19.8 million annually in wasted spend. Gartner forecasts that by 2030, 35 percent of point-product SaaS tools will be replaced by AI agents or absorbed within larger agent ecosystems (via Deloitte TMT Predictions, 2026). The finance reconciliation agent is one such point-product replacement: not a prediction, but a production result.
Why CLI tool interfaces and not MCP: the production architecture decision
This is the part most writing about AI agents gets wrong.
The current market direction for AI agent tool connectivity is MCP (Model Context Protocol). MCP is a standardised interface that allows agents to connect to data sources and tools in a consistent way. The case for MCP is real: standardisation, interoperability, and a growing ecosystem of compatible connectors.
The counter-finding from operating this skill in continuous production: MCP connections degrade under real workload in ways that do not appear in demonstrations.
The protocol adds overhead to every tool call. That overhead accumulates in the context window. Over a complex multi-step run (and a reconciliation that crosses three source systems is exactly that), the context window fills with protocol scaffolding: metadata, intermediate results, connection state. Eventually the agent's reasoning quality degrades. Not the protocol's performance. The reasoning itself.
The result in production: slower decisions, higher cost per run, and reduced confidence in categorisation that leads to more items being flagged for human review than necessary. A system that was supposed to reduce manual work introduces new manual work through false positives.
CLI tool wrappers do not accumulate protocol overhead. A CLI wrapper takes a specific command and returns clean structured data. No protocol scaffolding. No context overhead. The agent calls the tool, gets the data, and reasons with it.
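The call-and-return pattern is simple enough to show directly. This is a hedged sketch of the wrapper pattern, assuming each tool is a single-purpose command that prints JSON to stdout; the command name in the usage comment is hypothetical, not one of Graph Digital's actual tools.

```python
# Sketch of the CLI-wrapper pattern: the agent shells out to one
# single-purpose command and gets structured JSON back. No protocol
# state, no connection scaffolding, nothing carried between calls.
import json
import subprocess

def call_tool(command, *args):
    """Run one purpose-built CLI tool and return its parsed JSON output."""
    result = subprocess.run(
        [command, *args],
        capture_output=True,  # collect stdout for parsing
        text=True,            # decode bytes to str
        check=True,           # raise immediately if the tool fails
    )
    return json.loads(result.stdout)

# Hypothetical usage; "bankfeed-export" is an illustrative tool name:
# transactions = call_tool("bankfeed-export", "--since", "2026-04-01")
```

Debugging this is exactly as described in the table below: one command in, one JSON document out, and a non-zero exit code surfaces as an exception at the call site.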
CLI tool wrappers outperform MCP connections in production AI agents on every metric that matters: speed, stability, cost per run, and ease of debugging when something goes wrong.
| Metric | MCP connections | CLI tool wrappers |
|---|---|---|
| Speed | Slower — protocol overhead on every call | Faster — direct command, clean return |
| Stability | Degrades under sustained load | Consistent across six months of production |
| Cost per run | Higher — context window fills with protocol metadata | Lower — no overhead accumulation |
| Debuggability | Opaque — protocol state adds noise | Clear — call, return, result; nothing hidden |
This is not a theoretical preference. It is the result of six months of continuous production operation. It is also counter-intuitive enough that most build teams discover it only after deploying MCP-based connections and beginning to see the degradation. Most attribute the degradation to the model's reasoning quality rather than to the architecture choice. By that point, they have built against the wrong foundation.
Six months of production: what zero failures actually looks like
Industry data from 2025 puts the production failure rate for AI agent deployments at 95 percent, with state management and memory architecture consistently cited as the leading causes (Parallel Labs, citing Deloitte analysis, 2025).
The finance reconciliation agent has been running in production for six months. Zero failures.
That result is not accidental. It is the product of three design decisions made before the first line of code was written:
1. An explicit memory layer
The agent does not rely on the conversation window to carry context between runs. Every run loads what it needs from an explicit state store: current exceptions, prior decisions, agreed thresholds, processing history. The state store is updated at the end of every run. The next run starts with full knowledge of what came before it.
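A state store of this kind can be as plain as a JSON file on disk. The sketch below assumes exactly that; the file name and field names are illustrative, not the production schema.

```python
# Minimal sketch of an explicit state store: every run loads persisted
# state and writes it back at the end, so no run depends on the
# conversation window for memory. All names here are illustrative.
import copy
import json
from pathlib import Path

STATE_PATH = Path("reconciliation_state.json")

DEFAULT_STATE = {
    "open_exceptions": [],
    "prior_decisions": {},
    "thresholds": {"confidence": 0.9},
    "run_history": [],
}

def load_state():
    """Every run starts from the persisted state, never from chat context."""
    if STATE_PATH.exists():
        return json.loads(STATE_PATH.read_text())
    return copy.deepcopy(DEFAULT_STATE)  # deep copy: never mutate the default

def save_state(state, run_summary):
    """Called at the end of every run so the next run inherits its history."""
    state["run_history"].append(run_summary)
    STATE_PATH.write_text(json.dumps(state, indent=2))
```

The design choice worth noting: state is written once per run, at the end, so a failed run leaves the previous good state untouched.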
2. Lightweight, purpose-built tool interfaces
Not MCP. CLI wrappers that do exactly one thing, return clean data, and add no protocol overhead to the context. Every interface was built specifically for this agent's three data sources.
3. Logging and observability on every run
Every run records what decision was made, what context was available, what tools were called, and what the output was. When something unexpected occurs, the run log shows exactly what the agent saw and what it did. Nothing is opaque.
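The per-run record described above maps naturally to one structured log line per run. This is an illustrative sketch assuming an append-only JSONL sink; the record fields mirror the list in the paragraph but are otherwise assumptions.

```python
# Sketch of per-run structured logging: one JSON record per run capturing
# what the agent saw, which tools it called, and what it decided.
# Field names are illustrative, not the production log schema.
import json
import time

def log_run(decisions, context_summary, tool_calls, output, sink):
    """Append one complete, self-describing record for this run."""
    record = {
        "timestamp": time.time(),
        "context": context_summary,  # what the agent could see
        "tool_calls": tool_calls,    # every CLI invocation and its result
        "decisions": decisions,      # categorisations with confidence scores
        "output": output,            # the exception report produced
    }
    sink.write(json.dumps(record) + "\n")  # append-only JSONL
    return record
```

Because each line is a complete record, answering "what did the agent see on the run that flagged this item?" is a grep, not a forensic exercise.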
These are not sophisticated engineering choices. They are the minimum viable architecture for a production AI agent. The reason 95 percent of agent deployments fail is that most builds skip one or more of them (usually memory, often observability) because they are not exciting to build and do not make the demo more impressive.
The demo always works. Production reveals what the demo concealed.
What this means for your SaaS stack: the scoping question
The finance reconciliation agent is not a proof of concept. It is proof that a scoped AI skill can replace a five-figure SaaS stack, provided the scope is clear.
That condition is the constraint. Not AI capability. Scope clarity.
Before asking "can AI replace this tool?" ask three questions:
Can you name the specific process? Not "finance operations" but the reconciliation run: ERP cross-referenced against bank feed and expense management, rules applied, exceptions flagged. Named. Scoped. Defined at the level of inputs, outputs, and the exceptions that need human review.
Are the data sources accessible and consistent? The reconciliation agent reads three systems. Each returns clean, structured data when queried. A process that depends on inconsistent or unstandardised data is not agent-ready. Not because of AI, but because of the data.
What does success look like, specifically? For the reconciliation agent: every run completes, all discrepancies are identified, items below confidence threshold are flagged, no false positives go unreviewed. These are measurable. They are checked on every run.
If these three questions have clear answers, you have the brief for a purpose-built AI skill. If they do not, the right first step is not an agent build. It is the process definition and data readiness work that must come before.
SaaS tools are designed for the general case. They are built for teams of ten, for multiple users with different needs, for dashboards no single operator will ever use in full. The monthly licence is the cost of renting general-purpose capability to solve a specific-process problem.
A scoped AI skill is built for one process. One set of inputs. One defined output. One set of rules. One exception path.
That is why the skill costs less and performs better. Not because AI is superior to software. Because purpose-built beats general-purpose every time the scope is clear.
Key takeaways
- Graph Digital's finance reconciliation agent replaced PowerBI reporting and SaaS finance tooling at five-figure annual cost, built in under a day using Claude Code Skills, and has run in continuous production for six months with zero failures.
- CLI tool wrappers outperform MCP connections in production AI agents on speed, stability, cost per run, and debuggability. Protocol overhead from MCP accumulates in the context window and degrades reasoning quality over complex runs. This does not appear in demos.
- The three design decisions that determine production survival: explicit memory layer, lightweight purpose-built tool interfaces, and logging on every run. Most failed deployments skip at least one.
- SaaS tools are rented general-purpose capability. A scoped AI skill is owned purpose-built capability. The skill costs less and performs better not because AI is superior, but because the scope was clear before the build began.
- The scoping pre-condition: if you can name the specific process, confirm data source accessibility and consistency, and define measurable success, you have the brief for a skill that could displace the SaaS tool you are currently paying for.
Frequently asked questions
What did the finance reconciliation agent actually replace?
The Graph Digital finance reconciliation agent replaced three components: a PowerBI reporting licence, a finance exception management SaaS tool, and data bridging and export tooling that connected source systems. Combined annual cost: five figures. The agent was built using Claude Code Skills in under a day of senior engineering time.
Why use CLI tool interfaces instead of MCP?
MCP (Model Context Protocol) adds protocol overhead to every tool call. That overhead accumulates in the context window during multi-step runs, degrading reasoning quality over time. Not the protocol performance, but the agent's ability to reason clearly. CLI wrappers return clean structured data with no overhead. Over six months of continuous production, the CLI approach produced no failures; MCP was evaluated and rejected because the context degradation pattern was clear in testing.
What made this build possible in under a day?
Scope clarity before build began. The process was defined at the level of inputs (ERP, bank feed, expense management), rules (cross-referencing logic and thresholds), outputs (structured exception report), and exception handling (what gets flagged for human review, what passes through). A clearly scoped process with consistent data sources is buildable fast. A vague scope is not, regardless of AI capability.
Is this replicable for other SaaS tools?
Only for processes that meet the scoping conditions: a named, specific process with defined inputs and outputs, consistent and accessible data sources, and a measurable definition of success. Finance reconciliation meets all three. Many common SaaS use cases do too: approval workflows, exception reporting, data enrichment, compliance checks. Broad platform capabilities that serve multiple different users with different workflows do not.
What does production stability require?
Three things: an explicit memory layer that persists state between runs (not the conversation window), purpose-built tool interfaces that return clean structured data without protocol overhead, and logging on every run. These are the minimum. Most AI agent failures trace back to one or more of these being absent from the original architecture.
Have a SaaS cost you suspect a scoped AI skill could displace?
The right first step is not an agent build. It is a structured assessment of whether the conditions are in place: process clarity, data readiness, and a measurable definition of success. The AI Readiness Assessment is a 1-2 week diagnostic that maps your identified process against real production requirements. From £4,000. Find out how the AI Readiness Assessment works
