SaaS tools were designed for teams sharing dashboards, not for a single operator who needs one decision made. The platform licences paid for features a scoped AI skill could replace entirely, so Graph Digital built the finance reconciliation agent in under a day. It has run continuously for six months. Zero failures. Five-figure annual cost eliminated.
By Stefan Finch, Graph Digital | Last reviewed: April 2026
I am Stefan Finch, Founder of Graph Digital. Graph Digital builds purpose-built AI agents and skills for UK mid-market operations teams: this is an account of one we built and operated ourselves. This is not a theoretical argument about whether AI can replace SaaS tools. It is an account of one that did. The specific decision, the specific build, the specific architecture, and what six months of continuous production operation looks like.
Most writing about AI replacing SaaS is written by people who built a demonstration. This is about what happens after the demonstration.
For context on how AI agents work in general, see What is an AI Agent? The business leader's guide to the new execution layer.
The process: what the finance reconciliation skill does on every run
Finance reconciliation is exactly the kind of process a purpose-built AI skill is designed for.
Three source systems that do not communicate with each other (ERP, bank feed, expense management) previously required someone to export data, cross-reference it manually, identify discrepancies, and build a report. The manual version was time-consuming, error-prone, and produced a result that was already out of date before the person finished.
The Graph Digital finance reconciliation agent performs four operations on each run:
- Reads three source systems (ERP, bank feed, and expense management) via purpose-built CLI tool interfaces that return clean, structured data
- Cross-references the data against defined rules: matching transactions, identifying discrepancies, categorising exceptions by type and value
- Applies confidence scoring to each exception: items above the threshold are auto-categorised, items below are flagged for human review
- Produces a structured exception report and escalates flagged items via Slack
No human is in the execution loop. The agent runs, produces its output, and delivers it. Humans review only the exceptions that fall below the confidence threshold: the genuine edge cases that require judgment.
Run time: minutes. Manual equivalent: hours. The reduction is structural, not incremental: the skill eliminates the manual export-and-reconcile cycle entirely rather than merely speeding it up.
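The four operations above can be sketched in code. This is a minimal, illustrative Python sketch of the run loop, not the actual Graph Digital implementation: the field names, matching rule, confidence values, and threshold are all assumptions made for the example.

```python
# Illustrative sketch of one reconciliation run. All names, rules, and
# thresholds are hypothetical; the production skill's logic is not public.
CONFIDENCE_THRESHOLD = 0.9  # assumed value for illustration

def reconcile(erp_rows, bank_rows, expense_rows):
    """Match ERP transactions against bank and expense data,
    score each exception, and split auto-categorised from flagged."""
    bank_index = {r["ref"]: r for r in bank_rows}
    expense_index = {r["ref"]: r for r in expense_rows}

    auto_categorised, flagged = [], []
    for row in erp_rows:
        bank = bank_index.get(row["ref"])
        expense = expense_index.get(row["ref"])
        if bank and bank["amount"] == row["amount"]:
            continue  # clean match: nothing to report
        exception = {
            "ref": row["ref"],
            "type": "missing_in_bank" if bank is None else "amount_mismatch",
            "value": row["amount"],
            # Stand-in for whatever confidence model the real skill applies:
            "confidence": 0.95 if expense else 0.4,
        }
        if exception["confidence"] >= CONFIDENCE_THRESHOLD:
            auto_categorised.append(exception)
        else:
            flagged.append(exception)  # escalated for human review
    return {"auto": auto_categorised, "flagged": flagged}
```

The structure is the point, not the toy matching rule: every run partitions exceptions into auto-categorised and flagged, and only the flagged items ever reach a human.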
The build: what was used and how long it took
The reconciliation skill was built using Claude Code Skills, a framework for building purpose-built AI capabilities as discrete, composable skills rather than monolithic agent systems.
The build took under a day of senior engineering time. Not a sprint. Not a sprint followed by a debugging phase followed by a stabilisation phase. Under a day.
The reason the build was fast is the same reason the system holds up in production: scope clarity before build began. Before a line of code was written, the process was defined: inputs, rules, outputs, and the exceptions that would require human review. A scoped process with clear inputs and outputs is straightforwardly buildable. A vague process with unclear exception handling is not, regardless of how capable the underlying AI is.
What was not used is as important as what was. The skill does not use MCP (Model Context Protocol) connections. It does not use a visual workflow builder. It does not depend on a third-party integration platform. Every tool interface is a purpose-built CLI wrapper that does exactly one thing and returns clean, structured data.
That choice — CLI over MCP — is the single most consequential architecture decision in the production performance of this skill.
The SaaS cost case: what was replaced and what it cost
The replaced stack:
- PowerBI reporting licence: monthly SaaS cost for dashboards and reporting, now fully replaced by the reconciliation agent's exception report output
- Finance exception management SaaS: the tool managing the exception workflow and flagging process that the agent now handles directly
- Data bridging and export tooling: the connectors and scheduled jobs that moved data between systems before the agent could read them directly
Combined annual cost: five figures. Replacement cost: under one day of senior engineering time at Graph Digital's own internal rate.
The saving is not primarily the licence cost, though that is material. The saving is the permanent elimination of a recurring obligation to vendors whose pricing, terms, and product roadmaps Graph Digital did not control. Every time one of those vendors raised prices, changed licensing terms, or deprecated a feature, Graph Digital needed to respond. That obligation is gone.
A SaaS tool is rented capability. A purpose-built AI skill is owned capability. This is what Gartner and Deloitte forecast as point-product SaaS displacement: the replacement of general-purpose tools by scoped, purpose-built AI agents at the specific-process level. The distinction matters beyond cost.
Research from Zylo (2025) analysing $40 billion across 40 million licences found that 46 to 51 percent of SaaS licences are unused, costing enterprises $18 to $19.8 million annually in wasted spend. Gartner forecasts that by 2030, 35 percent of point-product SaaS tools will be replaced by AI agents or absorbed within larger agent ecosystems (via Deloitte TMT Predictions, 2026). The finance reconciliation agent is one such point-product replacement: not a prediction, but a production result.
Why CLI tool interfaces and not MCP: the production architecture decision
This is the part most writing about AI agents gets wrong.
The current market direction for AI agent tool connectivity is MCP (Model Context Protocol). MCP is a standardised interface that allows agents to connect to data sources and tools in a consistent way. The case for MCP is real: standardisation, interoperability, and a growing ecosystem of compatible connectors.
The counter-finding from operating this skill in continuous production: MCP connections degrade under real workload in ways that do not appear in demonstrations.
The protocol adds overhead to every tool call. That overhead accumulates in the context window. Over a complex multi-step run (and a reconciliation that crosses three source systems is exactly that), the context window fills with protocol scaffolding: metadata, intermediate results, connection state. Eventually the agent's reasoning quality degrades. Not the protocol's performance. The reasoning itself.
The result in production: slower decisions, higher cost per run, and reduced confidence in categorisation that leads to more items being flagged for human review than necessary. A system that was supposed to reduce manual work introduces new manual work through false positives.
CLI tool wrappers do not accumulate protocol overhead. A CLI wrapper takes a specific command and returns clean structured data. No protocol scaffolding. No context overhead. The agent calls the tool, gets the data, and reasons with it.
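The call-and-return pattern is simple enough to show directly. This is a hedged sketch of the wrapper pattern, assuming each tool is a single-purpose command that prints JSON to stdout; the command name in the usage comment is hypothetical, not one of Graph Digital's actual tools.

```python
# Sketch of the CLI-wrapper pattern: the agent shells out to one
# single-purpose command and gets structured JSON back. No protocol
# state, no connection scaffolding, nothing carried between calls.
import json
import subprocess

def call_tool(command, *args):
    """Run one purpose-built CLI tool and return its parsed JSON output."""
    result = subprocess.run(
        [command, *args],
        capture_output=True,  # collect stdout for parsing
        text=True,            # decode bytes to str
        check=True,           # raise immediately if the tool fails
    )
    return json.loads(result.stdout)

# Hypothetical usage; "bankfeed-export" is an illustrative tool name:
# transactions = call_tool("bankfeed-export", "--since", "2026-04-01")
```

Debugging this is exactly as described in the table below: one command in, one JSON document out, and a non-zero exit code surfaces as an exception at the call site.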
CLI tool wrappers outperform MCP connections in production AI agents on every metric that matters: speed, stability, cost per run, and ease of debugging when something goes wrong.
| Metric | MCP connections | CLI tool wrappers |
|---|---|---|
| Speed | Slower — protocol overhead on every call | Faster — direct command, clean return |
| Stability | Degrades under sustained load | Consistent across six months of production |
| Cost per run | Higher — context window fills with protocol metadata | Lower — no overhead accumulation |
| Debuggability | Opaque — protocol state adds noise | Clear — call, return, result; nothing hidden |
This is not a theoretical preference. It is the result of six months of continuous production operation. It is also counter-intuitive enough that most build teams discover it only after deploying MCP-based connections and beginning to see the degradation. Most attribute the degradation to the model's reasoning quality rather than to the architecture choice. By that point, they have built against the wrong foundation.
Six months of production: what zero failures actually looks like
Industry data from 2025 puts the production failure rate for AI agent deployments at 95 percent, with state management and memory architecture consistently cited as the leading causes (Parallel Labs, citing Deloitte analysis, 2025).
The finance reconciliation agent has been running in production for six months. Zero failures.
That result is not accidental. It is the product of three design decisions made before the first line of code was written:
1. An explicit memory layer
The agent does not rely on the conversation window to carry context between runs. Every run loads what it needs from an explicit state store: current exceptions, prior decisions, agreed thresholds, processing history. The state store is updated at the end of every run. The next run starts with full knowledge of what came before it.
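A state store of this kind can be as plain as a JSON file on disk. The sketch below assumes exactly that; the file name and field names are illustrative, not the production schema.

```python
# Minimal sketch of an explicit state store: every run loads persisted
# state and writes it back at the end, so no run depends on the
# conversation window for memory. All names here are illustrative.
import copy
import json
from pathlib import Path

STATE_PATH = Path("reconciliation_state.json")

DEFAULT_STATE = {
    "open_exceptions": [],
    "prior_decisions": {},
    "thresholds": {"confidence": 0.9},
    "run_history": [],
}

def load_state():
    """Every run starts from the persisted state, never from chat context."""
    if STATE_PATH.exists():
        return json.loads(STATE_PATH.read_text())
    return copy.deepcopy(DEFAULT_STATE)  # deep copy: never mutate the default

def save_state(state, run_summary):
    """Called at the end of every run so the next run inherits its history."""
    state["run_history"].append(run_summary)
    STATE_PATH.write_text(json.dumps(state, indent=2))
```

The design choice worth noting: state is written once per run, at the end, so a failed run leaves the previous good state untouched.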
2. Lightweight, purpose-built tool interfaces
Not MCP. CLI wrappers that do exactly one thing, return clean data, and add no protocol overhead to the context. Every interface was built specifically for this agent's three data sources.
3. Logging and observability on every run
Every run records what decision was made, what context was available, what tools were called, and what the output was. When something unexpected occurs, the run log shows exactly what the agent saw and what it did. Nothing is opaque.
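The per-run record described above maps naturally to one structured log line per run. This is an illustrative sketch assuming an append-only JSONL sink; the record fields mirror the list in the paragraph but are otherwise assumptions.

```python
# Sketch of per-run structured logging: one JSON record per run capturing
# what the agent saw, which tools it called, and what it decided.
# Field names are illustrative, not the production log schema.
import json
import time

def log_run(decisions, context_summary, tool_calls, output, sink):
    """Append one complete, self-describing record for this run."""
    record = {
        "timestamp": time.time(),
        "context": context_summary,  # what the agent could see
        "tool_calls": tool_calls,    # every CLI invocation and its result
        "decisions": decisions,      # categorisations with confidence scores
        "output": output,            # the exception report produced
    }
    sink.write(json.dumps(record) + "\n")  # append-only JSONL
    return record
```

Because each line is a complete record, answering "what did the agent see on the run that flagged this item?" is a grep, not a forensic exercise.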
These are not sophisticated engineering choices. They are the minimum viable architecture for a production AI agent. The reason 95 percent of agent deployments fail is that most builds skip one or more of them (usually memory, often observability) because they are not exciting to build and do not make the demo more impressive.
The demo always works. Production reveals what the demo concealed.
What this means for your SaaS stack: the scoping question
The finance reconciliation agent is not a proof of concept. It is proof that a scoped AI skill can replace a five-figure SaaS stack, provided the scope is clear.
That condition is the constraint. Not AI capability. Scope clarity.
Before asking "can AI replace this tool?" ask three questions:
Can you name the specific process? Not "finance operations" but the reconciliation run: ERP cross-referenced against bank feed and expense management, rules applied, exceptions flagged. Named. Scoped. Defined at the level of inputs, outputs, and the exceptions that need human review.
Are the data sources accessible and consistent? The reconciliation agent reads three systems. Each returns clean, structured data when queried. A process that depends on inconsistent or unstandardised data is not agent-ready. Not because of AI, but because of the data.
What does success look like, specifically? For the reconciliation agent: every run completes, all discrepancies are identified, items below confidence threshold are flagged, no false positives go unreviewed. These are measurable. They are checked on every run.
If these three questions have clear answers, you have the brief for a purpose-built AI skill. If they do not, the right first step is not an agent build. It is the process definition and data readiness work that must come before.
SaaS tools are designed for the general case. They are built for teams of ten, for multiple users with different needs, for dashboards no single operator will ever use in full. The monthly licence is the cost of renting general-purpose capability to solve a specific-process problem.
A scoped AI skill is built for one process. One set of inputs. One defined output. One set of rules. One exception path.
That is why the skill costs less and performs better. Not because AI is superior to software. Because purpose-built beats general-purpose every time the scope is clear.
Key takeaways
- Graph Digital's finance reconciliation agent replaced PowerBI reporting and SaaS finance tooling at five-figure annual cost, built in under a day using Claude Code Skills, and has run in continuous production for six months with zero failures.
- CLI tool wrappers outperform MCP connections in production AI agents on speed, stability, cost per run, and debuggability. Protocol overhead from MCP accumulates in the context window and degrades reasoning quality over complex runs. This does not appear in demos.
- The three design decisions that determine production survival: explicit memory layer, lightweight purpose-built tool interfaces, and logging on every run. Most failed deployments skip at least one.
- SaaS tools are rented general-purpose capability. A scoped AI skill is owned purpose-built capability. The skill costs less and performs better not because AI is superior, but because the scope was clear before the build began.
- The scoping pre-condition: if you can name the specific process, confirm data source accessibility and consistency, and define measurable success, you have the brief for a skill that could displace the SaaS tool you are currently paying for.
Frequently asked questions
What did the finance reconciliation agent actually replace?
The Graph Digital finance reconciliation agent replaced three components: a PowerBI reporting licence, a finance exception management SaaS tool, and data bridging and export tooling that connected source systems. Combined annual cost: five figures. The agent was built using Claude Code Skills in under a day of senior engineering time.
Why use CLI tool interfaces instead of MCP?
MCP (Model Context Protocol) adds protocol overhead to every tool call. That overhead accumulates in the context window during multi-step runs, degrading reasoning quality over time. Not the protocol performance, but the agent's ability to reason clearly. CLI wrappers return clean structured data with no overhead. Over six months of continuous production, the CLI approach produced no failures; MCP was evaluated and rejected because the context degradation pattern was clear in testing.
What made this build possible in under a day?
Scope clarity before build began. The process was defined at the level of inputs (ERP, bank feed, expense management), rules (cross-referencing logic and thresholds), outputs (structured exception report), and exception handling (what gets flagged for human review, what passes through). A clearly scoped process with consistent data sources is buildable fast. A vague scope is not, regardless of AI capability.
Is this replicable for other SaaS tools?
Only for processes that meet the scoping conditions: a named, specific process with defined inputs and outputs, consistent and accessible data sources, and a measurable definition of success. Finance reconciliation meets all three. Many common SaaS use cases do too: approval workflows, exception reporting, data enrichment, compliance checks. Broad platform capabilities that serve multiple different users with different workflows do not.
What does production stability require?
Three things: an explicit memory layer that persists state between runs (not the conversation window), purpose-built tool interfaces that return clean structured data without protocol overhead, and logging on every run. These are the minimum. Most AI agent failures trace back to one or more of these being absent from the original architecture.
Have a SaaS cost you suspect a scoped AI skill could displace?
The right first step is not an agent build. It is a structured assessment of whether the conditions are in place: process clarity, data readiness, and a measurable definition of success. The AI Readiness Assessment is a 1-2 week diagnostic that maps your identified process against real production requirements. From £4,000. Find out how the AI Readiness Assessment works
