June 23, 2026 • Strategy • Rob Murtha

Enterprise AI Readiness: The Four Seams Where Value Leaks

Enterprise AI readiness fails at four seams — technique, architecture, pricing, and provenance. A field-tested look at where automation value actually leaks.

Last week I put a question to my network and asked people to vote on it: of all the things that can go wrong with enterprise AI, what concerns you most right now? I offered three answers — technique readiness, tooling and architecture, pricing and value delivery — and left a fourth slot open for whatever I'd missed.

The vote split almost evenly across the three. That tie is the actual finding. When a question this consequential produces a three-way deadlock, it usually means the respondents are each staring at a different part of the same animal. So this is the longer answer I owe everyone who voted, shaped by a run of candid conversations with operators over the past month, and grounded where I can ground it.

Here's the thesis I keep returning to: capital doesn't leak in the lab. It leaks at the seams — the joints between where intelligence is produced and where it's actually consumed. There are four of them, and the open fourth slot on that poll turns out to be the most important one.

The MIT number that should reframe the whole conversation

Start with the data point that hangs over all of this. In August 2025, MIT's NANDA initiative published The GenAI Divide: State of AI in Business 2025 and reported that 95% of enterprise generative-AI pilots were delivering no measurable return — this against $30–40 billion in enterprise spend. McKinsey's State of AI 2025 tells a compatible story from the other side: adoption is near-universal, but only a single-digit fraction of organizations have scaled AI to material enterprise-wide impact. Two-thirds are stuck in what McKinsey bluntly calls "pilot purgatory."

The reflex read is "the models aren't good enough." That's wrong, and MIT's own diagnosis says so: the failures trace to a learning and integration gap, not model quality. The models cleared the bar. The organization didn't.

Which means the failure is structural. It lives in the seams. Let me take them one at a time.

Seam one: technique readiness is a capacity-allocation problem, not a tools problem

The first thing I asked was whether companies and their people are using their capacity well, and offloading the right things to automation. The honest field answer is that most are offloading whatever is most visible, not whatever is highest-leverage.

Technique readiness is the organizational skill of deciding which cognitive work to keep and which to delegate — and it is mostly missing. Teams reach for the demo-friendly use cases: summarize the meeting, draft the email, clean up the deck. MIT found exactly this misallocation — heavy spend on front-office sales and marketing theater, where ROI is thin, while the back-office processes where automation actually compounds go untouched.

The research on what workers themselves want to offload is clarifying here. Stanford's 2025 WORKBank study, Future of Work with AI Agents, surveyed the U.S. workforce and found the dominant motivation for automating a task — cited in 69% of cases — was "freeing up time for high-value work," followed by repetitiveness and stress. Workers are not asking to be replaced. They are asking to have the low-value substrate of their job removed so their judgment has room to operate.

The teams getting this right treat it as a portfolio decision:

Keep (augment):     judgment, taste, relationship, novel synthesis, accountability
Offload (automate): retrieval, transformation, formatting, first-draft, reconciliation
Contested (audit):  anything where the cost of a wrong answer exceeds the cost of doing it slowly

That contested middle row is where most of the value and most of the danger sit. Get it wrong in the conservative direction and you've bought expensive autocomplete. Get it wrong in the aggressive direction and you've automated a decision that needed a human's name attached to it. This is the same argument I made in The Task Compression Advantage: the win isn't doing the old work faster, it's recomposing the work so humans only touch the parts that genuinely require them.
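To make the three columns concrete, here is a minimal sketch of the triage as a function. Everything in it is an assumption for illustration: the `Task` fields, the idea that both costs can be estimated in the same unit, and the order of the checks (judgment work is kept first, then the contested test is applied to whatever remains). It is one way to encode the rule, not a scoring model anyone should ship as-is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_judgment: bool         # judgment, taste, relationship, accountability
    cost_of_wrong_answer: float  # estimated cost if automation gets it wrong
    cost_of_doing_slowly: float  # estimated cost of keeping the work manual


def triage(task: Task) -> str:
    """Place a task in the keep / contested / offload column."""
    if task.needs_judgment:
        return "keep"
    # The contested rule: a wrong answer costs more than doing it slowly.
    if task.cost_of_wrong_answer > task.cost_of_doing_slowly:
        return "contested"
    return "offload"


# Hypothetical tasks with invented cost estimates.
tasks = [
    Task("negotiate renewal", True, 10_000.0, 500.0),
    Task("approve refund over limit", False, 5_000.0, 200.0),
    Task("reconcile invoices", False, 50.0, 400.0),
]

for t in tasks:
    print(f"{t.name}: {triage(t)}")
```

The useful part is not the code but the two numbers it forces a team to write down for each workflow. If nobody can estimate the cost of a wrong answer, the task belongs in the contested column by default.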

The scale of the decision space is easy to underestimate. Browse Gerolamo's intelligence hub and the agentic workflow orchestration domain alone tracks nearly 900 distinct projects — every one of them a different opinion about which work should be handed to a machine. The strategic skill isn't picking a framework off that list. It's knowing which of your workflows belong in the offload column before you go shopping. The tooling is abundant; the judgment about where to apply it is the scarce input.

The companies winning here aren't the ones with the most licenses. McKinsey's data shows future-built organizations plan for 50%+ of staff to upskill and are six times more likely to protect structured learning time. Technique readiness is a muscle, and it's trained deliberately or not at all.

Seam two: legacy architecture is where AI value goes to die

The second question, whether companies are dragging along legacy dependencies and environments that conflict with AI infrastructure best practices, got a quieter, more resigned set of answers in my conversations. Everyone knows the answer is yes. The interesting part is why they keep doing it.

Legacy architecture conflicts with AI not because the old systems are slow, but because they were designed around a different assumption: that data moves in batches, on schedules, between silos. Agentic systems assume the opposite — real-time, composable, queryable, with clean provenance on every record. When you bolt the new assumption onto the old plumbing, you don't get AI. You get a very expensive adapter layer.

McKinsey's analysis of CIO budgets for the AI era names the trap precisely: most organizations are adding AI capability on top of existing systems rather than replacing anything. New deployments pile operating burden onto an undiminished legacy footprint, technical debt rises, and "any gains from change are offset by high run expenditures." Their projection is stark — on the current path, ROI on technology spend flattens, because every dollar of new capability is taxed by the cost of keeping the old world alive underneath it.

The "why" is rarely technical. It's almost always one of three organizational facts:

Ownership is diffuse. The system that blocks the integration is owned by a team that wasn't in the room when the AI initiative was funded, and has no incentive to absorb the migration cost.
The dependency is load-bearing in ways nobody documented. Ripping it out means discovering, in production, what it was quietly holding up.
The modernization has no demo. Replacing a data pipeline doesn't screenshot well in a board update, so it loses every budget fight to the chatbot.

The organizations escaping this are what McKinsey calls "deliberate modernizers": they earmark a third or more of spend for change, design services for reuse, and let new capability replace legacy rather than accrete on top of it. This is unglamorous work. It is also the factor most strongly correlated with whether the other three seams ever close. You cannot meaningfully meet agents where they work (the pattern I described in Meet Agents Where They Are) if the canonical artifacts they need to operate on are trapped in a system that only answers questions once a night.

The architectural frontier most legacy stacks are missing entirely is state. The same intelligence hub tracks over 700 projects in the AI memory systems domain — the layer that lets an agent remember, retrieve, and reason over context across sessions. That category barely existed two years ago, and it maps directly to the question every modernizing org should be asking: where does durable, queryable memory live in our architecture, and is it the system of record or a bolt-on? An organization that treats memory infrastructure as a strategic layer rather than an afterthought is making a different bet than one wiring a model to a nightly export.

Seam three: pricing breaks when output decouples from effort

The third question is the one I think is most underestimated, and the one moving fastest. When autonomous systems produce and deliver value, the shape of the output changes — and every commercial mechanism built around the old shape starts to groan.

Per-seat pricing was an artifact of a world where value scaled with the number of humans doing the work. Agents break that assumption at the root: they don't log in, don't hold a named-user license, and can complete thousands of tasks in the time a human completes one. When the unit of production decouples from headcount, charging by headcount becomes both a revenue leak for the vendor and a budgeting nightmare for the buyer.

The market is already repricing in real time. By most trackers, pure per-seat pricing fell from roughly 21% to 15% of software companies between 2025 and 2026, and Gartner projects at least 40% of enterprise software spend shifts to usage-, agent-, or outcome-based models by 2030. Futurum's research finds fewer than one in five buyers still prefer classic per-user pricing.

The instructive examples are the ones charging for verified outcomes:

Intercom's Fin charges $0.99 only when its AI fully resolves a conversation — nothing for failed attempts — and generated tens of millions in its first year on that model.
Zendesk moved to billing for successful AI-driven resolutions rather than AI seats.

There's a technical substrate underneath this commercial shift that most pricing conversations skip. You can only charge for an outcome if you can control the cost of producing it — which is why adaptive model selection and cost-aware orchestration have become their own fast-moving categories; Gerolamo's intelligence hub tracks hundreds of projects routing work to the cheapest model that can clear the bar. The margin in an outcome-priced product is manufactured at the routing layer. An organization that hasn't solved cost-per-outcome internally cannot offer outcome pricing externally without bleeding — the two problems are the same problem viewed from opposite ends of the contract.
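A rough sketch of what "cheapest model that can clear the bar" means in practice, under stated assumptions: the model tiers, their costs, and their success rates are invented, and the "bar" is reduced to a single minimum success rate. The one piece of arithmetic worth internalizing is that when only successful outcomes are billed, failed attempts still cost money, so the expected cost of one valid outcome is cost per attempt divided by success rate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                # hypothetical model tier, not a real product
    cost_per_attempt: float  # dollars spent per attempt, success or not
    success_rate: float      # fraction of attempts producing a valid outcome


def cost_per_outcome(option: ModelOption) -> float:
    """Expected spend per valid outcome; failed attempts still cost money."""
    return option.cost_per_attempt / option.success_rate


def route(options: Sequence[ModelOption], bar: float) -> Optional[ModelOption]:
    """Pick the option with the lowest cost per outcome among those clearing the bar."""
    eligible = [o for o in options if o.success_rate >= bar]
    return min(eligible, key=cost_per_outcome) if eligible else None


# Invented numbers for illustration only.
OPTIONS = [
    ModelOption("small", cost_per_attempt=0.02, success_rate=0.50),
    ModelOption("mid", cost_per_attempt=0.10, success_rate=0.80),
    ModelOption("large", cost_per_attempt=0.60, success_rate=0.90),
]

PRICE_PER_OUTCOME = 0.99  # the per-resolution price cited above

chosen = route(OPTIONS, bar=0.75)
if chosen is not None:
    margin = PRICE_PER_OUTCOME - cost_per_outcome(chosen)
    print(f"{chosen.name}: margin per outcome ~ ${margin:.3f}")
```

With these made-up figures the "mid" tier wins at about $0.125 per valid outcome, leaving roughly $0.865 of margin at a $0.99 price. The "small" tier is cheaper per outcome but fails the bar, and the "large" tier clears it while giving up most of the margin. Real routing would also weigh latency, retries, and the cost of escalating failures to a human, none of which this sketch models.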

Notice what the Intercom and Zendesk models have in common: a contractual definition of a valid outcome. That's the catch, and it's where the chaos actually lives. Outcome-based pricing requires both parties to agree, in advance and in writing, on what counts as a delivered result — and how it's measured, attributed, and disputed. The pricing conversation is really a measurement conversation wearing a commercial costume. I wrote about this dynamic from the buyer's side in The Artifact Economy: when machines produce artifacts faster than humans can evaluate them, the scarce, priceable thing stops being production and becomes verified, attributable value.

Which is the perfect handoff to the seam nobody put on the ballot.

Seam four: the provenance gap nobody put on the ballot

I left a fourth slot open and asked people to fill it. Here is mine.

The first three seams are all, in the end, about producing value: using capacity well, modernizing the plumbing, pricing the output. But as autonomous systems scale, the binding constraint quietly migrates from production to verification and attribution. The question stops being "can the system produce this?" and becomes "can you trust it, prove who authorized it, and reconstruct how it got there?"

Call it the provenance gap: the widening distance between how fast autonomous systems generate output and how reliably an organization can verify, attribute, and stand behind that output. And the data on it is genuinely alarming.

A 2026 study of AI-agent security in the enterprise found that 82% of executives are confident their policies protect against unauthorized agent actions — while only 28% can reliably trace an agent's actions back to a human sponsor across their environments. That 54-point gap between confidence and capability is the provenance gap rendered as a single number. Just this week, the Technology Innovation Institute's Najwa Aaraj made the same point in Fortune: enterprise AI agents need proof, not promises — verifiable execution, not after-the-fact assurance.

This is no longer a fringe concern, and the research velocity proves it. The agent security model domain on Gerolamo's intelligence hub is now one of the densest on the platform — roughly 500 tracked entities spanning identity, authorization, attestation, and content provenance. When a capability area attracts that much builder attention that fast, it's a signal that the market has priced in the gap even where individual organizations haven't. The strategic question for a leadership team is whether they're tracking that frontier deliberately or discovering it during an incident.

Here's why this is the highest-impact item on the list, not a footnote to it:

It gates the other three. You cannot price an outcome you cannot verify (seam three). You cannot safely offload a contested decision you cannot attribute (seam one). And the legacy systems that lack clean provenance (seam two) are exactly the ones that make verification impossible.
It's invisible until it isn't. A provenance gap produces no symptoms during the demo. It produces symptoms during the audit, the incident, the dispute, or the regulator's letter — when someone asks "who decided this, on what evidence?" and the honest answer is a shrug.
It is not a policy problem. I've argued this at length in AI Assurance Is Not a Policy Problem: you cannot govern what you cannot observe. Provenance has to be manufactured into the system at runtime — as evidence, attached to the artifact — not asserted in a slide deck after the fact.

This is the seam we've spent the most engineering on at Adjective, because it's the one with the least off-the-shelf tooling. Zephyr exists to attach verifiable provenance to AI-produced work, and Gerolamo is built on the conviction that agents acting in the world need ground truth (scored, structured, attributable intelligence) rather than confident prose, an argument I made in Intelligence Primitives: What Agents Need as Ground Truth. When an agent at Gerolamo asserts that an entity is defensible, that claim carries a score, a reasoning trace, and a lineage back to the underlying signal. That's not a feature. That's the difference between an output you can build a business on and one you can only hope holds up.
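For readers who want to picture what "evidence attached to the artifact" could look like, here is a deliberately small sketch. It is not how Zephyr or Gerolamo is implemented; the record shape, field names, and identifiers are all hypothetical. It only shows the core idea: at the moment an agent produces an output, bind a fingerprint of that exact output to who produced it, who sponsored it, and what it was based on.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    artifact_sha256: str              # fingerprint of the exact output
    agent_id: str                     # which agent produced it
    human_sponsor: str                # who authorized the agent to act
    score: float                      # score attached to the claim
    reasoning_trace: tuple[str, ...]  # ordered steps the agent reported
    source_signals: tuple[str, ...]   # identifiers of the underlying evidence


def attach_provenance(artifact: bytes, agent_id: str, human_sponsor: str,
                      score: float, reasoning_trace: list[str],
                      source_signals: list[str]) -> ProvenanceRecord:
    """Build the evidence record at the moment the artifact is produced."""
    return ProvenanceRecord(
        artifact_sha256=hashlib.sha256(artifact).hexdigest(),
        agent_id=agent_id,
        human_sponsor=human_sponsor,
        score=score,
        reasoning_trace=tuple(reasoning_trace),
        source_signals=tuple(source_signals),
    )


def matches(artifact: bytes, record: ProvenanceRecord) -> bool:
    """True if the artifact is byte-for-byte the one the record describes."""
    return hashlib.sha256(artifact).hexdigest() == record.artifact_sha256


# Hypothetical values throughout.
artifact = b"Entity X is defensible."
record = attach_provenance(
    artifact,
    agent_id="agent-7",
    human_sponsor="jane.doe@example.com",
    score=0.82,
    reasoning_trace=["retrieved filings", "compared against peers"],
    source_signals=["signal-123"],
)

print(matches(artifact, record))                # True
print(matches(artifact + b" (edited)", record))  # False
```

This sketch only detects that an artifact no longer matches its record. A production system would also need to sign the record and store it somewhere tamper-evident, so the record itself can be trusted during the audit or dispute.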

The four seams compound — and that's the actual strategy

The reason the poll deadlocked is that these four aren't competing concerns. They're a chain, and a chain is exactly as strong as its weakest joint:

Technique readiness decides whether you offload the right work.
Architecture decides whether the offloaded work can actually run.
Pricing decides whether the value it produces is captured fairly.
Provenance decides whether anyone can trust, attribute, or defend the result.

A company can ace any three and still leak its entire AI investment through the fourth. The 95% who saw no return didn't fail because they picked the wrong model. They failed because value made it most of the way through the system and then escaped through whichever seam they weren't watching.

The work, then, isn't to pick the most important seam. It's to stop treating them as separate initiatives owned by separate teams with separate budgets — which is precisely how the diffuse-ownership problem in seam two metastasizes into all four. Efficient capital deployment in the AI era means owning the whole chain, end to end, so value doesn't leak at the handoffs. That's the entire reason we build at the infrastructure level rather than handing clients advice and walking away.

To everyone who voted and to those who left a comment with a fifth thing I missed: keep them coming. This is exactly the kind of ground-truth signal that shapes what we build next. If your AI investment is producing output that isn't converting into trusted, attributable, capturable value, start with a conversation — sixty minutes, no deck, and you'll leave knowing which of your four seams is leaking.