When Models Converge, the Operation Becomes the Variable
The first wave of AI adoption was a race to build the agent. Which model performs best? Which workflow to automate first? Which pilot gets greenlit? That race was worth running. Agents matter.
But the conversation has stopped tracking where differentiation lives. Leading LLMs now deliver near-identical performance on standard benchmarks. The architectural advantage of selecting one model over another has compressed. When the models converge, competitive advantage migrates from model selection to what the organization builds around it: how exceptions route, who owns escalations, how the operation improves, and whether governance keeps pace with what the workflow does. Two organizations running the same model produce different results. The differentiator is the operating layer.
Russell Ackoff's principle applies directly: the performance of a system depends on how the parts interact, not on how they act taken separately. Model selection optimizes a component. The operation is where the parts interact.
Orchestration is part of the answer. Better agent frameworks, smarter routing logic, and more capable toolchains will improve what the build can do out of the box. That advantage is real, but it is temporary. When the orchestration layer commoditizes, and it will, the differentiator shifts again to the operation: who owns the exceptions, how the system learns, and whether the organization improves from running the work. Most organizations are still solving for the build. The ones that pull ahead are already solving for what comes after.
Five Decisions That Define an Operation
Most organizations have a version of an operating layer scattered across teams, policies, and individual judgment calls. Few have made it explicit. And when the conversation turns to operating model design, it gravitates toward the org chart, which shows who reports to whom but reveals nothing about how work gets routed, escalated, improved, or learned from. Starting with the org chart means starting with the wrong question.
For an AI workflow, the critical decisions are these:
- Where do exceptions go when the model is unsure?
- Who owns escalations and at what thresholds?
- How does work move through the operation when the happy path breaks down?
- What changes the rules when the operation keeps surfacing the same friction?
- How does the team learn systematically from what keeps going wrong?
These decisions will get made whether the organization designs them or not. The question is whether they get made deliberately, as part of an operating layer, or ad hoc, as people improvise in response to what hits production. One produces a system that improves. The other produces firefighting that scales with volume.
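To make "deliberately" concrete, here is a minimal Python sketch of what encoding these decisions explicitly might look like. Everything in it is an illustrative assumption, not a prescription: the confidence thresholds, the owner role names, the exception categories, and the `route` function are stand-ins for whatever a given operation actually defines.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: thresholds, owner roles, and categories
# are assumptions, not a prescription for any particular workflow.

@dataclass
class EscalationPolicy:
    auto_approve_above: float = 0.95   # model confidence for straight-through processing
    human_review_above: float = 0.70   # below this, route to the low-confidence path
    owners: dict = field(default_factory=lambda: {
        "exception_queue": "ops_analyst",      # who works routine exceptions
        "policy_exception": "team_manager",    # who approves departures from the rules
        "repeat_friction": "process_owner",    # who changes the rules themselves
    })

def route(policy: EscalationPolicy, confidence: float, is_policy_exception: bool) -> str:
    """Decide where a single case goes. Every path has a named owner."""
    if is_policy_exception:
        return policy.owners["policy_exception"]
    if confidence >= policy.auto_approve_above:
        return "auto_approve"
    if confidence >= policy.human_review_above:
        return policy.owners["exception_queue"]
    # Low-confidence cases go to the process owner so that recurring
    # friction can drive a rule change (the fifth decision in the list).
    return policy.owners["repeat_friction"]

if __name__ == "__main__":
    policy = EscalationPolicy()
    print(route(policy, confidence=0.82, is_policy_exception=False))  # -> ops_analyst
```

The point is not the code. It is that every branch has a written threshold and a named owner, which is what making these decisions as part of an operating layer, rather than improvising them in production, looks like in practice.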
Business context embedded in the workflow helps, but it is not enough. The workflow needs clear paths for exceptions to move through the operation with explicit ownership at each stage. Without governance structures that define standards, decision rights, and accountability, the operation fragments as volume rises. And most implementations never build systematic improvement: the cycle of workflow refinement, playbook evolution, and process redesign based on what the operation surfaces.
Individual productivity gains plateau when each person optimizes in isolation. Organizational operating improvements compound because each cycle feeds the next. Deming’s observation holds with new force: a bad system will beat a good person every time. An agent platform does not deliver the compounding part. An implementation team might build the workflow and prototype the operation, but rarely stays to refine it. That gap, between what a workflow can do and what it takes to keep it running, is where operational effort concentrates. It is also where most implementations break down.
The Operation Lives in the Complex Domain
AI agents are probabilistic systems. They weigh context, adjust to input variation, and produce outputs that shift with model version, prompt structure, and the data available at inference time. The platforms that build and deploy these agents have made remarkable engineering progress. The challenge is not what the agent can do. It is the operating environment the agent inhabits once it goes live.
Dave Snowden’s Cynefin framework maps this territory precisely. The framework distinguishes between ordered domains, where cause and effect are knowable in advance and best practice can be applied before acting, and complex domains, where cause and effect emerge only through running the work. In ordered domains, the decision cycle is sense, categorize, respond: recognize the pattern, apply the known rule. In complex domains, it reverses: probe, sense, respond. Run the work, observe what emerges, adapt. Most AI implementations treat the entire problem as ordered, as if the operation can be fully designed before deployment and then left to run. The systems thinking gap is the distance between that assumption and what production demands.
That gap is where human judgment enters: a customer dispute that references a prior conversation, a policy exception that requires manager approval, a data mismatch that only someone familiar with the account can resolve. It is where the operation learns from what keeps going wrong: regular operating reviews, playbook refinement, and process redesign driven by what the work surfaces.
These are governance problems, not engineering problems. And governance is increasingly a fiduciary question. AI governance on the board’s agenda is not compliance theater. It is stewardship: accountability for AI risks, outcomes, and traceability. The organization that cannot demonstrate deliberate choices about what AI does in the business is not governing. It is drifting into adoption because the technology is available.
The capability the operation requires is the ability to connect business outcomes to process design to technology decisions. That bridging skill has deep roots in software engineering and manufacturing. Peter Senge named the organizational form of it: a learning organization, one that expands its capacity to produce results by learning from how the work actually runs. It is the difference between someone who can build a workflow and someone who can design the system that keeps a workflow valuable as it scales. This wave of AI adoption is the first time business operations have needed that bridging capability at this scale, and developing it is part of what readiness means.
A good demo shows what the agent can do. A good operating model determines whether the business can sustain it, and whether the organization learns from running it.
