Build, Run, Improve

The Build Is the Starting Condition

Most AI workflows follow the same delivery arc as traditional technology projects. An implementation team configures the agent, connects the integrations, validates the happy path, accounts for the edge cases they can anticipate, and hands the system to the client team. What arrives works under controlled conditions. What production demands is something different.

That arc made sense for projects with a clear end state. A data migration finishes. A system integration stabilizes. An ERP deployment reaches steady state. AI-enabled workflows do not reach steady state. They are living operations where the inputs change, the exceptions shift, the models evolve, and the business context moves continuously. The build is not the finish line. It is the starting condition. The question is what happens next.

The Case for Sustaining a Human-in-the-Loop

The ambition to remove humans from the workflow entirely is understandable. Autonomous agents, end-to-end automation, minimal intervention. Anthropic’s 2026 research on agentic coding found that even in the most advanced AI deployments, roughly 80% of tasks still require active human judgment: setup, supervision, validation, decision-making. The opportunity is not autonomous systems. It is human-guided workflows that preserve context, make results provable, and keep people in the loop where judgment matters.

Code can be verified with tests. Run the suite, get a pass or fail. Most business processes cannot. There is no automated test for whether an exception was resolved correctly, whether a judgment call followed policy, or whether a routing decision sent the case to the right person. Verification depends on humans reviewing outcomes, comparing them against expectations, and feeding that judgment back into the system.

AI workflows are probabilistic. The system produces different outputs for the same input depending on context, model version, and the data available at inference time. Traditional software converges to deterministic answers; AI systems do not. That difference means the system cannot validate its own outputs. It needs humans to arbitrate which results were correct in context and which missed the mark. The learning mechanism depends on that judgment loop. Without it, the system reproduces whatever patterns it absorbed in training. With it, the operation teaches the system what the business actually needs.
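What that judgment loop looks like in practice can be sketched minimally. The shapes below are illustrative assumptions, not any particular platform's API: each AI output gets paired with a human verdict, and the accumulated verdicts become the corpus that later rule-writing or retraining draws from.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CaseReview:
    """One human judgment on one AI output (hypothetical record shape)."""
    case_id: str
    model_output: str
    verdict: str          # e.g. "correct", "incorrect", "missed_context"
    reviewer_note: str    # why the output missed, in the reviewer's words
    reviewed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def record_verdict(feedback_log: list[CaseReview], review: CaseReview) -> None:
    # Accumulate judgments; this log is what teaches the system what the
    # business actually needs, per the loop described above.
    feedback_log.append(review)
```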

That feedback loop is what the run phase produces. Design work establishes the starting structure: routing logic, escalation paths, role definitions, initial playbooks. Production reveals what design cannot anticipate. A procurement workflow routes a purchase order for approval, but the vendor’s pricing has changed since the contract was last negotiated and the tolerance rules for price variance were never written. Someone needs to decide: approve at the new price, hold for renegotiation, or reject and re-source, then record that decision so the next pricing discrepancy resolves faster. That is not a build problem. It is an operating decision that only surfaces once the work is running, and it is the kind of decision that teaches the system how to handle the next one.
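As a hedged sketch of that operating decision, assume an invented per-vendor tolerance table; PriceRule, route_po, and codify_decision are hypothetical names, not a real procurement API. The point is the last function: a human ruling on an uncovered case becomes the rule that resolves the next discrepancy automatically.

```python
from dataclasses import dataclass

@dataclass
class PriceRule:
    vendor: str
    tolerance_pct: float  # price variance the system may approve on its own

def route_po(vendor: str, contract_price: float, invoice_price: float,
             rules: dict[str, PriceRule]) -> str:
    variance_pct = abs(invoice_price - contract_price) / contract_price * 100
    rule = rules.get(vendor)
    if rule is None:
        return "escalate: no tolerance rule for this vendor"  # needs judgment
    if variance_pct <= rule.tolerance_pct:
        return "approve"                                      # covered by rule
    return "escalate: variance exceeds tolerance"

def codify_decision(rules: dict[str, PriceRule], vendor: str,
                    approved_variance_pct: float) -> None:
    # The human's ruling is recorded so the next pricing discrepancy
    # for this vendor resolves without an escalation.
    rules[vendor] = PriceRule(vendor, approved_variance_pct)
```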

Improvement Is Not a Phase

The Toyota Production System built an entire discipline around this idea. Standard work is not a document written before the operation starts. It is the current best version of how the work runs, rewritten every time the operation learns something new. Run the work, capture what breaks, update the standard, run it again. Toyota called this cycle kaizen, and each pass through it makes the next one faster. The mechanism is the same in AI operations, even if the tooling is different.

The typical project lifecycle treats improvement as a separate activity: a post-launch optimization sprint, a quarterly review, an annual process refresh. In a living operation, improvement is continuous because the operation keeps generating new information about what works and what does not.

Exception patterns reveal where the workflow’s assumptions break down. Resolution data shows which cases are expensive and which are routine. Escalation frequency signals where human judgment is being consumed on problems that should have been codified into rules three cycles ago. This information exists only because the operation is running. It is not available at the design stage, and it does not hold still long enough for periodic reviews to capture it.
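The diagnostic itself is simple once the operation is emitting data. A minimal sketch, assuming escalations arrive as a stream of category labels; the data shape and threshold are assumptions for illustration:

```python
from collections import Counter

def codification_candidates(escalated_categories: list[str],
                            min_count: int = 20) -> list[str]:
    # Exception categories escalated often enough that a rule, not a
    # person, should be handling them by now.
    counts = Counter(escalated_categories)
    return [category for category, n in counts.items() if n >= min_count]
```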

Consider an insurance claims workflow where the same document-mismatch exception keeps escalating to senior adjusters. In the first cycle, the team documents the pattern and writes a resolution playbook. In the second, the playbook becomes a routing rule: document mismatches below a threshold resolve automatically. By the third cycle, the exception category has shrunk by half and senior adjusters spend their time on cases that require judgment. That is one exception type, in one workflow, across three improvement cycles.
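The cycle-two routing rule in that example is small, which is the point: by then the playbook has already done the hard work of defining the threshold. A sketch, with an invented mismatch score and limit:

```python
AUTO_RESOLVE_LIMIT = 0.05  # hypothetical threshold, tuned from cycle-one data

def route_document_mismatch(mismatch_score: float) -> str:
    # Below the limit, the codified rule resolves the case; above it,
    # the case still belongs with a senior adjuster.
    if mismatch_score < AUTO_RESOLVE_LIMIT:
        return "auto_resolve"
    return "senior_adjuster_queue"
```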

Multiply that across every exception category and the returns compound. Faster documentation feeds faster playbook updates, which feed faster rule absorption into automation. Each cycle reduces the share of exceptions that require human involvement and increases the share the system handles autonomously. Cost per case drops. Throughput rises. The feedback loop shifts from diagnostic (“what went wrong”) to predictive (“what should we expect tomorrow”).

Google’s DORA 2025 research found that organizations with fast feedback loops achieved 20 to 30 percent productivity gains from their AI deployments. Organizations coupled to legacy processes with slower cycles saw minimal benefit. The difference was not the model. It was how quickly learning moved from the operation back into the system.

Herbert Simon’s distinction between programmed and unprogrammed decisions maps directly to what improvement produces. Programmed decisions follow established rules and precedent. Unprogrammed decisions require judgment in situations the rules do not cover. Each improvement cycle converts what was unprogrammed into programmed: the exception that required a senior adjuster’s judgment in January becomes the routing rule that resolves automatically in March. A living operation systematically expands the boundary of what the system handles on its own, freeing human judgment for the cases that remain genuinely novel.
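That conversion can be expressed as a loop. The sketch below illustrates the boundary, not a production dispatcher; the promotion threshold and signature scheme are assumptions. Decisions with a matching rule are programmed; everything else goes to a person, and recurring identical rulings are promoted into rules.

```python
from collections import Counter
from typing import Callable

rules: dict[str, str] = {}    # case signature -> codified resolution
rulings: Counter = Counter()  # (signature, resolution) pairs observed so far
PROMOTION_THRESHOLD = 3       # hypothetical: identical rulings before codifying

def decide(signature: str, ask_human: Callable[[str], str]) -> str:
    if signature in rules:
        return rules[signature]           # programmed: rule and precedent cover it
    resolution = ask_human(signature)     # unprogrammed: requires judgment
    rulings[(signature, resolution)] += 1
    if rulings[(signature, resolution)] >= PROMOTION_THRESHOLD:
        rules[signature] = resolution     # the boundary expands
    return resolution
```

Each promotion is January's judgment becoming March's routing rule.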

Where the Economics Shift

Organizations evaluating AI workflow investments tend to compare build costs. Which platform is cheaper to configure? Which implementation partner has the lowest day rate? Which vendor promises the fastest time to launch? These are reasonable questions about the wrong phase.

The build is a fraction of the total cost of running an AI-enabled workflow. The operating cost, the exception cost, and the cost of improvement (or the cost of failing to improve) are where the economics concentrate. A cheaper build that produces an operation nobody can sustain is a deferred expense that compounds.

Eliyahu Goldratt’s Theory of Constraints identified this pattern in manufacturing: optimizing the fastest step in a process does not improve the system. The constraint determines throughput. In AI workflows, the build is rarely the constraint. The operating layer is. As organizations deploy and discover that the build was the smallest line item, those that optimized for speed of deployment find themselves stranded at the bottleneck they skipped, while those that built for operations run at a cost advantage that widens with each improvement cycle. By the time the difference is visible, the decision should have been made months earlier.

A build-and-hand-off engagement front-loads the spending and assumes diminishing effort after launch. A living operation requires sustained investment, but the cost curve bends downward as the system absorbs what it learns. The question worth asking is not how fast the workflow can be built, but whether the team, the structure, and the operating model exist to keep it running once it ships.

A build-and-hand-off engagement assumes the hard part is over at launch. A living operation assumes it has just begun.

The Prosable Path stages the transition from pilot to living operation, so the investment in building does not get stranded at launch.