Verifiable Throughput: A Control Framework for AI Coding Evolution in Legacy Systems

This post is a reflection on what I learned using AI-assisted coding throughout 2025 while working on a large-scale, business-logic-heavy service. The biggest takeaway: AI dramatically reduces the cost of implementation—but it raises the bar on verification.

“Vibe coding” is incredible for greenfield projects: you can move fast because there’s little surface area, few invariants, and limited blast radius. The agent can invent APIs, reshape the architecture, and iterate freely—and the cost of being wrong is low.

But in existing, business-logic-heavy services, vibe coding hits a wall:

  • The system encodes years of implicit rules (“discounts can’t stack,” “ordering must be stable,” “refund windows differ by region”).
  • Behavior is defined less by the code’s aesthetics and more by historical constraints, edge cases, and downstream consumers.
  • The real risk isn’t “does it compile?” It’s silent correctness drift and performance regressions that only appear under production traffic.

So the question for legacy code isn’t “Can AI write code?” It’s: How do we let AI change code safely—at high velocity—without breaking what the business depends on?

That’s where Verifiable Throughput comes in.

The bottleneck shift: from implementation to verification

AI coding agents have moved the bottleneck in software delivery. Implementation is now cheap and abundant; the scarce resource is verification.

To use AI effectively on legacy systems, teams need to shift from “shipping faster” to Verifiable Throughput: the rate at which you can ship changes with confidence because correctness and performance are continuously—and automatically—validated against reality.

The shift: from judgment to signal

Traditional development leans on human judgment to catch issues: “this looks risky,” “that might be slow,” “this feels wrong.” In mature systems, those instincts are built from context: tribal knowledge, outages, and years of edge cases.

AI agents don’t have that context. They optimize whatever signals we provide.

If the only signal is “tests are green,” an agent will happily generate code that passes tests while quietly degrading latency, increasing cost, or drifting from business rules.

Verifiable Throughput turns your test and measurement stack into a control system:

  1. Correctness tests are Guardrails (what must not break).
  2. Performance tests are the Gradient (what direction is better).

Guardrails prevent unsafe moves. The gradient guides safe iteration.

The three pillars

To enable autonomous improvement (or even high-velocity assisted improvement) in legacy services, your verification suite must evolve in three ways.

1) Enhanced unit testing: intent as executable specification

In legacy code, the hardest part isn’t writing code—it’s preserving intent. Unit tests become the primary language for communicating that intent to an agent.

  • Property-based testing: Verify invariants across broad, valid input spaces—not just a handful of curated cases.
  • Boundary focus: Explicitly encode edge cases, error paths, and “should never happen” behavior that agents often miss.
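A property-based check can be sketched in plain Python with randomized inputs. The `apply_discounts` function and the "discounts can't stack" rule below are hypothetical, used only to show the shape of an invariant test; real suites would typically use a library such as Hypothesis for shrinking and case generation.

```python
import random

# Hypothetical pricing rule for illustration: only the single best
# discount applies (the "discounts can't stack" invariant).
def apply_discounts(price: float, discounts: list[float]) -> float:
    best = max(discounts, default=0.0)
    return price * (1.0 - best)

def test_discounts_never_stack(trials: int = 1_000) -> None:
    """Property: for ANY valid combination of discounts, the final price
    is never below what the single largest discount alone would give."""
    rng = random.Random(42)  # fixed seed so failures are reproducible
    for _ in range(trials):
        price = rng.uniform(0.01, 10_000.0)
        discounts = [rng.uniform(0.0, 0.9) for _ in range(rng.randint(0, 5))]
        final = apply_discounts(price, discounts)
        floor = price * (1.0 - max(discounts, default=0.0))
        assert final >= floor - 1e-9, (price, discounts, final)

test_discounts_never_stack()
```

The point is the shape of the test: it asserts an invariant over a broad sampled input space, not the output of a few curated cases, which is exactly the kind of signal an agent cannot game by special-casing.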

2) Correctness testing: guardrails for global stability

Agents optimize locally. Legacy systems fail globally—at interfaces, dependencies, and integration seams.

  • Contract tests: Declare “sacred” interface boundaries so refactors don’t break downstream consumers.
  • Replay testing: The highest-leverage move. Record sanitized production traffic and replay it against candidate builds to create a reality-based correctness oracle.
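A minimal replay harness can be sketched as follows. The handler signature and the sample traffic are assumptions for illustration; in practice the "baseline" would be the currently deployed build and the recorded requests would be sanitized production traffic.

```python
from typing import Callable

# Hypothetical handler signature: request dict in, response dict out.
Handler = Callable[[dict], dict]

def replay(recorded: list[dict], baseline: Handler, candidate: Handler) -> list[dict]:
    """Replay recorded requests against both builds and collect any
    divergence. The baseline's output serves as the correctness oracle."""
    diffs = []
    for req in recorded:
        expected = baseline(req)
        actual = candidate(req)
        if expected != actual:
            diffs.append({"request": req, "expected": expected, "actual": actual})
    return diffs

# Toy example: two builds implementing the same fee rule diverge on nothing.
recorded = [{"user": "u1", "amount": 120}, {"user": "u2", "amount": 40}]
baseline = lambda r: {"fee": round(r["amount"] * 0.03, 2)}
candidate = lambda r: {"fee": round(r["amount"] * 0.03, 2)}
assert replay(recorded, baseline, candidate) == []
```

An empty diff list is the "reality-based" green light; any non-empty diff is a concrete, reproducible counterexample the agent can be handed directly.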

3) Performance testing: a gradient the agent can follow

Legacy services often have performance shaped by real-world distributions, concurrency patterns, caches, and data skew. “It ran fast locally” is not a meaningful signal.

  • Production workloads: Synthetic benchmarks get gamed. Use realistic distributions, payloads, and concurrency.
  • Continuous feedback: Run benchmarks on every change, not just before release.
  • Resource efficiency: Measure cost (memory, CPU, DB queries, cache hit rate), not only latency.
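The "gradient" half of the loop can be sketched as a benchmark that measures tail latency over a realistic payload mix and turns the comparison against a baseline into a directional signal. Function names and the 5% tolerance are illustrative assumptions, not a prescribed threshold.

```python
import time

def p95_latency_ms(fn, payloads, repeats: int = 3) -> float:
    """Measure per-call latency over a realistic payload mix; return p95."""
    samples = []
    for _ in range(repeats):
        for p in payloads:
            start = time.perf_counter()
            fn(p)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[int(len(samples) * 0.95) - 1]

def gradient_signal(candidate_p95: float, baseline_p95: float,
                    tolerance: float = 0.05) -> str:
    """Turn two measurements into a directional signal for the agent."""
    if candidate_p95 > baseline_p95 * (1 + tolerance):
        return "regressed"
    if candidate_p95 < baseline_p95 * (1 - tolerance):
        return "improved"
    return "neutral"

assert gradient_signal(120.0, 100.0) == "regressed"
assert gradient_signal(90.0, 100.0) == "improved"
```

The same pattern generalizes to the other resource metrics above: query counts, allocations, and cache hit rate all reduce to "candidate vs. baseline, within tolerance or not."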

Operationalizing: the Gate System

Make CI/CD explicitly produce two kinds of feedback:

Correctness Gates (Guardrails)

  • Unit tests, replay tests, static analysis
  • Output: Pass / Fail (blocking)

Performance Gates (Gradient)

  • Production-like benchmarks
  • Output: Report (“safe to merge” vs “regressed”)


This is the loop you want for legacy code: fast, automated rejection of unsafe changes; fast, production-grounded guidance toward better ones.
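The two gates compose into a single piece of feedback per change. The sketch below assumes three boolean correctness checks and a categorical performance signal; the names are hypothetical, but the asymmetry is the point of the design: correctness is binary and blocking, performance is advisory and directional.

```python
def evaluate_change(unit_ok: bool, replay_ok: bool, static_ok: bool,
                    perf_signal: str) -> dict:
    """Combine both gates into the feedback the pipeline emits.

    perf_signal is one of "improved" | "neutral" | "regressed",
    e.g. from a baseline-vs-candidate benchmark comparison.
    """
    correctness = unit_ok and replay_ok and static_ok
    return {
        "merge_blocked": not correctness,   # guardrail: hard stop
        "perf_report": perf_signal,         # gradient: guidance, not a veto
        "safe_to_merge": correctness and perf_signal != "regressed",
    }

assert evaluate_change(True, True, True, "neutral")["safe_to_merge"] is True
assert evaluate_change(True, False, True, "improved")["merge_blocked"] is True
```

Keeping the performance gate as a report rather than a hard failure avoids blocking merges on benchmark noise while still giving the agent a clear direction to optimize.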

The path forward

The future looks like humans setting constraints and priorities while AI handles implementation and iteration. In legacy systems, the limiting factor is no longer “can we write the code?”—it’s “can we prove we didn’t break the business?”

Without Verifiable Throughput, you’re forced into a bad tradeoff: restrict agents until they’re marginally useful, or give them freedom and accept instability.

Verifiable Throughput offers a third path: continuous, safe evolution of legacy services—driven by automated guardrails and an optimization gradient grounded in production reality.

