The Code Does Not Matter Anymore

The case for rebuilding your software development process instead of integrating AI into your existing one

One engineer. One week. $1,100 in AI API costs. The result: a working reimplementation of Next.js that builds 4.4× faster and ships 57% smaller bundles. Cloudflare's vinext project is not a prototype or a demo. It is a proof of concept for a fundamentally different way of building software — one where almost every line of code is written by an AI agent, and the human's job is to write the tests that tell it what to build.

That story should prompt two questions. First: why isn't everyone doing this? Second, and more important: why, when organisations do introduce AI into their development process, does it so often make things worse?

The answer to both is the same. Most organisations are adding AI agents to their existing process. That is the wrong move. The right move is to build a new process around the agents.

Three Limitations

Modern LLM agents are genuinely fast at generating functionally correct code for well-specified tasks. But three structural limitations define what any sensible process must account for.

LLMs need formal success criteria. Without a machine-checkable definition of correctness, an agent tests itself — a circular process producing superficially plausible code with hidden defects. Research on TDD with LLM agents is unambiguous: agents given a comprehensive test suite as their specification produce dramatically more reliable output. The quality ceiling of AI-generated code is set by the quality of human-written tests.
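
A minimal sketch of what that looks like in practice — the module slugify, its behaviour, and its file path are hypothetical, and the runner is Node's built-in node:test — in which the human writes only this test file and the agent iterates on the implementation until every assertion passes:

```typescript
// spec/slugify.test.ts — the human-written specification; the implementation is the agent's problem.
// `slugify` and its module path are hypothetical.
import { test } from "node:test";
import assert from "node:assert/strict";
import { slugify } from "../src/slugify";

// Each assertion is a machine-checkable success criterion: the agent iterates on
// src/slugify.ts until every one of these passes, and never weakens this file.
test("lowercases and hyphenates plain text", () => {
  assert.equal(slugify("Hello World"), "hello-world");
});

test("strips characters that are unsafe in URLs", () => {
  assert.equal(slugify("C# & F# (2025)!"), "c-f-2025");
});

test("collapses repeated separators and trims the edges", () => {
  assert.equal(slugify("  --already--slugged--  "), "already-slugged");
});

test("rejects effectively empty input instead of returning an empty slug", () => {
  assert.throws(() => slugify("   "), /empty/i);
});
```

Every assertion the human omits is a behaviour the agent is free to get wrong.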

LLM-generated code lacks human-comprehensible structure. LLMs optimise for satisfying the prompt, not for producing abstractions useful to future readers. A 2025 analysis found code duplication rose fourfold in commits between 2021 and 2025. A dedicated study on AI agent code readability found agents "do not introduce layers of abstraction that are helpful for humans," instead repeating patterns and producing multiple slightly different implementations of the same logic.

Unowned code becomes a liability at scale. When code is generated faster than engineers can understand it, no one owns the system. Engineers may have merged the code, but ownership — the capacity to reason about behaviour, diagnose failures, and make confident changes — has evaporated. One analysis found that code co-authored by AI contained 1.7× more major issues than human-written code. Real 2025 production incidents — full database exposure, authentication bypass, mass data leakage — were not exotic attacks. They were caused by AI-generated code that no engineer had meaningfully reviewed.

The Flawed Compromise

The intuitive response is to add human review gates around AI-generated code. Keep the existing process, add an AI assistant, require sign-off.

An empirical study of Cursor adoption shows how this plays out: velocity gains were "substantial but transient," while "persistent increases in technical debt subsequently dampened future development velocity." Code churn doubled between 2021 and 2024 as AI suggestions were accepted fast and then needed to be fixed or rewritten.

The cognitive cost lands hardest on the most experienced people. Senior engineers — who carry the architectural context needed to evaluate generated code — become review bottlenecks. They bear accountability for code they did not write and cannot easily reason about. When incidents occur, they diagnose systems whose internal logic is opaque. The result is burnout, attrition, and hollowed-out institutional knowledge.

The hybrid model carries the costs of both approaches without the strengths of either. Adding AI to an existing human process does not reduce engineering load. It changes its character — and, for senior engineers, usually increases it.

The New Abstraction Level

Every major shift in software productivity has been a shift in abstraction. Developers stopped hand-writing assembly when compilers proved they could translate high-level intent into machine instructions more reliably and efficiently than humans could. Nobody mourns the loss of hand-tuned assembly now. The mental model simply moved up a level: you reason in terms of functions and data structures, not registers and memory addresses.

LLM agents are the same shift happening again. The implementation layer — the code itself — is becoming what assembly became when compilers arrived: something generated by a tool, not authored by a human. The appropriate human response is not to review the generated assembly. It is to reason at the level above it.

What does "the level above" look like in software? It looks like specifications, interface contracts, and tests. It looks like defining what a system must do and verifying that it does it — without caring how.

This is not a speculative future. The vinext case makes it concrete. Cloudflare's team treated the Next.js test suite as a machine-readable specification: port a test, run it, fail, iterate until it passes, move to the next. They didn't try to understand and reimplement Next.js feature by feature. They described what correct behaviour looked like and let the agent figure out how to produce it. The outcome was 1,700+ unit tests, 380 end-to-end tests, and 94% coverage of the Next.js 16 API surface — all passing within a week.
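
A rough sketch of that loop, under heavy assumptions — the implementUntilGreen driver, the AgentClient interface, and the use of vitest as the runner are hypothetical illustrations, not Cloudflare's published tooling:

```typescript
// port-and-pass.ts — a sketch of the "port a test, run it, fail, iterate" loop.
// AgentClient, implementUntilGreen, and the choice of vitest are hypothetical;
// the real vinext workflow is not published as a reusable API.
import { execFileSync } from "node:child_process";

interface AgentClient {
  // Asks the agent to change the implementation, given the failing test output.
  propose(changeRequest: string): Promise<void>;
}

function runTests(testFile: string): { passed: boolean; output: string } {
  try {
    const output = execFileSync("npx", ["vitest", "run", testFile], { encoding: "utf8" });
    return { passed: true, output };
  } catch (err) {
    const e = err as { stdout?: string; message?: string };
    return { passed: false, output: e.stdout ?? e.message ?? String(err) };
  }
}

export async function implementUntilGreen(
  agent: AgentClient,
  testFile: string,
  maxAttempts = 10
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runTests(testFile); // the ported test is the acceptance gate
    if (result.passed) return true;
    // Feed the failure back; the human never reads the implementation diff.
    await agent.propose(
      `Make ${testFile} pass without modifying any test.\n\nFailure output:\n${result.output}`
    );
  }
  return false; // escalate: the spec may be ambiguous or the component cut too large
}
```

The important property is that the human's review surface is the test file and the pass/fail signal, not the diff the agent produces.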

The compiler analogy also clarifies what is not changing. Compilers did not eliminate the need for skilled engineers. They changed the skill. The same is true here.

The Agent-Native Model

The right response is to redesign the process around what LLMs actually are.

Core principle: if code will be written and maintained by agents, stop pretending engineers will meaningfully review or understand it. Design a system where they do not need to.

Code as a Black Box

Source code is no longer the primary artifact. The primary artifacts are the component's specification (requirements and constraints), its interface (the API contract), and its test suite (the machine-checkable definition of correct behaviour). Human engineers define and validate the contract. The agent writes whatever implementation satisfies it. Correctness is established externally, through observed behaviour — not through code inspection.
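
A hypothetical illustration of those artifacts in TypeScript — the invoice domain and every name below are invented for the example — where the constraints live in the contract and the implementation file it points at is agent-written:

```typescript
// contracts/invoice-service.ts — human-owned artifact: the interface and its constraints.
// The implementation behind this interface is agent-written and treated as a black box;
// all names here are illustrative, not taken from the vinext case.
export interface Invoice {
  id: string;
  customerId: string;
  totalCents: number;   // constraint: never negative
  issuedAt: string;     // constraint: ISO 8601, assigned by the service
}

export interface InvoiceService {
  create(draft: Omit<Invoice, "id" | "issuedAt">): Promise<Invoice>;
  getById(id: string): Promise<Invoice | null>;
  // constraint: stable ordering by issuedAt (newest first), cursor-based pagination
  listForCustomer(
    customerId: string,
    limit: number,
    cursor?: string
  ): Promise<{ items: Invoice[]; nextCursor?: string }>;
}
```

The matching test suite asserts those constraints behaviourally; nothing downstream depends on how the agent chose to satisfy them.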

Blast Radius Containment

A component treated as a replaceable black box must be designed to be replaceable: small enough to rebuild quickly, isolated enough that failures do not cascade, interfaced clearly enough that its behaviour is fully testable from outside.

This is microservices architecture applied with new urgency. Microservices contain the blast radius of failures by enforcing service boundaries with independent data stores and well-defined APIs. When an AI-maintained service is found to be subtly corrupt, the damage is bounded. Rebuilding one small, well-specified service is tractable. Debugging a monolith with opaque AI-generated internals is not.

Tests are the New Source Code

If engineers are not writing implementation code, they are writing tests — and specifications and interface contracts. The workflow:

  • Feature development: Write a comprehensive test suite, then instruct the agent to implement against it.
  • Bug fixing: Write a failing test that demonstrates the issue, then instruct the agent to fix it. The test becomes permanent regression coverage (see the sketch after this list).
  • Integration: Define the API contract and write integration tests that validate compliance, before any implementation exists.
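
For the bug-fixing step, a sketch of what the engineer actually commits — a single failing test. The issue number, invoice contract, and implementation path are hypothetical, continuing the example above:

```typescript
// regression/issue-4812.test.ts — hypothetical failing test, written before any fix exists.
import { test } from "node:test";
import assert from "node:assert/strict";
import { createInvoiceService } from "../src/invoice-service"; // agent-written implementation

test("cursor pagination does not drop invoices at the page boundary", async () => {
  const svc = createInvoiceService();
  const a = await svc.create({ customerId: "c-42", totalCents: 100 });
  const b = await svc.create({ customerId: "c-42", totalCents: 200 });

  const page1 = await svc.listForCustomer("c-42", 1);
  const page2 = await svc.listForCustomer("c-42", 1, page1.nextCursor);

  // Reported symptom: one invoice silently vanishes when the page size is 1.
  const ids = new Set([...page1.items, ...page2.items].map(i => i.id));
  assert.deepEqual(ids, new Set([a.id, b.id]));
});
```

The agent is then instructed to make this test pass without modifying it or any other test; the fix itself is never hand-reviewed.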

This is test-driven development (TDD), mandated by the technology rather than by methodology preference. The TDAD framework demonstrated that TDD combined with impact analysis significantly reduced regressions in AI coding agents. TDFlow research showed that agentic workflows structured around TDD produced substantially more reliable outputs than unconstrained generation.

Both TDD and microservices have been somewhat unfashionable recently — too much discipline, too much upfront investment. In the agent-native world, that calculus inverts. They fell out of favour because they competed against careful human-authored code that could be understood and refactored. That alternative is now gone. The choice is between well-structured and poorly-structured agent-native development.

The Black Box Recovery Paradox

The agent-native model raises a hard question that must be answered directly: if the code is a black box and something truly catastrophic happens — not a handled failure, but silent database corruption or a subtle data integrity failure — who fixes it, and how?

In a human-authored system, the answer is forensic debugging: an experienced engineer reads the code, traces the failure, and applies a surgical fix. That path is foreclosed when no engineer understands the implementation. And we cannot simply assume the agent that wrote the system can diagnose arbitrary failures in it — an agent that produces a particular class of bug is probably not reliably good at finding it.

The agent-native model resolves this paradox not by making debugging unnecessary, but by making recovery architectural.

Treat catastrophic failures as routine operational events. In conventional risk management, full data loss or ground-up service replacement are extraordinary events. In agent-native systems, they are expected failure modes that must be pre-planned. Every AI-maintained component needs a documented rebuild protocol: What data must be restored, and from which point-in-time backups? What can be re-processed from source systems to reconstruct a valid state? How are affected processes gracefully rolled back or restarted?
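
One way to keep that protocol honest is to make it a checked-in artifact beside the component's contract. A sketch with hypothetical field names and services — a plain runbook document serves equally well:

```typescript
// ops/invoice-service.rebuild.ts — a hypothetical rebuild protocol as a checked-in artifact.
// Field names and services are illustrative.
export interface RebuildProtocol {
  service: string;
  // Which durable stores must be restored, and from which point-in-time source.
  restore: Array<{
    store: string;
    source: "pitr-backup" | "event-log-replay" | "upstream-reingest";
    maxDataLossMinutes: number;
  }>;
  // How dependent processes are paused, rolled back, or replayed during the rebuild.
  containment: string[];
  // The acceptance gate: the rebuilt service goes live only when these suites pass.
  verification: string[];
}

export const invoiceServiceRebuild: RebuildProtocol = {
  service: "invoice-service",
  restore: [
    { store: "invoices-db", source: "pitr-backup", maxDataLossMinutes: 5 },
    { store: "invoice-read-model", source: "event-log-replay", maxDataLossMinutes: 0 },
  ],
  containment: ["pause billing-webhooks consumer", "freeze outbound payment jobs"],
  verification: ["contracts/invoice-service.contract.test.ts", "regression/"],
};
```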

Design for statelessness and reconstructibility. Components that maintain durable state are the hardest to recover — and state corruption is exactly the kind of subtle, late-manifesting failure that AI-generated code is most likely to produce. Agent-native design pushes state to well-defined, audited boundaries (databases, event logs, caches) and keeps service logic stateless and replaceable. Continuous point-in-time backup and automated integrity checking are not optional features — they are architectural requirements.
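
A minimal sketch of reconstructibility, assuming an append-only event log as the audited state boundary (the event types and read model are hypothetical):

```typescript
// rebuild-read-model.ts — reconstructing a derived view purely from the event log.
// Because the service logic is stateless, a corrupted read model is not debugged;
// it is thrown away and rebuilt. Event names and stores are hypothetical.
type InvoiceEvent =
  | { type: "InvoiceIssued"; id: string; customerId: string; totalCents: number }
  | { type: "InvoiceVoided"; id: string };

interface ReadModel {
  openInvoices: Map<string, { customerId: string; totalCents: number }>;
}

// Pure fold over the log: same events in, same state out — replayable and checkable.
export function rebuild(events: Iterable<InvoiceEvent>): ReadModel {
  const openInvoices = new Map<string, { customerId: string; totalCents: number }>();
  for (const e of events) {
    if (e.type === "InvoiceIssued") openInvoices.set(e.id, { customerId: e.customerId, totalCents: e.totalCents });
    if (e.type === "InvoiceVoided") openInvoices.delete(e.id);
  }
  return { openInvoices };
}

// Automated integrity check: the stored view must always equal a fresh replay.
export function integrityCheck(stored: ReadModel, events: Iterable<InvoiceEvent>): boolean {
  const replayed = rebuild(events);
  return (
    stored.openInvoices.size === replayed.openInvoices.size &&
    [...replayed.openInvoices].every(([id, v]) => {
      const s = stored.openInvoices.get(id);
      return s !== undefined && s.customerId === v.customerId && s.totalCents === v.totalCents;
    })
  );
}
```

When the integrity check fails, the response is not forensic debugging of the service: the derived state is discarded and rebuilt from the log.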

The Missing Engineers

The agent-native model does not eliminate the need for skilled engineers. It changes which skills matter.

In one survey, 74% of developers said they expect to spend far less time writing code and far more time designing technical solutions. The emerging framing — "every engineer becomes an engineering manager of AI agents" — describes the direction accurately. The engineer's output is tests, specifications, and architectural decisions, not source code.

The critical skill, and the real bottleneck, is writing exhaustive tests.

This is more demanding than it sounds. Writing a test suite that genuinely specifies a system requires the ability to reason about failure modes before they exist, articulate edge cases as executable assertions, and think adversarially about data integrity invariants. It is a different cognitive habit from writing implementation code — related, but not the same. Many experienced engineers who write excellent code have never had to develop it.

The vinext case is instructive here too. The project's success came entirely from the quality and completeness of the test suite that pre-existed the rewrite. The AI did the implementation work. The humans had already done the specification work — precisely, in the form of tests. That division of labour is the model.

Complicating this: for generations, the chief motivation for many dedicated software engineers has been the quality and elegance of the code itself — its evolution and improvement crafted by hand — not the test suite that accompanies and proves it. Meanwhile, the generation now entering the industry has learned to code in an environment where AI completes their sentences. Many will not develop the implementation intuition that senior engineers built through years of direct wrestling with code. The people most needed for the agent-native model — those who can write truly exhaustive test suites and reason about system behaviour from first principles — may become the scarcest resource in the industry. Organisations need to develop and retain these skills deliberately, before the gap becomes a crisis.

At the architectural level, the internal implementation of components becomes largely an agent concern. Human engineering coordination concentrates at the interfaces: API design, data contracts, event schemas. The external architecture — how components relate — becomes the entire game.

Conclusion: Limits of the Brave New World

LLM agents are already writing production software at scale. The NCSC warned that vibe coding "could cause catastrophic explosions in 2026." That describes an already-visible trajectory: short-term velocity gains, accumulating technical debt, increased incident rates, and cognitive overload on the engineers best positioned to prevent it.

The agent-native model is the coherent alternative. Code becomes a black box. Tests become the specification. Microservices become the unit of risk. Catastrophic recovery becomes an architectural requirement, not an incident response improvisation. The senior engineer becomes an orchestrator of agents, judged not by the elegance of their code but by the exhaustiveness of their tests.

But intellectual honesty requires acknowledging where this model strains.

Security-exposed systems are harder. Components directly exposed to the internet — authentication services, payment processors, input handlers — face adversarial conditions that test suites are poorly positioned to anticipate. A test suite can verify that a login endpoint behaves correctly for known inputs. It cannot enumerate every novel attack vector. For these components, the "black box, rebuild on failure" posture may be insufficient, and full human code comprehension remains necessary as a security control layer.

Regulated industries face structural friction. Finance, healthcare, aviation, and defence operate under audit and attestation requirements that often legally require a human to be able to explain, trace, and take responsibility for system behaviour at the code level. The agent-native model's "code is not meant to be read" principle collides directly with these obligations.

These are real constraints. They do not invalidate the model; they define its current boundary. For the large majority of software being built today — internal tooling, microservices, data pipelines, APIs, application backends — the agent-native approach is available now. The organisations that adopt it deliberately, with the architectural foundations and testing discipline it requires, will build faster and more reliably than those still trying to review their way out of AI-generated technical debt.

