When Technical Debt Stops Being About Code
These days it feels like anyone can build software almost overnight. Anyone with coding agents, over a weekend instead of binge-watching Netflix. Not production systems yet, but still surprisingly good-looking prototypes or MVPs.
So yes, code is produced faster than ever. But keep iterating long enough and the code becomes harder to read, harder to manage, and eventually even the agent starts getting lost in its own output. Anyone who has tried to review AI-generated source code knows what I'm talking about. No doubt. AI increases technical debt. But is this actually a problem?
If software can be regenerated at any moment, technical debt inside the implementation may start to matter less than we think. In a world where systems can be rebuilt in minutes, messy code might simply be replaced rather than carefully refactored.
The focus then gradually shifts from the code itself to the outcome the system produces.
As long as the outcome is correct, the exact implementation may become less important. The generated code might look slightly different every time the pipeline runs, yet still produce the same functional behaviour.
Which raises a deeper question. If code itself stops being the most important artefact of software engineering, what is?
The center of gravity is moving
For decades, the most valuable artefact a software team owned was its codebase. Repositories accumulated years of work: millions of lines of code, architectural decisions layered on top of each other and countless incremental improvements introduced through refactoring and bug fixes. Technical debt was something engineers could point to directly inside that code.
Once AI systems begin generating large portions of software automatically, the center of gravity gradually moves away from the implementation itself and toward the description of the system.
If machines are responsible for writing much of the code, the real intellectual work shifts toward defining what the system should do, how it should behave and what constraints it must respect.
The most valuable artefact is no longer the code itself, but the specification.
Taken to its logical conclusion, this leads to a somewhat radical possibility: code could become largely disposable.
This does not mean generated code becomes irrelevant. It still needs to be observable, auditable and reproducible. What changes is not its importance, but its permanence. Instead of treating a codebase as a long-lived asset, teams might increasingly treat generated code as a temporary artefact of a build process.
Spec-first instead of code-first
This shift is often described under the name Spec-Driven Development (SDD). The idea is simple: the specification becomes the central source of truth from which implementation and validation follow.
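To make that concrete, here is a minimal sketch of what a machine-readable fragment of such a specification could look like. The structure, field names and the invoice scenario are my own illustration, not an established SDD format:

```python
from dataclasses import dataclass, field

# Illustrative only: the field names and structure are assumptions,
# not an established Spec-Driven Development format.

@dataclass
class BehaviourRule:
    given: str   # precondition, in constrained natural language
    when: str    # triggering event
    then: str    # expected outcome the tests must verify

@dataclass
class SystemSpec:
    name: str
    constraints: list[str] = field(default_factory=list)
    behaviours: list[BehaviourRule] = field(default_factory=list)

spec = SystemSpec(
    name="invoice-service",
    constraints=["all monetary amounts use decimal arithmetic"],
    behaviours=[
        BehaviourRule(
            given="an invoice with a past-due date",
            when="the nightly reminder job runs",
            then="exactly one reminder email is queued",
        )
    ],
)
```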
At some point a natural question appears: if specifications must become extremely precise, aren't we simply inventing another programming language? I like to think of it as another step in the long history of abstraction in software engineering, moving from assembly to C, from C to higher-level languages, and potentially now from code to system descriptions.
Describing business logic in a form readable as natural language is not a new ambition either. That is what COBOL was meant to be. It wasn't. But with AI, that old dream may actually work in practice.
However, one component becomes particularly important in this model: Tests.
TDD as part of Spec-Driven Development
In traditional workflows, tests verify whether developers' code behaves as expected. In a spec-driven model, the process effectively moves one step earlier.
The specification defines not only how the system should behave, but also what must be tested. Tests therefore become an executable expression of the specification. The relationship becomes clear: the specification defines behaviour, generated code implements it, and tests verify it.
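Continuing the hypothetical invoice example from above, the contract can be as direct as one behaviour rule mapping onto one test. The service class below is a stand-in; in a spec-driven pipeline, the generated implementation would sit in its place:

```python
# A sketch of a test acting as an executable expression of the spec.
# FakeInvoiceService is a stand-in for the generated implementation.

from datetime import date

class FakeInvoiceService:
    def __init__(self):
        self.queued_emails = []
        self.invoices = {}

    def add_invoice(self, invoice_id, due_date):
        self.invoices[invoice_id] = due_date

    def run_reminder_job(self, today):
        for invoice_id, due in self.invoices.items():
            if due < today:
                self.queued_emails.append(invoice_id)

# Mirrors the spec rule: "given an invoice with a past-due date,
# when the nightly reminder job runs, then exactly one reminder is queued".
def test_past_due_invoice_queues_exactly_one_reminder():
    service = FakeInvoiceService()
    service.add_invoice("INV-1", due_date=date(2024, 1, 1))
    service.run_reminder_job(today=date(2024, 2, 1))
    assert service.queued_emails.count("INV-1") == 1
```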
If the outcome is wrong, the loop starts again. But the failure can appear in several places. The specification itself may be incomplete. The prompt given to the model may be wrong. The test suite may verify the wrong behaviour. Guardrails may be misconfigured. Or the generated implementation may simply be incorrect.
In a traditional workflow, a failing test usually points to one place: the code. In a spec-driven model, debugging means asking a different question first: where in the pipeline did the system go wrong?
In this sense, TDD becomes a natural extension of the specification. Tests stop being only a quality check and start acting as a contract.
At the same time, the quality of tests becomes a new bottleneck. A flawed test suite can be more dangerous than flawed code: a false positive does not simply hide a bug, it actively instructs the system to reproduce the wrong behaviour. And there is a subtler trap still: a flawed specification can easily produce passing tests and a system that behaves exactly as incorrectly described.
Guardrails instead of code reviews
Another implication is that traditional quality control mechanisms start to struggle at scale. Code generation can quickly exceed what humans can realistically review.
For this reason, quality enforcement gradually moves into automated guardrails embedded directly into development pipelines. Instead of relying solely on human reviewers, pipelines can enforce architectural rules, dependency policies, vulnerability scanning and security constraints automatically. If the generated system violates those rules, the pipeline fails.
Quality enforcement shifts from humans reviewing code to systems validating outcomes.
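As a rough illustration, a single guardrail can be very small. The allowlist and the directory name below are assumptions; a real pipeline would chain many such checks alongside vulnerability scanners and policy engines:

```python
# A minimal sketch of one automated guardrail: fail the pipeline if the
# generated code imports packages outside an approved allowlist.

import ast
import sys
from pathlib import Path

ALLOWED_PACKAGES = {"dataclasses", "datetime", "decimal", "typing"}  # assumed policy

def forbidden_imports(source_dir: str) -> list[str]:
    violations = []
    for path in Path(source_dir).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module.split(".")[0]]
            else:
                continue
            violations += [f"{path}: {n}" for n in names if n not in ALLOWED_PACKAGES]
    return violations

if __name__ == "__main__":
    problems = forbidden_imports("generated_src")  # assumed output directory
    if problems:
        print("Guardrail violation:", *problems, sep="\n  ")
        sys.exit(1)  # a non-zero exit fails the pipeline
```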
There is one more implication worth naming. LLMs are probabilistic, so the same specification may produce slightly different code each time the pipeline runs. But that may not matter. Guardrails and tests exist precisely to ensure that different generated implementations still behave the same way.
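A toy example of what "different code, same behaviour" means in practice: two structurally different implementations of the same rule, held to identical outputs on the same test vectors. Both functions are deliberately trivial stand-ins:

```python
# Toy stand-ins for two independently generated implementations of the
# same specification: "return the total of non-negative amounts".

def implementation_a(amounts):
    return sum(a for a in amounts if a >= 0)

def implementation_b(amounts):
    total = 0
    for a in amounts:
        if a >= 0:
            total += a
    return total

# Behavioural equivalence check: the code differs, the outcome must not.
TEST_VECTORS = [[], [1, 2, 3], [-1, 5], [0, -7, 7]]

for vector in TEST_VECTORS:
    assert implementation_a(vector) == implementation_b(vector)
```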
Software that is built, not stored
If this model matures, repositories may start to contain something quite different: system specifications, test definitions, guardrails and pipeline configurations.
In a way, this would not be entirely new. Infrastructure teams have already been working in a similar model for years.
Tools like Terraform describe the desired state of infrastructure rather than the steps required to create it. Software development may begin to look surprisingly similar.
A pipeline might provision infrastructure, generate the application from the specification, validate it, and deploy the resulting system after a human sanity check. The development pipeline begins to look less like a traditional build process and more like a compiler for system specifications.
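Sketched as code, with every step reduced to a stub, such a pipeline might read like this. None of these functions exist as real tools; they only mark where Terraform, a coding agent, scanners and a test runner would plug in:

```python
# A deliberately simplified sketch of a "spec compiler" pipeline.
# Every step is a stub; each would call real tooling in practice.

def provision_infrastructure(spec: dict) -> dict:
    return {"env": "staging"}  # stub: would apply Terraform here

def generate_application(spec: dict) -> str:
    return "# generated source code"  # stub: would call a coding agent

def enforce_guardrails(code: str) -> None:
    if "eval(" in code:  # stub for real policy and security checks
        raise RuntimeError("guardrail violation")

def run_tests(spec: dict, code: str) -> None:
    pass  # stub: would execute the spec-derived test suite

def human_sanity_check(spec: dict, code: str) -> bool:
    return True  # stub: a person approves the behaviour, not the lines

def run_pipeline(spec: dict) -> None:
    infra = provision_infrastructure(spec)
    code = generate_application(spec)
    enforce_guardrails(code)
    run_tests(spec, code)
    if human_sanity_check(spec, code):
        print("deploying to", infra["env"])

run_pipeline({"name": "invoice-service"})
```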
This shift raises an interesting question about intellectual property. If code is generated on demand, the real proprietary asset of a company is no longer the code itself, but the system knowledge that generates it: specifications, guardrails, ADRs and domain context.
Humans still matter
There is a catch though. That "obvious" edge case the business forgot to mention? Build exactly what was written and UAT becomes a crash course in missing scenarios. We have all been there. What usually saved us was that developers never built exactly what was written: they filled the gaps quietly, interpreting ambiguous requirements, asking clarifying questions, improvising solutions that never made it into any document. When AI generates the implementation directly, that buffer is gone.
The specification does not lie. It just tells the truth about how incomplete it always was.
That is why even in a highly automated environment, one step remains deliberately human. Before deployment, someone still needs to evaluate whether the system actually behaves in a way that makes sense from a product perspective. Automated tests verify logic, but machines still struggle with questions of intent, usability and business context.
A human sanity check remains the final safety layer. That person is no longer reviewing code line by line. They are evaluating the behaviour of the system as a whole.
Iterating the system, not the code
But what happens when the system does not make sense?
In traditional development you fixed the code. In an AI-driven environment the loop looks different:
Instead of modifying the code directly, teams adjust the description of the system and let the pipeline generate a new version. The goal is no longer to polish the codebase, but to refine the definition of the system itself.
This sounds cleaner than the old way. And in many cases it is. But it introduces a new risk that is easy to miss: specification drift.
Every undocumented prompt tweak sent to a coding agent is, in effect, an unversioned change to the specification.
Teams push a quick fix, skip updating the formal spec, and gradually the system becomes something different from what the specification claims it does. That gap is the new technical debt.
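One narrow, mechanical defence is to treat prompts as part of the versioned specification and let the pipeline refuse unrecorded inputs. A hedged sketch, assuming a particular repository layout:

```python
# A sketch: the pipeline rejects generation inputs (spec files and prompt
# files) that carry uncommitted changes, and stamps every build with a
# fingerprint of those inputs. Paths and layout are assumptions.

import hashlib
import subprocess
from pathlib import Path

GENERATION_INPUTS = ["spec/system.md", "spec/prompts"]  # assumed repo layout

def iter_input_files():
    for entry in GENERATION_INPUTS:
        path = Path(entry)
        if path.is_dir():
            yield from sorted(path.rglob("*"))
        else:
            yield path

def assert_inputs_are_versioned() -> None:
    # An ad-hoc prompt tweak should never reach the agent unrecorded.
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", *GENERATION_INPUTS],
        capture_output=True, text=True, check=True,
    )
    if status.stdout.strip():
        raise RuntimeError("Uncommitted spec/prompt changes:\n" + status.stdout)

def build_fingerprint() -> str:
    # Hash every input so the generated artifact can prove what it came from.
    digest = hashlib.sha256()
    for path in iter_input_files():
        if path.is_file():
            digest.update(path.read_bytes())
    return digest.hexdigest()
```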
And legacy systems? The real challenge is not regenerating them. It is reconstructing what they were supposed to be. Years of accumulated implementation with no specification to show for it. That reconstruction may turn out to be one of the most expensive engineering problems of the next decade.
The evolving role of the engineer
So what happens to the engineers themselves? If this trend continues, their role will evolve. They will spend less time writing implementation code and more time shaping the systems that generate it.
One emerging role is the system designer: someone operating between a business analyst, software architect and product designer. Their responsibility is to define system behaviour, architectural constraints, and test scenarios. This role would also maintain architectural context through ADRs, documenting why particular design choices were made. I think of it as something close to what Palantir calls a Forward Deployed Engineer: embedded with users, translating real problems into system designs rather than implementing tickets from a backlog.
At the same time, someone must build the machinery that generates the software. Evolving naturally from platform engineering and DevSecOps, these engineers manage pipelines that generate applications, validate outputs, and orchestrate infrastructure.
In practice, these two perspectives describe the same system from opposite sides. One defines what the system should do. The other ensures it can be generated safely and repeatedly.
Where technical debt goes
Technical debt does not disappear. It simply moves. Instead of living inside massive codebases, it begins to appear in other places: vague specifications, missing behavioural scenarios, weak guardrails, fragile pipelines or drifting system definitions.
Technical debt used to mean messy code. It may soon mean something harder to fix: a poorly described system.
And unlike code, you cannot refactor a conversation that never happened.