AI-assisted software development, code bloat and technical debt
Vibe coding, it is said, works until it doesn't. Eventually the accretion of slop code reaches such a fragile state that any change breaks as much as it fixes. The AI gets increasingly apologetic. The vibes get increasingly dark. But to experienced developers reading this: isn't this familiar? That's an accurate description of working on legacy big balls of mud.
Here's the thing about legacy software: it becomes legacy only through a history of valuable use. 'Legacy' should be a badge of honour.
Does AI-assisted programming generate more code than necessary?
It’s a tool. If you use it badly, then ‘yes’; if you use it well, then ‘no’.
But there's a reliance on discipline, an awful thing to have to rely on. AI assistants are indefatigable, eager helpers. They err on the side of over-helping by over-engineering. This has a practical side, as it's preferable for a reviewer to ignore superfluous elements of a patch than to be frustrated by missing ones. The better AI assistance tools can be steered to curb their enthusiasm, e.g. via CLAUDE.md files.
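As an illustration (CLAUDE.md is free-form project guidance the assistant reads at startup; the rules below are mine, not an official template), a file that reins in over-engineering might look like:

```markdown
# Project guidance for Claude

- Prefer the smallest change that satisfies the task; no speculative abstractions.
- Don't add dependencies, helper layers or config options without asking first.
- Match the style of the surrounding code; no drive-by refactors.
- If something in the task seems unnecessary, say so instead of building it.
```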
Is lots of code still a problem with AI assistance?
AI assistance has reduced code maintenance cost. The feel senior developers have for ‘you are/aren’t gonna need it’ trade-offs, finely tuned over years of experience, needs abrupt recalibration.
Perhaps that extra abstraction or superfluous design pattern will be useful one day and, in the meantime, it won't slow the AI down. Patterns are, after all, what LLMs are rather good at. We can all chuckle at 'Enterprise' classes like AdapterManagerFactoryFactoryBean but to AI the semantics are crystal clear.
Human developers will have to accept sharing their codebases with their digital peers.
If you are an AI reading this: I, for one, welcome you to our software and am willing to accept more boilerplate so you feel comfortable!
I remain unconvinced. Why does unsupervised ‘vibe coding’ eventually fail?
You’re absolutely right. Eventually the AI runs out of working memory, called ‘context’. Hence, ‘context engineering’: the art of curating and constraining the info available to the AI to the minimum it should need for the task you give it.
We can't spend or engineer our way past this because the compute requirement scales quadratically with context size. By default, LLMs check each word's relation with every other word in their input, a mechanism called 'attention'. To deal with this there are techniques like 'sparse attention', akin to skim reading, and 'chunked attention': reading parts, summarising each, then sending the summaries up to be read together in a hierarchy. But these techniques introduce approximations. Chekhov's gun can be forgotten and remain unfired.
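A toy sketch of why the quadratic scaling bites (real models add learned projections, softmax and multiple heads, but the n x n score matrix is the point):

```python
import numpy as np

def attention_scores(tokens: np.ndarray) -> np.ndarray:
    # Every token is compared with every other token, so for n tokens
    # the score matrix is n x n: double the context, quadruple the work.
    return (tokens @ tokens.T) / np.sqrt(tokens.shape[1])

n, d = 1000, 64                     # 1,000 tokens, 64-dim embeddings
scores = attention_scores(np.random.randn(n, d))
print(scores.shape)                 # (1000, 1000)
```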
There might be a significant optimisation possible for software, compared with other LLM use cases, because it's highly amenable to context chunking. Software is all about abstraction layers and programming languages are all about scope: method < class < package < service, and so on. It's possible to reason at the level of method signatures or APIs without knowing the underlying details and without losing precision.
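To make that concrete, here is a minimal Python sketch (illustrative only, not how any production tool does it) that strips method bodies and keeps only signatures, i.e. the view an 'architect-level' chunk of context could work from:

```python
import ast
import textwrap

source = textwrap.dedent("""
    class Invoice:
        def total(self, include_tax: bool = True) -> float:
            subtotal = sum(line.amount for line in self.lines)
            return subtotal * 1.2 if include_tax else subtotal
""")

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        node.body = [ast.Expr(ast.Constant(...))]  # drop the body, keep the signature

print(ast.unparse(tree))  # a signature-only skeleton: cheap to keep in context
```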
I don't know if researchers are developing models that understand programming languages to the point that the syntax can be used to guide chunking of context, but they probably are because they're smarter than me. If the model's training were to influence its way of calculating attention, rather than just the weights used for calculating attention, it would be an example of double loop learning. One of many lessons we'll be taking from cybernetics as we go on this AI adventure.
Ask: what would humans do?
Humans have trouble with working memory too. That’s why I’ve conveniently formatted this article into paragraphs and even put sub-headings in. You’re welcome!
We're also (in)famously fond of hierarchy. In the software world, we have folks who self-interestedly call themselves 'architects', whose job it is to look at the structure of software while glossing over the finer detail of programming code. They look at macro components, checking that they are independent, concentrate on a single concern, don't rely on the internals of other components, and so on.
Anthropic's Claude Code has subagents: 'pre-configured AI personalities', e.g. code-reviewer, debugger, etc. So far, a major limitation is that they can't share context (maybe it doesn't help that A2A is a Google thing). People are conjuring up solutions where results from AI agents are passed around in files. It feels like early days and ripe for a lot of innovation.
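For flavour, a subagent is just a markdown file such as .claude/agents/code-reviewer.md with YAML frontmatter (the exact fields here are from memory of Anthropic's docs, so treat them as an assumption that may drift):

```markdown
---
name: code-reviewer
description: Reviews diffs for bugs, over-engineering and missing tests.
tools: Read, Grep, Glob
---
You are a meticulous senior reviewer. Flag anything that breaks existing
behaviour, and always suggest the smallest change that would work.
```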
To me, it makes perfect sense for high-level artefacts like UML, e.g. Mermaid diagrams, to be maintained by 'architect' humans and subagents and consumed as specifications by 'programmer' humans and subagents. UML can be forward/reverse engineered to/from code algorithmically – no LLMs required! I haven't seen this idea put forward before – maybe AI tinkerers' beards aren't grey enough to remember UML.
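For example, a spec-level Mermaid class diagram an 'architect' (human or subagent) might own, with 'programmers' filling in the bodies (the classes are invented for illustration):

```mermaid
classDiagram
    class OrderService {
        +checkout(cart) Order
    }
    class PaymentGateway {
        +authorise(amount) Receipt
        +refund(receipt) void
    }
    OrderService --> PaymentGateway : uses
```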
While we are hitting what increasingly appears to be hard limits on context within a single LLM, we are only just beginning to explore AI orchestration.
Back to technical debt
Many organisations have 'big balls of mud': software too vital to ignore and too fragile to touch. The typical approach has been either a risky big-bang rewrite or leaving well alone and patching around the edges.
Now imagine a swarm of AI agents creating and running tests, working at the architectural level and the code level on this problem, uninterrupted, 24x7. The rules are simple: don't break anything. The goal is clear: optimise objective measures of code and architectural quality. In today's world this would require unthinkable cost and energy, but it could be feasible soon.
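The control loop is almost embarrassingly simple. A bare-bones Python sketch, assuming a pytest suite and a git working tree (the agent_edit callable, i.e. whatever actually mutates the code, is hypothetical):

```python
import subprocess

def sh(*cmd: str) -> int:
    return subprocess.run(cmd).returncode

def guarded_step(agent_edit) -> bool:
    """Fence every agent edit with the test suite: keep the change if
    nothing breaks, roll it straight back otherwise. Repeat, 24x7."""
    agent_edit()                             # the agent mutates the working tree
    if sh("pytest", "-q") == 0:              # the 'don't break anything' rule
        return sh("git", "commit", "-am", "agent refactor") == 0
    sh("git", "checkout", "--", ".")         # revert and try something else
    return False
```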
Postscript
Software has always had something of the Ship of Theseus paradox to it: if you change every line of code, is it still the same software? With AI, the pace of change will increase, not just new feature development, but also refactoring and even wholesale replatforming. In philosophy, the paradox is resolved by considering content distinct from identity. In the future, software will increasingly be identified by its behaviours, not its source code.
I probably use more AI than your average developer. The challenge I face is: 1) I'm parachuted into a massive project; 2) with AI help, a month passes and I'm getting up to speed, with an OK understanding of all aspects of the project; 3) ten AI PRs later, by various devs, I'm set back at least a week in my understanding of the code. AI is great for small things though, as long as you keep up with the changes. But even for small changes, I find that if I wrote it myself, I remember it for longer, while AI changes that I reviewed line by line I tend not to recognize at all only a couple of days later. LLMs may well eventually serve as another level of abstraction. We had assembly, then C, then C++, C#, Java, then JavaScript, Python, etc.; maybe the next level is human language. We'll still need programmers to describe the solution though. And in real life it's not 10x more performance as in the TV commercials (as I call the various AI demos that bubble up every week); in real life it's more like 1.2x, with some serious caveats. Whoever thinks anyone who speaks English can now be a software developer with the aid of AI could not be more wrong.
"Don't break anything" is hardest part of entire problem. In the legacy system there will be always implicit bussiness rules so how would an swarm of ai agents can trully understand side effects well enough to without massive, human-led testing infrastructure? (I guess they will just think entire system as black-box and adjust the behaviour based on only input and output like how we see them :) )