I Built A Coding Agent
I built a coding agent. Not because I thought it would be a particularly novel product in a world where every new product seems to be an agent of some kind (#bring back 90's Clippy), but because if we are all going to be interacting with this sort of software from now on, I want to know how it works. They say you don't truly understand something until you can teach it. Well, in my case I'm going to make it and then blog about it. Close enough.
There is another reason: I hate being forced to use a specific product. I like choice, and so should you. At the moment, quite a few big players (Google, OpenAI, Anthropic, Meta, Microsoft (who are basically OpenAI), etc.) are racing to see who can dominate the market and be top dog. But if only one, maybe two, succeed, we end up with the same situation we have with modern smartphones: in reality you, me, and most of the rest of the global population have lost. We end up with overpriced, underperforming products. The best-case scenario (and we're not too far off it at the moment, but it needs to stay like this) is a world where there are tens if not hundreds of high-quality open-source (often called open-weight) models available from lots of different providers. This is what I wanted to bake into my coding agent: lots of choice across lots of different models, and the ability to flip between them with ease.
I'm envisaging some of you out there pausing at this point and saying "Hang on Rob, I know you're incredibly good looking and a wizard on that keyboard, but are you really claiming that we're in a world where we have tens of good models available?" And you'd have a point: it feels like OpenAI and Anthropic are getting close to being the two most dominant players (at least in the West, and maybe Google?), but carry on reading and I hope you'll be convinced that you can go a long way without touching either of these providers.
Architecture
Okay, we started a little bit techy, and for those of you not in the industry I apologise, so let's back up a bit. To start with, you may be wondering what on earth a coding agent is and why you would want one in the first place.
Picture this: you're lying in the bath, your favourite classical composer tootling away in the background, sipping on a Dom Pérignon, when all of a sudden an amazing technical idea springs to mind. Several years ago, if you wanted to act upon this idea you'd have to leap out of the bath, fire up your computer, and start designing the architecture; then, once you had a fair idea of what you wanted, you'd need to actually get down and type the lines of code that would bring it into reality.
With a coding agent, however, you can flip out your favourite smart device (which, let's face it, is probably not that far away), open up the coding agent interface, and start typing or speaking your idea whilst the large language model listens to you, tells you how amazing you are, and hopefully prompts you in the right direction to make your project become a reality. Not only that, but once you've refined the first couple of features, this personal assistant can trigger a sub-agent to actually go and write the code for you. All without leaving the comfort of your bath and Dom Pérignon.
Sounds pretty amazing, right? And some of you might be wondering whether this is too good to be true. However, this sort of flow is exactly what the market is currently built around. Some of you in the industry might have heard the phrase "lights-out codebases", a reference to the so-called dark factories in China which are so automated that no human needs to be there regularly, so they can turn the lights off. The same idea is currently going around the tech industry: what if you had a codebase that was driven by human beings, but where no human being ever actually looked at the code? This idea is not limited to tech; it is part of the allure, and also the fear, that artificial intelligence has brought over the past couple of years. Well, if this is the future, I want to get hands-on and dirty.
So what did I actually build? A Slack app called "Coding Overlord". The very high-level user flow: you describe a feature or issue in Slack, the agent plans it out with you, and then a sub-agent goes off and writes the code.
Pretty cool, right? And all completely open source and configurable in any way you want. If you are interested you can check out the code here.
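For the curious, the flow above can be sketched in a few lines. This is purely an illustrative outline, not the actual Coding Overlord code; `handle_issue` and `fake_ask` are hypothetical names, and `ask` stands in for whatever LLM client you wire up:

```python
from typing import Callable

def handle_issue(issue: str, ask: Callable[[str, str], str]) -> dict:
    """Two-step flow: a planner model drafts a plan from the issue,
    then a coder model turns that plan into code."""
    plan = ask("planner", f"Break this issue into concrete steps:\n{issue}")
    code = ask("coder", f"Implement this plan as code:\n{plan}")
    return {"plan": plan, "code": code}

# Stubbed LLM so the sketch runs without any API key or Slack setup.
def fake_ask(role: str, prompt: str) -> str:
    return f"[{role} output for: {prompt.splitlines()[-1]}]"

result = handle_issue("Add a /plan command to the Slack app", fake_ask)
```

The point of keeping `ask` as a plain callable is that the planner and coder can be different models behind the same interface, which is exactly the kind of flexibility discussed below.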
What I Learned & Productivity Comments
So, the proof's in the pudding: how did it behave in the real world? Well, the answer is, not too badly. I deliberately built in several tools that meant I could change which model I was talking to just by requesting it. There's a really cool service called OpenRouter, which I highly recommend if you're into model orchestration, that allows you to flip between all sorts of different LLMs with ease. I tried the big obvious ones, e.g. OpenAI's GPT 5.4 and Anthropic's Claude Opus 4.6, but also some less well-known ones like Z-AI's GLM-5, minimax-m2.7, Gemini 2.5-flash, etc. I wanted to keep costs down, so I settled on GPT 5.4-mini for coding tasks and minimax-m2.7 for planning them. Both of these models are about 10x cheaper than the top-tier ones. However, an honourable mention goes to Z-AI's GLM-5, which is also a very accomplished coding model and a good alternative to OpenAI and Anthropic.
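If you're wondering how flipping models via OpenRouter actually works: it exposes an OpenAI-compatible chat-completions endpoint, so switching models is literally just a different string in the `model` field. Here's a minimal sketch using only the Python standard library; the model slug and key shown in the comment are placeholders, not real values:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completion request for OpenRouter's
    OpenAI-compatible endpoint. Swapping models is just a
    different `model` string in the payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# e.g. req = build_chat_request("some-provider/some-model", "Plan this issue...", key)
#      resp = urllib.request.urlopen(req)  # sends the actual request
```

Because every provider sits behind the same request shape, "try the same task on a cheaper model" becomes a one-line change rather than a new integration.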
I pointed my coding agent at itself (I thought it would be cool if it helped to make itself better) and, amazingly, it actually implemented a few features, driven via Slack on my phone. Really cool when you have an idea on the tube.
Now obviously that's exciting and gets C-suite execs foaming at the chops for obvious reasons, i.e. productivity boosts and potential decreases in headcount, and I would be disingenuous if I said that there was no reason for that excitement. However, I wanted to put it through its paces and see what happens when you really get into the weeds of a trickier issue.
A good example of this is an architectural overhaul. My coding agent started as just a small LLM that would implement code described by an issue (no planning feature). When it came to adding the planning feature, I wanted it to fold in nicely; for the non-software engineers amongst you, this basically means reduce duplication of code, minimise third-party library usage, make sure that database usage was as low as possible to keep things snappy, etc., everything you should expect from good software design. By this point in the project I'd also settled on a certain style (way of writing code) that I wanted the agent to continue to follow.

This is where things became trickier. I found that, despite being prompted not to, the LLM would consistently go down routes I didn't want it to follow, write code that I considered not very maintainable, fail to fully understand the task in its entirety, and fail to design the software in a way I considered relatively future-proof. Increasing the size of the LLM (using one of the most premium models, e.g. Claude Opus 4.6) did help, but at a certain level of complexity the same issues would reappear. Now, this tool is in its infancy, and I am positive there are many improvements one could make that would help overcome some of these issues; however, the small attempts I did make felt like fighting the LLM to get it to do what I wanted.
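For readers wondering what "being prompted not to" looks like in practice: a common approach (a sketch of the general technique, not the project's actual prompt) is to pin the style rules in a system message so they travel with every request. The rules below are illustrative:

```python
# Illustrative style rules -- not the actual Coding Overlord prompt.
STYLE_RULES = """\
- Reuse existing helpers before writing new ones.
- Avoid adding third-party dependencies.
- Keep database round-trips to a minimum.
- Match the existing code style of the repository.
"""

def with_style(user_prompt: str) -> list:
    """Prepend the style rules as a system message so every
    request to the model carries the same constraints."""
    return [
        {"role": "system", "content": "Follow these rules strictly:\n" + STYLE_RULES},
        {"role": "user", "content": user_prompt},
    ]
```

As the paragraph above notes, constraints like these help, but they do not reliably hold once the task gets complex enough.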
Wrapping Up
So what have I learned? Well, the motivation, as I said at the start, was to learn something about how these systems work. Not LLMs themselves (I covered that in another post here), but the automation systems that are fast being built around them, and which we have been told hold the promise of huge productivity gains.
It is undeniable that these tools are incredibly useful. There are vanishingly few scenarios in which this technology is not widely adopted within the next 5 to 10 years. However, if there is an overall message in this article, apart from showing off a cool little project I made, it is to urge caution. There are two extremes when it comes to artificial intelligence: the fervent and evangelical, and the atheists and deniers. Both of these groups will try to sell you their vision, and the more I use and come to understand this technology, the more I think they are both wrong. Large language models are absolutely here to stay, and I am sure they will be put to good use over the coming years, but (yes, I'm going to say it) I think they're going to fall short of delivering some of the promises we have been sold. Will there be job-market disruption? Almost certainly, and we've probably already seen it. Will the human race no longer need to work? Almost certainly not. Are we entering a technological age of abundance for all? Very doubtful. Or an age of techno-feudalism, ruled by a few oligarchical elites? Hopefully not.
So, if you are a non-believer, get your head out of the sand and get used to these new tools. And if you are already evangelical, don't make the mistake of confusing artificial intelligence with actual intelligence. Peace out, fellow humans.
(No LLMs were hurt in the creation of this article).