Using Coding Agents - Round 2
A few days back, I created a simple checklist app on Replit (link in comments). It was an empowering experience, but being confined to Replit’s infra for hosting - and therefore tied to their costs and reliability - is not the way to go for a production-grade app. So next, I decided to try coding agents that would let me build on my own machine and then use whatever infra I want for production.
For this experiment, I chose a more complicated product: an Applicant Tracking System (ATS) for managing jobs. I genuinely need a simple ATS for my own occasional hiring needs. Most ATS products are expensive, overloaded with features I do not need, and therefore not worth paying for. So, like the checklist app, I have a personal use case for this project as well, although this time the product is more of a business app.
Since the aim was to see how far AI could take me, I started with a high-level specification rather than defining every feature in detail. This time, the AI agent had to begin by setting up infra on my local machine - Docker containers and all the services needed to run the app (a sketch of what such a stack looks like is below).
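To make “Docker containers and all the services” concrete, the local stack for an app like this usually boils down to a compose file along these lines. This is a hedged sketch of my own, not the agent’s actual output: the service names, the Postgres image, the ports, and the credentials are all illustrative assumptions.

```yaml
# Minimal sketch of a local dev stack for a web app plus database.
# Everything here (names, image versions, ports, credentials) is an
# illustrative assumption, not the agent's actual output.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: ats
      POSTGRES_PASSWORD: ats
      POSTGRES_DB: ats
    ports:
      - "5432:5432"
    volumes:
      - db-data:/var/lib/postgresql/data
  app:
    build: .            # the app's own Dockerfile
    depends_on:
      - db
    environment:
      DATABASE_URL: postgres://ats:ats@db:5432/ats
    ports:
      - "3000:3000"
volumes:
  db-data:
```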
I first tried Claude Code. Anthropic’s models are the default choice for coding, but the first thing I noticed was that, for creating an end-to-end app, Replit had optimized its agent in a way Claude Code had not. After generating the code, Replit’s agent went through multiple rounds of testing and fixing to produce a fairly complete end-to-end app. Claude Code generated an end-to-end app too, but most features were not working. All the iteration that Replit had done automatically would now have to be done by me - or at best prompted explicitly, if Claude could handle it.
I also got stuck because Claude Code was not handling error messages well. All user-facing error messages were generic. Even when I asked it to rewrite them to be actionable, it simply produced different generic messages.
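To be clear about what “actionable” means here, this is the kind of difference I was asking for. The example is mine, not the agent’s output, and the job-posting scenario is hypothetical:

```typescript
// Illustration only - my wording, not the agent's output.

// Generic: the kind of message I kept getting. Tells the user
// nothing they can act on.
function saveJobGeneric(): never {
  throw new Error("Something went wrong. Please try again.");
}

// Actionable: names what failed and what the user should do next.
function saveJobActionable(title: string): void {
  if (title.trim() === "") {
    throw new Error(
      "Could not save the job posting: the title field is empty. " +
        "Add a title and try again."
    );
  }
}
```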
By then, Antigravity from Google had dropped, so I switched the next set of tasks to Antigravity. It did a much better job rewriting error messages. I started fixing things one by one, but Antigravity does not have a paid version yet, and I hit limits for all models very quickly. I also began thinking I should restart the whole project. Since Antigravity was not an option for that - I would hit limits instantly - I turned to the OG IDE-based agent, Cursor.
I am still in the process of building the app there. Cursor did a slightly better job with the front-end and error messages (though that might simply be because I was prompting better by now), but it still needs multiple iterations to get everything working.
I have realized a few practical things. Coding with agents - especially for an end-to-end app - is test-driven development taken to the extreme. It simply does not work in the first iteration. It has to be tested and fixed, tested and fixed. And that will not happen in one go; context limits kick in.
So I started asking it to write test cases and then keep testing until things were fixed. But even that came with challenges. Sometimes it did not write or run certain test cases because “the test user did not have data.” Once prompted, it could obviously generate test data. After a while, I also realized that during iterations, it was simplifying the test cases just to make them pass, instead of fixing the underlying code.

So ideally, test cases should be written separately, and when running them, the agent should not be allowed to modify them. But with E2E testing - especially for the front-end - test cases must reflect the actual implementation: a Playwright script, for example, needs to target elements by name or CSS selectors, as in the sketch below. Unless the test-writing agent has looked at the implementation, it cannot write working tests.
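Here is a minimal sketch of the kind of Playwright test I mean. The route, field labels, and button name are hypothetical; the point is that every locator is coupled to the real markup, so an agent that has never seen the implementation cannot write a test that passes.

```typescript
// Minimal sketch of an E2E test for a hypothetical "create job" flow.
// The route, labels, and button name are assumptions for illustration;
// a real test must match whatever the implementation actually renders.
import { test, expect } from "@playwright/test";

test("recruiter can create a job posting", async ({ page }) => {
  await page.goto("http://localhost:3000/jobs/new"); // hypothetical route

  // These locators only work if they match the actual markup: the labels
  // and the button's accessible name come from the implementation.
  await page.getByLabel("Job title").fill("Backend Engineer");
  await page.getByLabel("Location").fill("Remote");
  await page.getByRole("button", { name: "Create job" }).click();

  // The new job should now appear in the listing.
  await expect(page.getByText("Backend Engineer")).toBeVisible();
});
```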
I am still figuring out how to make this work so development can move faster and I do not have to manually fix each feature through multiple long-winded iterations.
Right now, I have the agent writing test cases for a feature that was not working properly, and I have given it the instruction to run those test cases and keep fixing things until they pass. The instruction also says it may fix only syntactical errors in the test cases and must not dumb them down; everything else should be fixed in the code. Let us see how far that takes me.
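Paraphrased, that standing instruction looks something like this (the wording is mine; the constraints are exactly the ones just described):

```
Run the E2E test suite for this feature. When a test fails:
- You may fix syntactical errors in the test files, nothing more.
- Do not weaken, skip, or simplify a test to make it pass.
- Every other failure must be fixed in the application code.
Repeat until all tests pass.
```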