My Claude Code Agent Teams Experiment
Developing working software with my agent team…
OK, I have a technical programming background. My career swung over to the management side relatively early, and software engineering folks generally enjoyed working with me (apologies to the few exceptions that might raise their hand!!) because I understood coding firsthand.
Not to bore you, but over the course of my career I have used Basic, COBOL, C, C++, Smalltalk, Objective-C, Java, JavaScript, and Python. I have looked at R and Rust. However, I no longer write code and have no interest in reviewing it either. Haven’t for 15+ years, unless I got an itch and wanted to learn something new – like when I dabbled in understanding PyTorch and TensorFlow, which made me dive into Python.
Why am I telling you this? To set the stage so you understand what I might know or not. I am not an expert programmer but I have a really good understanding of programming languages.
With all the crazy excitement about AI coding agents hitting the market over the last 18 months, I have been following the developments with interest and curiosity.
From my viewpoint, the easy parts are getting easier, and the hard parts are getting more accessible, because you can get into the hard parts without much prior knowledge (see below about my experience with AWS and Terraform).
In other words, you can more easily get into deep water!
We have seen a fundamental change over the last 5 years.
I have a guess where this is going, but instead of speculating I decided to run a real-world experiment to demonstrate the point.
Follow along to see what I did, blow by blow, and the conclusions I came to:
Start time: Wednesday, February 11th, 2026, approximately 8:00 PM
1. I installed Claude Code “out of the box”, no customization, no skills configuration, no special setup of any kind. I am subscribed to the Claude Max plan.
2. I tried to find a moderately complex application to write – complex enough to satisfy critics but simple enough that I wouldn’t spend an entire week on it. Not a toy, but not a full-blown system. I also wanted to create something “enterprise grade”, i.e., follow an SDLC and a tech stack / tooling common in enterprise environments. I settled on creating an Automated Teller Machine (ATM) simulator. This entailed an application with a front end, back end, database, local Docker development, and final Vercel or AWS deployment. And a management console.
3. I prompted Claude (web) to help me define a CLAUDE.md file that laid out the desired functionality, rules of the road, tech stack, roles, etc. Here is what I asked:
I want you to help me create a markdown file that I can use to provide guidance to Claude Agent Teams. I want to create an application, written in Python, that simulates a bank ATM (Automated Teller Machine).
The application is supposed to represent real world functionality of an ATM, such as being able to make deposits, withdrawals, money transfers between accounts, account inquiries, printing accounts statements (to PDF files), etc.
I want to use Claude Agent Teams to have agents work on this application development problem themselves. Roles that I can imagine agents need to fill are UX designer, software architect, software engineer, software engineer in test, technical writer, cloud specialist / SRE, etc. I am open to hear about additional agent suggestions.
I want to start small with the ATM application running on a local Docker image. Once it is fully functional, I want to host the application on a preferred hosting service, such as Vercel or AWS.
How do I get started? Help!
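To ground the functional list above, here is a minimal sketch of the kind of core business logic the agents would be asked to build. This is my own illustrative toy, not code from the actual repo – class and method names are my assumptions:

```python
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when a withdrawal or transfer exceeds the balance."""


class Account:
    """Minimal ATM account: deposits, withdrawals, transfers, inquiries."""

    def __init__(self, account_id: str, balance: Decimal = Decimal("0")):
        self.account_id = account_id
        self.balance = balance

    def deposit(self, amount: Decimal) -> Decimal:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        return self.balance

    def withdraw(self, amount: Decimal) -> Decimal:
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self.balance:
            raise InsufficientFunds(self.account_id)
        self.balance -= amount
        return self.balance

    def transfer_to(self, other: "Account", amount: Decimal) -> None:
        # Withdraw first so a failed withdrawal never credits the target.
        self.withdraw(amount)
        other.deposit(amount)


# Usage: transfer $40 from checking to savings
checking = Account("CHK-1", Decimal("100.00"))
savings = Account("SAV-1")
checking.transfer_to(savings, Decimal("40.00"))
print(checking.balance, savings.balance)  # 60.00 40.00
```

A real implementation would add persistence, authentication, and audit logging – exactly the concerns the Architect and Security Engineer agents end up owning.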
4. Claude and I bantered back and forth. For example, it suggested I did not need a technical writer, as it would document things as we went along. It also suggested that Vercel was not a good hosting option: Vercel is oriented toward frontend apps, while AWS is better suited for backends that use PostgreSQL, background jobs, and persistent state (ECS Fargate or App Runner). Check.
5. I bumped the unit test coverage targets up from the suggested 80% to 100% on business logic and security code, and 95%+ overall, with any exclusions explicitly documented and justified. Claude had already defined the initial 42 end-to-end test cases. Check.
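For readers who don’t live in Python tooling: with pytest-cov, an “explicitly documented exclusion” usually looks like an inline `# pragma: no cover` comment plus a written justification, while the CI gate enforces the overall threshold (e.g. `pytest --cov --cov-fail-under=95`). A minimal sketch – the function names here are mine, not the repo’s:

```python
def apply_withdrawal_fee(balance: float, fee: float) -> float:
    """Business logic: must hit the 100% coverage bar."""
    if fee < 0:
        raise ValueError("fee cannot be negative")
    return balance - fee


def _debug_dump(state: dict) -> None:  # pragma: no cover
    # Excluded from coverage on purpose: debug-only helper that never runs
    # in production. The justification is documented right here, per our rule.
    print(state)


print(apply_withdrawal_fee(100.0, 2.5))  # 97.5
```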
6. It asked me what license I want (MIT), what Python version (3.12), and provided me with a zip file that contained the GitHub project scaffolding, including the initial CLAUDE.md file. Check.
7. I took the zip file and unzipped its contents to the GitHub repository. Ready to start.
8. I asked Claude (web) to show me how to set up Claude Code (terminal). It provided easy-to-follow instructions. Claude Code was ready to go. Check.
9. I started the agent team by entering the following prompt on the terminal command line (see the CLAUDE.md file):
I need you to act as Team Lead for building a Python ATM simulator application.
Read CLAUDE.md thoroughly — it contains the full project specification, team structure, and development phases.
Create an agent team with these teammates:
1. Architect — system design, data models, API contracts
2. Backend Engineer — core Python/FastAPI implementation
3. UX Designer — Textual terminal UI
4. SDET — testing strategy and test implementation
5. Security Engineer — auth, encryption, audit, threat model
6. DevOps Engineer — Docker, CI/CD, deployment config
Start with Phase 1, Sprint 1. Use delegate mode — coordinate only, do not implement.
Require plan approval from all agents before they begin implementation.
Assign clear file ownership per CLAUDE.md to avoid conflicts.
10. The agent team started chugging along, prompting me every couple of minutes to review commands, review artifacts, or make decisions. In between I watched Netflix, cooked, and wondered where this was going; I went to bed around midnight, when Claude had finished Sprint 2 (if I remember correctly).
11. Back at it the next day around 8:00 AM. I mostly followed Claude’s guidance, sometimes asked clarifying questions, and once asked it to make sure we would choose the most cost-effective AWS hosting options. Initially I performed all the git pushes up to GitHub myself, but other than that, the Claude agent team coordinated amongst themselves, committed changes, resolved conflicts, implemented tests (unit and E2E), found bugs, and fixed bugs. After a while I let Claude Code push to GitHub itself, just asking me for permission.
12. Claude progressively worked through the phases and sprints defined in the CLAUDE.md file. At a sprint gate, before finalizing the sprint deliverable, it would sometimes ask if I wanted to rerun all the tests, which I confirmed. This frequently surfaced additional bugs, which Claude then proceeded to fix.
13. The heaviest lift for me personally was configuring AWS – not because it’s difficult, but because it involved a lot of tedious setup work, access tokens, etc. Stuff I really hate doing! This was towards the end of the process. Terraform and AWS configuration took the most time and iterations. Not sure if that was because I fat-fingered some things, though – so I am not blaming Claude for this.
14. No HTTPS – I made the call to skip the HTTPS setup intentionally, to save ~$16/month on an Application Load Balancer (ALB), which is the standard way to do HTTPS on ECS Fargate; AWS doesn’t let you attach SSL certificates directly to Fargate tasks. So that’s on me, not a Claude Code shortcoming.
15. After the backend was implemented (v1.0), I directed Claude to plan the add-on of a modern web front end (v2.0). I provided three images from the web showing different ATMs; this time, I let Claude run and do the plan while I took off for the day.
16. Next morning, the plan was done. I reviewed it and provided feedback, and we had an extended discussion about what kind of front-end technology to use (Jinja, React, HTMX, etc.), why, and the drawbacks of one over another. This was educational for me, as I am not a front-end guy. We finally agreed on React + Framer Motion because I wanted a fancy user interface – after all, it’s an experiment, might as well go all in. Desktop and tablet support only, no mobile support. Because of this, we added a Frontend/UX Engineer agent to the team. In retrospect, I am not sure we lived up to the fancy aspirations.
17. After final plan discussions, I suggested some safety measures, such as an explicit reminder to compact the context after each sprint so we wouldn’t run out of context mid-development. After that, off we went to work on the front-end GUI. Did I just refer to us with the collective “we”?
18. … working through the plan… one sprint at a time… develop… build… test… iterating… Claude occasionally asking questions, asking for permission to proceed, me reviewing, sometimes hands-on testing… giving feedback… asking for fixes… chugging along
19. How about DevSecOps? Oops, I didn’t think about that upfront, LOL… I asked Claude to incorporate free security scanners into the CI job (it added Bandit, npm audit, Trivy, gitleaks, and GitHub Dependabot runs). We found and fixed several security vulnerabilities.
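For context, wiring those scanners into CI typically amounts to one GitHub Actions job. Here is a hedged sketch of what such a workflow might look like – the directory names, action versions, and severity thresholds are my assumptions, not the repo’s actual config (Dependabot is configured separately, via `.github/dependabot.yml`):

```yaml
name: security-scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history so gitleaks can scan all commits
      - name: Bandit (Python static analysis)
        run: pipx run bandit -r backend/ -ll
      - name: npm audit (frontend dependency check)
        working-directory: frontend
        run: npm audit --audit-level=high
      - name: Trivy (filesystem / dependency scan)
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          scan-ref: .
      - name: gitleaks (secret scanning)
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```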
20. There were some AWS security findings, but they stemmed from architectural decisions I made to save money (no NAT gateway, public-subnet design for cost). The proper fixes (VPC endpoints, private subnets + NAT, CMK encryption) are beyond the current scope. If this app were to go to production, we would have to fix them.
21. Boom!! Done! I personally did not write a single line of code; I only directed Claude Code. Everything in the repo – all code, scripts, CI, etc. – was produced by Claude (find the link in the Addendum below).
22. I started testing the application manually – my personal User Acceptance Test – on my local Docker image. Didn’t like some of the screen layouts, colors, spacing… more feedback… iterating… fixing manually found bugs, building, committing, retesting, pushing… iterating.
23. Turned the repo public on GitHub, which enabled Dependabot, which flagged 8 PRs for us to process. Fixed / pushed all those. Check.
24. We originally implemented the admin console using Jinja but decided to move it to React + Framer Motion. Claude created a plan, I reviewed and approved it… working through the plan… one sprint at a time… develop… build… test… iterating… Claude occasionally asking questions, asking for permission to proceed, me reviewing, sometimes hands-on testing… giving feedback… asking for fixes… chugging along.
So, there it is. A 100% working sample of a real application, running in your local Docker environment or hosted on AWS. 100% coded by Claude Code. Still some things we could improve on, but it is functional at this point.
End time: Friday, February 13th, 2026, approximately 9:30 PM
Total work time: approximately 14 hours (I watched Netflix and did other things in-between). And I did a lot of reading while Claude was working.
On Saturday, I decided to give it another swing for two more hours, and we fixed a couple more defects, made sure the local Docker images matched what was on AWS (when I checked, they didn’t), documented things, and tore down the Terraform dev infrastructure on AWS to avoid cost.
How much did this cost?
I am on the Claude Max $100/month plan and never hit the limit during my three days working on this. Pro-rated, three days of a $100/month plan is roughly $10, so call it $10 – $20, plus about $1 for AWS, plus my time.
Things to Note / Lessons Learned
Question I have
Conclusion
I am impressed. Doing this project by myself would have taken me at least 5 weeks, and probably sleepless nights – because I am rusty at programming, and I used technology I am not familiar with. My guess is that an experienced programmer, familiar with all the technologies, could have cranked this out in a week or less.
I am not sure if I am overestimating myself or underestimating the experienced programmer. But the tech stack is pretty deep, so the cognitive-overload problem is real. Considering the cost differential, and the fact that as you gain experience running agent teams you will be able to work on 3 to 5 projects at a time, even with the most skilled engineer competing, it’s not even close.
Check out the GitHub repo, I am curious about your feedback. How long do you think this would take?
Am I shocked? No, not based on the exponential improvements we have seen over the last 18 months. This was inevitable. What is interesting is to ponder what the next 18 months will offer. Humans are bad at grasping exponential growth, but if you look at the past couple of years as an indicator, it’s clear we are on an incredible rocket ship: an exponential curve of improved functionality, capability, and productivity that software engineering has never seen before.
Software development has been changed forever. The skill has moved from mechanically assembling code to managing a set of agents that assemble code for you.
Management, specification, decisioning, code review, deployment – the skill sets of entire teams have been collapsed into a set of agents that are controlled by one or very few humans.
But the adage still applies: Junk in, junk out.
AI will produce amazing results if you guide it correctly. Guide it poorly, or not at all, and it will run itself off the tracks. Again, considering improvement patterns of the last 18 months, this is likely to get much better over time.
The economics of software development have changed so fundamentally that we all have a hard time wrapping our heads around it. In the old days, it took many years of study and work experience to gain the knowledge that, in turn, justified the $200,000+ salaries. The cost of software development projects was driven mainly by the labor cost of qualified engineers, who took a long time to educate and train – a fact that drove the offshoring boom of the early 2000s. Now an agent costing $1,000 can do the same. International labor arbitrage has turned into a simple tradeoff between labor cost and agent cost.
The SaaS per seat pricing model assumed humans in the seats – will that model survive when you have fewer humans and hundreds of agents? Will SaaS per seat pricing die and be replaced by a pure consumption-based model, so SaaS vendors can continue their revenue stream while mostly agents use it? And if SaaS products are mostly used by agents, do we really need fancy SaaS user interfaces? Why have an interface that mimics human, paper-based, workflows?
Finally, how will young engineers gain the experience to guide teams of agents? Have universities even begun to address the new educational requirements for Computer Science / Computer Engineering?
I sincerely hope this write-up helps people understand where we are and where we are heading.
So many questions, so little time, so few answers.
Addendum
Repo
You can find the repository, which contains the CLAUDE.md file and all source code produced by Claude, below.
GitHub Repository: https://github.com/brianw1130/atm-simulator
Tech Stack Used
Backend
Frontend (ATM UI)
Frontend (Admin Dashboard)
Infrastructure
Testing
Security