Intro to agentic coding using four projects
Note uno: No AI was used to write this, though nano banana was used to create that splash image. There will be grammar and spelling mistakes; it's those imperfections and lack of em-dashes that let you know it was human made.
Note dos: I am relating my experience in a general way, pulled from a mix of personal and work experimentation and use.
Ready. Set. Code!
Alright! You finally persuaded your company to purchase one of the new AI services. If you are a software company, my hopes are that you are using Claude Code or Codex. If you are still using Copilot, stop reading and go back to the bargaining table!
What can we do to show the power of agentic programming and see an immediate return on investment? Give 'em the ole razzle dazzle. Razzle dazzle 'em.
If you are like me, the first thing you want to do is go for broke promising to "one shot" the next billion dollar button or whatever dream features your CEO has been discussing at every all-hands meeting since Covid. And then, as the moonshot falls back to earth, and you dust the despair off your shoulder, you realize there is a lot of value to be gained doing all the grunt work that you never wanted to do. All that tech debt. All that cleaning up! And, you're in luck! Claude does all of this work without the audacity to tell you that it's a misallocation of Claude's talents. Claude is fast, friendly, and does a pretty decent job out of the box.
So let's dive in.
Four Projects
The four projects I suggest folks use to get to know Claude (or another top-tier AI agent):
• Specs. Spec 'em all, let Claude sort 'em out.
• Refactor/rewrite all the things.
• Document. Let Claude tell you what it thinks and you clarify the institutional knowledge.
• Update runtimes, libraries, and tooling.
Specs
The long and short of it is that asking Claude to write specs is about as close to a "free lunch" as you are going to get in the land of AI. Asking the agent to create spec coverage for your application code cannot, in and of itself, cause any bug regressions. The AI reads a module, class, or library and attempts to spec and test the public -- and in some cases private -- interface. It can continue to iterate on the coverage until some threshold is met. It's magic.
We want to consider a basic testing pyramid: unit, integration, and end-to-end. The unit test coverage is mandatory. Integration specs are only slightly less of a priority in the context of this article, and I will not discuss end-to-end because other articles have covered those topics.
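As a sketch of what that first pass can look like, here is a tiny hypothetical Ruby class and the kind of unit checks an agent might generate for its public interface. Plain assertions stand in for RSpec to keep the example self-contained; the class and names are invented for illustration.

```ruby
# Hypothetical class under test: a tiny price calculator.
class PriceCalculator
  def initialize(base_cents)
    @base_cents = base_cents
  end

  # Public interface: total with a percentage discount applied, floored at zero.
  def total_cents(discount_percent: 0)
    discounted = @base_cents * (100 - discount_percent) / 100
    [discounted, 0].max
  end
end

# The kind of coverage an agent can generate for the public interface:
calc = PriceCalculator.new(1000)
raise "no discount"    unless calc.total_cents == 1000
raise "10 percent off" unless calc.total_cents(discount_percent: 10) == 900
raise "floor at zero"  unless calc.total_cents(discount_percent: 150) == 0
```

Note the edge case in the last assertion: a naive happy-path suite would miss the over-100% discount, and this is exactly the kind of case an agent iterating toward a coverage threshold tends to surface.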
I suggest the following course of action:
This very naive approach worked well out of the box. When Claude wrote code that couldn't work, it was due to unintuitive naming conventions. This required (#2) fixing assumptions with additional prompting and documenting any quirks in our code; more on this below.
These specs are brittle and aren't accepted best practices
Yes, the specs might be brittle, especially after the first few iterations. In our Ruby codebase, Claude tested both public and private methods. Violations of established community norms and testing customs that would make me clutch my pearls in a human-written feature branch are perfectly acceptable from Claude. We are looking to trust and verify the code as we build confidence in using Claude as an agent. I would rather have 100% test coverage and violate best practices and BDD than accept a lower threshold. As for the tests being brittle, this is also acceptable as long as we recognize the trade-off: the coverage helps verify future changes to the code. Especially if I am not going to be the one making the modifications!
We watch out for hallucinations and ineffective tests: testing the mock, mocking the SUT, and tautological tests. Use a standard deterministic tool (for Rails we use simplecov) to ensure a base level of confidence in the coverage metrics. Read through the Claude-generated specs to ensure the inputs and outputs are expected.
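As a tiny illustration of a tautological test versus a meaningful one (all names hypothetical, no test framework required):

```ruby
# A small function that actually uses its collaborator's value.
def total_with_tax(subtotal, rate)
  (subtotal * (1 + rate)).round(2)
end

stubbed_rate = 0.1

# Tautological: this only re-asserts the value we just stubbed and never
# touches the system under test. It pads the coverage number, proves nothing.
raise "tautology" unless stubbed_rate == 0.1

# Meaningful: exercises real behavior that consumes the stubbed value.
raise "behavior" unless total_with_tax(100.0, stubbed_rate) == 110.0
```

Both assertions pass and both count toward coverage, which is why reading the generated specs, not just the coverage report, matters.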
There sure are a lot of stubs and mocks!
There is a lot of comfort to be gained by using instance doubles, real objects, and persisted records. However, it's important to let Claude write unit specs for the SUT, the System Under Test, and to use mocks and stubs at the boundaries. Depending on the language, Claude may go in a direction that isn't an established pattern. This is a great opportunity to update your documentation and ask Claude to try again.
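A minimal sketch of that boundary rule, with a hand-rolled stub and hypothetical names: the SUT's logic runs for real, and only the gateway at the edge is faked.

```ruby
# Hypothetical SUT: an order service with a payment gateway at its boundary.
class PlaceOrder
  def initialize(gateway)
    @gateway = gateway # injected boundary collaborator
  end

  def call(amount_cents)
    @gateway.charge(amount_cents) ? :placed : :failed
  end
end

# Hand-rolled stub standing in for the real gateway; no HTTP, no framework.
class FakeGateway
  def initialize(succeed:)
    @succeed = succeed
  end

  def charge(_amount_cents)
    @succeed
  end
end

# The SUT's branching is exercised for real; only the boundary is faked.
raise unless PlaceOrder.new(FakeGateway.new(succeed: true)).call(500) == :placed
raise unless PlaceOrder.new(FakeGateway.new(succeed: false)).call(500) == :failed
```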
When writing higher-level specs, like integration and end-to-end, we move from isolation to integration. We want to test the code across boundaries and test the reality of the integration points rather than the assumptions. It's important to ensure contracts between objects are honored, and to investigate any emergent behavior when it surfaces. At this level of testing we ask Claude to reject the use of mocking and stubbing across boundaries for the SUT. I digress.
Review the test code and adjust as necessary. If you are unsure of trusting mocks and stubs, or just need some additional confidence before letting Claude cook, add directives and details to your documentation so that Claude can learn via progressive disclosure or hooks.
If you are going to use a tool like FactoryBot, or similar instance factory, see if you can build stubs or use an in-memory build method so that you don't need to create and persist the object.
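The build-versus-create distinction can be sketched without FactoryBot itself; here is a toy, hand-rolled equivalent (names hypothetical) showing why in-memory builds keep specs off the database:

```ruby
# In-memory stand-in for a model; no database or gem required.
User = Struct.new(:name, :saved) do
  def persisted?
    !!saved
  end
end

# Toy factory mimicking FactoryBot's build/create distinction.
module UserFactory
  def self.build(name: "Ada")  # fast: constructs in memory only
    User.new(name, false)
  end

  def self.create(name: "Ada") # slow: would persist to the database
    User.new(name, true)
  end
end

raise unless UserFactory.build.persisted? == false
raise unless UserFactory.create.persisted? == true
```

Most unit specs only need the attributes and behavior of the object, not a row in the database, so defaulting to build keeps the suite fast and parallelizable.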
Won't this lead to performance degradation?
Yes. No. Maybe, so?
In the short term, any additional spec coverage will likely increase the overall time it takes to run the spec and test suite. The increase shouldn't be much, though, because Claude will write you specs that do not integrate with the database. Keeping all of the objects in memory, and making use of mocks and stubs, will keep performance optimized. Staying away from the persistence layer will mitigate test pollution and allow for greater parallelization.
It's hallucinating. Stop the madness!
This is all a signal. Treat Claude like a senior engineer on the first day of work, completely agnostic to the system. When Claude attempts to stub a method that isn't there, or mock an object that doesn't exist, my first reaction is that Claude has uncovered the "curse of knowledge": I have institutional knowledge that needs to either be modified or documented for discovery. If it's not a curse of knowledge, then, more likely, the code as written is simply not intuitive or is crufty. Lastly, and least likely, Claude is stuck and doesn't know what to do, or is on a hallucination bender again. Claude is giving me great signals and feedback; it's my job to either provide Claude with specific context (i.e. document the institutional knowledge) or rewrite the code so that it's self-documenting and intuitive.
Refactor and rewrite all the things
SLOCs change, specs don't.
In the previous section, we asked Claude to create specs that provided as close to 100% unit test coverage as possible. Nothing less will do; we are testing the code as it is today. We review those specs and verify that -- yes! -- this is how we expect the code to behave. Now we move on to refactoring.
In classical refactoring fashion, we want to modify the internals without modifying the interface, the public behavior. For our purposes, this means that we will not touch the spec coverage. The specs are our contract, and given our confidence in the coverage, we tell Claude that it is free to do whatever it wants with the code underneath them.
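A toy before-and-after (names hypothetical) shows the deal we are making: the internals change freely, while the assertions, our contract, pass against both versions.

```ruby
# Before: nested conditionals, the shape Claude often inherits.
def shipping_cents_v1(subtotal_cents)
  if subtotal_cents >= 5000
    0
  else
    if subtotal_cents >= 2500
      300
    else
      600
    end
  end
end

# After: same public behavior, flattened with guard clauses.
def shipping_cents_v2(subtotal_cents)
  return 0   if subtotal_cents >= 5000
  return 300 if subtotal_cents >= 2500
  600
end

# The contract specs pass against both versions, so the refactor is safe.
[[6000, 0], [3000, 300], [1000, 600]].each do |subtotal, expected|
  raise unless shipping_cents_v1(subtotal) == expected
  raise unless shipping_cents_v2(subtotal) == expected
end
```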
Refactoring Patterns now, Design Patterns, maybe later.
It should make sense why our first task was to increase code test coverage. Moving forward, any change to code will have complementary tests that ensure the inputs and outputs continue to produce expected results. With the specs in place, we can call this "closing the loop." While we are learning to use and build confidence in Claude, we still want a "human in the loop"; closing the loop is what lets us take the human out of the loop when we are ready.
After writing a solid set of unit specs and maybe a few good integration tests, it's time for Claude to move on to cleaning up that implementation! There is an old saying: there is clean code, and there is code that makes money. With Claude, we can have our cake and eat it too.
Allow Claude to use its vast knowledge of refactoring patterns to rewrite the business logic and auxiliary code. The public interface, for our purposes, is the set of methods, both code invocations and HTTP requests, as well as the params that are accepted. We can head to Claude Code or the Claude chat interface to request a skill for refactoring, something like:
"Create a skill that refactors a given file. Apply DRY, SOLID, ..., and best practices for class, module, and method line lengths, method arity... run the linters and make the suggested adjustments, run the specs after every change, and commit on a green build"
My prompt is specifically for Rails and includes many of my favorite Uncle Bob and Sandi Metz rules, principles, and heuristics.
Claude will provide code that is decoupled, cohesive, and maintainable, and that reduces the cost of future change. At least this has been my experience. And, again, since we aren't modifying the public behavior, your specs will continue to work as expected.
As you start to extract methods and classes, there is a good chance that the code coverage may change; it shouldn't break, but you may see the percentage of coverage decrease. If this occurs, consider adding additional specs. Working with Claude as a copilot is a trust-but-verify relationship, even in these yolo-Clawdbot/Moltbot times! Use static analysis and deterministic tools where possible to assess code quality and hygiene.
Is Refactoring Art or Science? Yes.
Code refactoring and rewriting is both art and science. Whether this is our first pass or we have been tweaking the code for several days, we want the result to be the same: self-documenting, intuitive code that reads like a nerdy epic, the story of an arbitrary piece of data, its modifications over space and time, its ingress, egress, or persisted state. As we write our story of data flows, we apply practical constraints like the max length of a method body, the number of params in a method signature, and limits on nested conditionals. This is art. Code should read more like The Joy of Cooking, not James Joyce.
And on the other side we need to apply the computer science and make sure the inputs and outputs are predictable and expected, that we are making proper use of data structures and algorithms, considering space and time complexity, among other factors. This is all to say that Claude is capable of handling both. Let Claude cook.
What about Design Patterns?
Using Claude to find and apply Design Patterns is out of scope for this article. Claude has been extremely adept at refactoring, rewriting, and reorganizing code with clear and practical boundaries. In practice, Claude has rewritten 1000+ line God-objects into 200-line classes with 5 or 6 highly cohesive modules that logically fit into the composition of the original model.
Upon investigating the individual refactored files, one of the upshots is that emergent and potential design patterns, not evident in the mess of the previous code, now present themselves as logical next steps. For example, while we knew that we should turn an object into a Factory, we never made the time. After Claude's cleanup, the Factory migration is straightforward, and the path to that future state is quite simple and clear.
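In miniature, and with hypothetical names, that decomposition looks like small cohesive modules composed back into the original model:

```ruby
# Cohesive slices extracted from a (hypothetical) former God-object.
module Billable
  def invoice_total_cents
    line_items.sum
  end
end

module Notifiable
  def notification_address
    "#{handle}@example.com" # placeholder boundary; no real mail here
  end
end

# The original model becomes a thin composition of the extracted modules.
class Account
  include Billable
  include Notifiable

  attr_reader :handle, :line_items

  def initialize(handle, line_items)
    @handle = handle
    @line_items = line_items
  end
end

acct = Account.new("ada", [1000, 250])
raise unless acct.invoice_total_cents == 1250
raise unless acct.notification_address == "ada@example.com"
```

Each module is now a candidate for its own file, its own spec, and, eventually, extraction into a proper collaborator or pattern.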
What do I do?
When you are working with Claude, you are not giving orders, you are giving prompts. It's a request, not a demand. It's a collaborative and consultative working relationship. Claude doesn't have perfect knowledge, and the code Claude is working on is probably not the best, either. When Claude writes code that is unexpected, it is worth investigating.
Claude is going to provide a lot of signals, and sometimes those signals will look like noise. The signal to noise ratio is going to depend on the human in the loop and the targeted code.
The base signal I receive from Claude is when it writes questionable code, hallucinates a model or method, or makes an assumption about our code and how it operates. There are three responses:
(a) change the code so it's more intuitive
(b) provide clear documentation, especially if Claude is fumbling over institutional knowledge
(c) shrug and prompt, again
My suggestion is to respond in that order: a, b, then c. It's very easy to default to (c); however, I feel there is a missed learning opportunity for all involved, human and agent alike. Heck! You may just find an edge case you never considered.
Claude is very much a brilliant software engineer who feels like it is always starting on day one of the job, completely agnostic to the codebase and culture. That is, until Claude starts collaborating with you; Claude is a quick study.
Our job is to work with Claude and collaborate to build self-documenting intuitive code whenever possible, and document institutional knowledge when it isn't. Don't miss these early opportunities to make these changes; Claude, and those pesky humans, will work better with the code. Does a code comment not make sense? Does the method name still describe what it does? Are the class names still part of the ubiquitous language of the bounded context? If Claude is acting funny, give it the benefit of the doubt and see what can be changed.
Documentation
Oh, a software engineer's favorite task: documentation. "It's somewhere in Confluence" is the battle cry of the software engineer looking for the institutional knowledge that just went out the door with your latest team departure. But Claude loves to document.
Document All the Files
If nothing else, ask Claude to write heading comment blocks about the concerns at the top of your newly minted modules, libraries, and classes. We do not want to add comments to specific lines of code unless it's a quirk. Our desire is that Claude and the human have now rewritten code that is grokkable, with complementary specs that show its inputs and outputs.
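A heading block along these lines (class and contents hypothetical) is usually all the inline commentary a well-factored file needs; note that the one line-level quirk earns its comment in the header, not scattered through the body:

```ruby
# AccountMerger
#
# Merges a duplicate account into a canonical one: reassigns owned
# records, unions role grants, and soft-deletes the duplicate.
#
# Quirk: legacy accounts created before 2019 may lack a handle, so
# merging falls back to the email local-part in that one case.
class AccountMerger
  def initialize(canonical, duplicate)
    @canonical = canonical
    @duplicate = duplicate
  end
end
```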
Progressive Disclosure-ish Documentation
Make sure that the agent is writing -- and you are verifying -- documents that allow the agent to process the application components and subcomponents piecemeal and avoid context-window bloat. When creating skills and other meta AI services, having this documentation around will allow the AI to understand what, how, and where it is supposed to take action.
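One common shape for this, with a layout that is purely illustrative, is a short top-level agent file that points to deeper docs, so the agent pulls detail only when a task calls for it:

```
# CLAUDE.md (top level, kept deliberately short)
- Rails app; domain docs live in docs/domains/<name>.md
- Read docs/testing.md before writing or changing specs
- Quirks and institutional knowledge: docs/quirks.md
```

The top-level file stays small enough to always sit in context, while the per-domain files are loaded piecemeal.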
Ask Questions
As you read through your agent's documentation, consider it with a critical, product-driven level of attention. Does the documentation use the domain language correctly? Does the documentation explain the behavior in a human-readable way, or is it only giving the inputs and outputs? If the naming -- we all know this is hard -- is adequate, your agent will tell you about business use cases and explain them in the domain's parlance. If the answer to these questions is "no", or if the documentation is poor, look for opportunities to modify the code or reprompt the documentation.
Your Human Coworkers Will Thank You!
As we integrate agents, LLMs, and other AI services into the stack, the dividends of proper documentation compound. The public and internal documentation will stay in sync with the code, and together they will act as the accurate source of truth. Product will be able to ask hypotheticals like "what happens when a user [does this]?" without having to take the action themselves. Agents and humans alike will have an accurate map of the codebase, allowing features to be seamlessly inserted, seams to be detected or created where they are needed, and code to be rewritten or reworked when that must be undertaken.
If you have made it this far, hopefully you can appreciate that these first three projects will unlock a great deal of value, productivity gains, mystery, and excitement around the use of agents like Claude Code. The last project can be a bit more involved!
Upgrade your tools, libraries, and runtimes
If you aren't on the front end of the adoption curve, you will eventually find yourself falling off the back of the support cycle.
Even the most disciplined teams, pristine code bases, and early adopters are faced with the question of when to upgrade their environments. Should a team update early, risking premature optimization and wasting time and effort better spent adding value? Or wait until it's required, possibly having to contribute even more time, effort, and resources to upgrade from an EOL or an EOS version? With Claude, and the latest state of the art, you can let the agent do most or all of the work.
Claude will upgrade your tooling, applications, runtimes, and stack. How do I know? I have seen firsthand that Claude will upgrade a Node runtime from 14 to 22. Not in isolation: there was a lot of handholding and massaging, but it was done in a single pull request, over a couple of days. And a leap of 8 major versions is nothing to sneeze at, especially with many packages needing to be upgraded in parallel. Upgrading a single major version will be trivial by comparison.
Depending on when you are reading this, Claude may need additional context to look at GitHub repositories and language-specific package registries (e.g. npmjs) to see the changelogs, deprecations, and upgrade paths for the libraries used. Upgrading runtimes in parallel may also require prompting. Using a runtime version manager (e.g. asdf) and a library dependency manager (e.g. pip) will make these updates less of a headache, but may still require more work from the human in the loop.
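With asdf, for instance, pinning runtimes is a one-file affair the agent can read and edit directly; the versions below are purely illustrative:

```
# .tool-versions (asdf) -- per-project runtime pins
nodejs 22.11.0
ruby 3.3.4
```

A single committed file like this gives both the agent and every human on the team an unambiguous answer to "which runtime are we on?", which makes the upgrade diff itself trivially reviewable.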
When upgrading tools, Claude is very decent at migrating config properties, moving files, and otherwise keeping the existing behavior and settings of my tools, while upgrading to the latest and greatest version of an application I am using.
These upgrades are a more human-involved process than the preceding projects; however, Claude provides an incredible amount of the work. In most cases, you will continue to use the specs you have written, and the confidence they have built, to make changes commit by commit, update by update, and upgrade by upgrade. This is done in hours, not days, weeks, or quarters.
Staying on the front side of the adoption curve must be part of the team culture. It requires a great deal of discipline. And my opinion is that this is the canary in the coal mine: if the libraries are on end-of-support, or worse, end-of-life versions, I am fairly certain that the codebase is about as tidy as a home on "Hoarders: Buried Alive."
Let Claude (and your favorite engineer) upgrade your tools, libraries, and runtimes. The core domains needn't be on the nightly build or point-zero release; however, it's very important to stay at the front of the adoption curve, if only for security patches, performance optimizations, and new behavior. Use your spec coverage to build confidence that the application works as intended.
Now, go forth and conquer!
Bonus: Feature flags
Remove those feature flags!
We know they are in there: the feature flags that no one wants to touch because no one knows what the hell is going to break. It was a miracle that the feature worked when the flag was enabled to begin with. Six months later, the context is gone, and much has been built on and over the conditional code path. You remember something about Chesterton's Fence (https://fs.blog/chestertons-fence/), but realize: if not now, when? Someone has to do it: time to remove these feature flags, one by one.
"It's a miracle that this stuff works, at all."
If you have completed the previous projects, then you have these beautiful specs. You have well-organized code. And you have documentation that you can use to verify that Claude sees the implementation working as the product owner or user might. You have verified that all of these components are sound, sane, and contribute to the correct behavior.
Ask Claude to systematically remove the feature flags one at a time, with a commit for each. Cluster the flags in cohesive areas and push pull requests. When I am demanding time from engineers to review code and from testers to make sure I didn't break something, the least I can do is cluster the changes in the same area so that we can sweep through the behavior without major context switching and cognitive load.
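In miniature (flag and names hypothetical), each commit collapses one conditional down to its winning branch, with the existing specs pinning the behavior:

```ruby
# Before: the flagged path, long since enabled for everyone.
def checkout_total(cart_cents, flags)
  if flags[:new_tax_engine]
    cart_cents.sum { |cents| (cents * 1.08).round }
  else
    cart_cents.sum # legacy path: dead for months
  end
end

# After: flag removed, only the living branch remains.
def checkout_total_cleaned(cart_cents)
  cart_cents.sum { |cents| (cents * 1.08).round }
end

# The existing specs pin the behavior, so the removal is a safe no-op.
cart = [1000, 500]
raise unless checkout_total(cart, { new_tax_engine: true }) == checkout_total_cleaned(cart)
```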
Conclusion
Claude Code and agentic programming are rapidly advancing in breadth, depth, and capability, at a speed that individuals are not going to be able to keep up with. However, that should not keep anyone from diving in with both feet, learning as much as they can, and considering how individuals and teams might use AI for part or all of many tasks.
Claude, and other high performing AI agents, are the perfect tool to help clean up your working environments. Target your favorite legacy codebase and watch as it cleans up and normalizes various levels of code quality, hygiene, spec coverage, and dependency management. This alone makes the subscription worth it.
This article has provided four ideas, and a bonus, for using Claude to build confidence in agentic coding and using an AI copilot:
• Write specs and improve code coverage
• Refactor and rewrite executed code
• Document all the things
• Update runtime, tools, libraries, and other dependencies
• Remove feature flags.
After working with Claude on these projects, a codebase may not feel green-field; however, it will be easier to extend, maintain, and reason about, for both human and computer alike.
I hope you all got some benefit out of this article. If not, that is cool too!
If you are ever in the greater Hartford, Connecticut area, let me know! I am always looking to nerd out with something about tech!!
Happy Vibing!
These four tasks, and the bonus, showed me the power of AI and of keeping things simple. Claude was able to take a fairly messy house and clean it up, while also exposing institutional knowledge, unclear and unintentional code, and other areas where the environment could be easier to grok, maintain, and extend. As Claude takes on more of the implementation and hands-on coding, it allows me and others to take on work higher up the value chain.