Why I Use Two AI Coding Tools Instead of One

This is a long read. If you're looking for quick tips, this isn't it. But if you're serious about improving how you work with AI coding tools, this is the process I've refined over the past year.

Have you ever written an email, proofread it twice, and still sent it with a typo? That's what happens when AI writes code and then checks its own work. It's too close to what it just created to catch its own mistakes.

About a year ago, I started doing something simple: I use one AI tool to write code and a completely different one to check it. Think of it like having a writer and an editor. The writer creates, the editor reviews. They're good at different things, and the final product is better because both contributed.

I've tried a bunch of different tool combinations (Cursor, Claude Code, Augment Code, Codex, Qodo) and they all produce the same result: fewer bugs, faster work, less time spent fixing things later. But the biggest win has been catching edge cases during development that I'd normally discover much later, during UAT or after users start hitting weird scenarios in production. That alone has been worth the extra step.

The Blind Spot

Today's AI coding tools are powerful. They read your project files, understand your codebase, and respond to plain English instructions. But they have a blind spot. When the AI tells you "Done! I've added the login form," how do you know it actually did everything correctly? 

That's where the second tool comes in.

The Basic Idea

You give each AI tool a specific job:

The Builder writes code, implements features, and fixes bugs. This is the tool you give instructions to when you want something created or changed.

The Checker reads the code and tells you whether it's actually correct. It never writes code itself. Its only job is to look at what the Builder did and give you an honest assessment.

Because the Checker didn't write the code, it has no reason to defend it. It approaches everything with fresh eyes.

Right now, my default setup is Claude Code as the Builder and Cursor as the Checker. Claude Code creates the plans, writes the code, and implements features. Cursor reviews the plans with PC: and verifies the finished work with WV: (two prefixes I'll explain in a moment). But any two tools work. The pattern matters more than the specific tools you pick. For very large builds, I sometimes have Cursor handle the implementation instead, since I find it a little faster for churning through big features. The core rule stays the same: whoever builds doesn't check.

One thing this workflow assumes: your Builder shows you a plan before it starts writing code. Most AI coding tools support this (Claude Code calls it "plan mode," Cursor has similar features), and if you're not already using it, start there. Getting the AI to explain what it's going to do before it does it will improve your results dramatically, even without a second tool checking the work.

How I Talk to the Checker (Two Key Prefixes)

I created short codes that I type at the beginning of a message to tell my Checker AI what kind of review I need. They're just letters followed by a colon, typed before your actual message.

The workflow is dead simple: copy the Builder's response, paste it into the Checker, and add the prefix at the beginning. You don't have to rephrase anything or write a custom request. The prefix tells the Checker what to do with whatever you pasted.

But the prefix alone isn't doing the heavy lifting. You could just paste the Builder's output and say "check this," and the AI would try. But I've found that the more detailed your instructions are, the better the results. AI tools respond to context and clear expectations. So behind each prefix is a set of detailed instructions that tell the Checker exactly how to conduct the review: what to check, what order to check it in, how to rate its confidence, what counts as a blocker vs. a warning. When you type PC:, the Checker isn't just winging it. It's following a protocol. That's why the output comes back structured and consistent instead of a vague "looks good to me."

If you've used slash commands in these tools, the prefixes work the same way. They're powered by instruction files that your AI tools already know to look for. More on setting those up in the Getting Started section below.

The examples below are simplified to show the idea. In practice, the Builder's responses are longer and more detailed, which gives the Checker more to work with when verifying.

PC: "Is this plan safe to execute?"

PC stands for Plan Checker. I use it before my Builder starts working, once it has proposed a multi-step plan. This is the first prefix I reach for because it comes first in the process: check the plan before any code gets written.

Say the Builder proposes a plan. I copy it, paste it into the Checker, and add PC: at the front:

Me to the Checker: "PC: To add user profiles, I'll: (1) Create a new profiles table in the database, (2) Add a /profile API endpoint in routes/api.js, (3) Build a profile page component at components/ProfilePage.jsx"

Checker: "Two issues. First, routes/api.js doesn't exist in your project. Your API routes are in server/routes/index.js. Second, there's no step for authentication checks. Anyone could view or edit anyone else's profile."

This has saved me from executing plans that reference files that don't exist, assume database fields that were renamed weeks ago, and miss important steps.

WV: "Did this actually get done right?"

WV stands for Work Verifier. I use it after my Builder says it's finished something.

Builder: "Done! I've added email validation to the signup form. Invalid emails now show an error message and the form won't submit."

I copy that, paste it into the Checker, and add WV::

Checker: "The validation is partially implemented. It checks for an @ symbol, but not for a valid domain (so 'user@' would pass). Also, the form still submits even when validation fails because the submit handler doesn't check the validation state. Confidence: 40%. Two issues need fixing."

The Builder said "Done!" but the Checker found two real problems I would have shipped otherwise.
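
For a sense of what fixing those two findings looks like, here's a minimal TypeScript sketch. None of this is code from a real project; the function names and the error message are made up to illustrate the two fixes the Checker asked for.

    // Fix 1: validate the domain, not just the presence of an "@".
    // "user@" and "user@host" are rejected; "user@example.com" passes.
    function isValidEmail(email: string): boolean {
      return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email.trim());
    }

    // Fix 2: the submit handler actually checks the validation state
    // instead of submitting regardless of the error shown to the user.
    function handleSignupSubmit(email: string, submit: (email: string) => void): string | null {
      if (!isValidEmail(email)) {
        return "Please enter a valid email address."; // shown next to the field; the form does not submit
      }
      submit(email);
      return null; // no error, submission went through
    }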

So what do you do with the Checker's findings? You paste them back into the Builder and ask "Is this valid?" More often than not, the Builder agrees and fixes the issues. Then you run it through the Checker again. You keep going back and forth until either the Checker says "all good" or the Builder pushes back and says the Checker is nitpicking. That back and forth is the whole point. It's not a one-shot review. It's a cycle.

Other Prefixes

WV and PC are the two I use constantly. But the pattern is flexible. I've built other prefixes on the same model for troubleshooting bugs (where the Checker diagnoses instead of the Builder, since the Builder tends to assume the problem is related to whatever it recently changed) and for reviewing my own prompts before sending them to the Builder (especially helpful when you're newer to AI coding and aren't sure you've given enough context). Same idea: a short code at the start of the message, and the Checker knows what mode to switch into.

A Real Example: Both Prefixes in Action

The examples above are simplified. Here's what this actually looks like on a real bug.

I was building a sports league management app, and during testing users kept hitting an "Invalid invitation code" error right after verifying their email. The Builder (Claude Code) traced it to a URL parameter collision: the page was grabbing the 6-digit email verification code and passing it to the next page as if it were an invitation code.

The short version: the Builder proposed a fix covering two locations in the code. I ran it through the Checker with PC:, which found the fix incomplete: the same bug existed in two more locations, a variable wasn't properly wired up, and both files were protected but the plan didn't mention unlocking them. The Builder revised the plan, and the Checker approved the second pass. After the Builder implemented it, I verified with WV: and the Checker confirmed everything landed correctly. Score: 9 out of 10.

Without the plan check, I would have shipped a fix that only worked for half the user paths. Without the work verification, I would have trusted the Builder's "Done!" without confirming the edge cases actually landed. That's the whole workflow in one bug fix: Plan, Check, Revise, Build, Verify.

Below are truncated versions of the actual exchanges if you want to see what the Checker's output looks like in practice.

I pasted the Builder's plan into the Checker with PC::

Me to the Checker: "PC: The fix (2 small changes): VerifyEmail.tsx: check if the code is actually an invitation code before forwarding. Two spots: lines 198 and 206. CreateOrganizationEntity.tsx: if a 6-digit numeric code shows up, skip invitation validation and strip it from the URL."

Checker: "BLOCKER (completeness): The plan says two spots in VerifyEmail.tsx; the same pattern exists at lines 348-349 and 358-365. Without fixing those, a user who verifies via the form can hit the same bad redirect. ... BLOCKER (process): Both target files are Tier 1 locked — plan must include unlock step or saves will fail. ... WARNING: Phase 2 must wire setSearchParams and clone params before mutating. ... VERDICT: Revise."

The Builder agreed on all three, revised the plan, and the Checker's second PC: pass approved it. After the Builder implemented the fix, I verified with WV::

Me to the Checker: "WV: Done. Both files changed, TypeScript compiles clean. VerifyEmail.tsx — Added getInvitationCodeFromUrl() helper. Replaced all 4 redirect sites. CreateOrganizationEntity.tsx — Added guard to strip 6-digit codes and skip validation."

Checker: "PASS 1 (Existence): Confirmed. Helper defined at lines 48-52, used in all four redirect branches. ... PASS 2 (Correctness): Behavior matches the stated goal. Six-digit codes are no longer passed through. ... Confidence: HIGH. SCORE: 9/10. ACTION: None — treat the fix as verified and safe."

The Bigger Picture: A Complete Workflow

The prefixes are really just part of a larger process that I've refined over the past year. When I follow all the steps, things go smoothly. When I skip steps, that's when I run into problems.

There are really two versions of this workflow: a quick one for most tasks and a full one for anything substantial.

The Quick Version

For bug fixes, small features, and changes where the requirements are obvious:

Plan → Check Plan → Build → Verify

Have your Builder propose a plan. Run it through the Checker with PC:. Build it. Verify the result with WV:. Done.

The Full Version

For bigger features, anything touching security or payments, or work where "what it should do" isn't immediately obvious, I add a requirements step in the middle:

Step 1: Write your requirements. Before any code gets written, describe what you want in plain English. What should it do? What shouldn't it do? What happens in unusual situations? You can include user stories if that's how your team thinks. Have both AIs review the requirements and agree on them. This becomes your "contract" for the project.

This requirements doc does more than guide the build. It can feed directly into user stories in your project management tool, and later you can use it as the source of truth for a help chatbot that answers questions about how the application works. Write it once, use it three ways.

It also becomes essential when you come back to a complex application months later. If you built a feature six months ago, a lot may have changed since then. The requirements doc helps you and the AI realign on what the application is supposed to do, see how it's evolved, and figure out whether you're looking at a small update or a larger systemic change to the codebase or the requirements themselves. Without it, you're both guessing.

Step 2: Plan the work. Have your Builder create a step-by-step plan. Run it through the Checker with PC:. Don't start building until both agree the plan is solid.

Step 3: Build and verify. The Builder works through the plan one step at a time. After each step, use WV: to confirm it was done correctly before moving on.

Step 4: Compare the result to your requirements. When everything's built, go back to the requirements from Step 1. Does the finished product actually match what you wrote? Not just "does it run without errors" but "does it do what we agreed it should do."

Here's why the extra steps matter. Say you're building a password reset feature. Your Builder proposes a plan: generate a reset token, email it to the user, let them set a new password. Sounds reasonable. But when you run the plan through the Checker, it flags that the plan doesn't mention what happens when someone requests a reset for an email that doesn't exist. That's an information disclosure risk. So you revise the plan.

The requirements from Step 1 get the same treatment: when the Checker reviews them, it catches that you haven't specified what happens if the user's account is locked. Another edge case you would have missed.
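
To make that first edge case concrete, here's a minimal sketch of a reset-request handler that accounts for it. Everything here is hypothetical (the helpers are stubbed with declare so the snippet type-checks); the point is that the response is identical whether or not the email exists, so the endpoint can't be used to probe which addresses have accounts.

    // Hypothetical dependencies, stubbed so the sketch is self-contained.
    type User = { id: string; email: string };
    declare function findUserByEmail(email: string): Promise<User | null>;
    declare function createResetToken(userId: string): Promise<string>;
    declare function sendResetEmail(to: string, token: string): Promise<void>;

    async function handlePasswordResetRequest(email: string): Promise<{ message: string }> {
      const user = await findUserByEmail(email);
      if (user) {
        // A locked account (the other edge case the Checker caught) would be handled here too.
        const token = await createResetToken(user.id);
        await sendResetEmail(user.email, token);
      }
      // Same response whether the account exists or not: no information disclosure.
      return { message: "If an account exists for that address, a reset link is on its way." };
    }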

By the time you actually start building, both tools have stress-tested the plan and the requirements. The code comes out better because the thinking that went into it was better.

Most people skip Steps 1 and 4. They jump straight into planning and coding, then wonder why the result doesn't match what they had in their head. Writing requirements first forces you to think clearly, and checking against them at the end closes the loop. For complicated feature builds, those two steps are where most of the edge cases surface, the ones that would otherwise show up as bug reports weeks later.

When to Use This (And When Not To)

You don't need the full version for everything. A quick bug fix or a small text change doesn't need a requirements step. Save the full workflow for situations where mistakes are costly:

  • Changes that touch multiple files
  • Anything involving user data, payments, or security
  • Features that are complex enough to need a multi-step plan
  • Situations where you're not sure the AI understood what you wanted

Getting Started

Here's the minimum you need:

Pick two AI tools that can read your project files. Some common combinations: Claude Code + Cursor, Cursor + GitHub Copilot, Claude Code + Codex. Most have free tiers or trials, so you can experiment without spending money upfront.

Decide which one builds and which one checks. Either tool can fill either role. Try both arrangements and see what feels right.

Set up your prefix instructions. The PC: and WV: prefixes won't work out of the box. You need to create a markdown file that tells the Checker AI how to behave when it sees each prefix: what to look for, what format to report in, what counts as a blocker vs. a warning. This file lives in your project repo so the AI loads it automatically. Claude Code reads from CLAUDE.md and skill files, Cursor reads from its rules file, and other tools have their own equivalents. This is the part that makes the whole system work. I've built out a complete set of prefix instruction files that I use daily. If you want a copy to start from, DM me and I'll send them over.
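
To give you a feel for what goes in that file, here's a stripped-down sketch of a PC: entry. My real version is much longer; this just shows the shape, using the same vocabulary (blockers, warnings, verdict, confidence) you saw in the Checker output earlier.

    ## PC: Plan Checker
    When a message starts with "PC:", treat everything after it as a proposed plan. Never write or edit code.
    1. Existence: confirm every file, function, route, and database field the plan references actually exists in this repo.
    2. Completeness: list any steps the plan is missing (auth checks, error states, locked files, data migrations).
    3. Risk: flag anything touching user data, payments, or security.
    Report each finding as BLOCKER or WARNING, then end with a VERDICT (Approve or Revise) and a confidence rating.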

Try it on your next feature. Before the Builder starts, paste its plan into the Checker with PC:. After it finishes, paste its summary into the Checker with WV:. That's it. You'll see the value immediately.

A Note on Sub-Agents

If you're already using tools like Claude Code or Cursor's background agents, you may have noticed they're starting to do some of this automatically. These tools can spin up sub-agents that verify their own work before reporting back to you. That's basically the same pattern described here, just built into the tool itself.

That doesn't make the manual approach irrelevant. Sub-agents still operate within the same model family, so you're not getting the cross-provider blind spot coverage you get from using two different tools. And if you're newer to AI coding, doing this manually teaches you the habit of verifying AI work before you start trusting automated checks happening behind the scenes. But it's worth knowing that the tooling is catching up to this idea.

What I've Learned After a Year

The process works. When things go wrong, it's almost always because I skipped a step, not because the process failed. The most common mistake is skipping the requirements (Step 1) and the final alignment check (Step 4).

You don't have to be a developer to use this. If you're a product manager, a data analyst, or a founder using AI to build software and you can't easily spot-check the code yourself, having a second AI do it for you is the next best thing.

The deeper lesson is that AI coding tools are only as good as the process you wrap around them. The tools keep getting better, but the fundamental problem of verifying AI output doesn't go away just because the models improve. Having a second set of eyes, even artificial ones, changes the kind of work you're willing to trust.
