The Problems Script: How to Get Clean Code from AI Agents

This covers the most important pattern for making agents generate quality code.


Problem

Your AI assistant just submitted a PR with 47 type errors and broken tests. Sound familiar?

While Opus 4.5 has changed the game, this situation still occurs. You can update your commands, build new skills, rewrite the CLAUDE.md file, or add rules until the cows come home. Unfortunately, if you aren't doing one important thing, it will keep happening.

Solution

Run the tests and enforce coverage. However, leaving the agent to do this with off-the-shelf tooling can be problematic. Testing frameworks like Jest are noisy, making it difficult for the agent to zero in on the issue before its context is overloaded.

I found a simple solution: Build your own "problems" script. It's a script that executes your existing tooling to surface all problems in an efficient manner for the agent. The same code always produces the same feedback, so the agent can trust that fixing the reported issues will move it forward.

The Problems Script

The problems script should:

  • Surface All Problems by linting, building, and testing (with code coverage), e.g. yarn problems.
  • Write Problems to a File by channeling only the problem details found in the tooling output to a problems.txt file.
  • Provide Minimal Terminal Output by only printing to the terminal the progress, overall result (problems or no problems), and the path to the problems file generated.
  • Make It Scopable by having it take a folder path argument that will scope the test execution and coverage analysis to only that folder.
  • Make It Live by having it output as it executes to both the terminal and the problems.txt file. This lets you watch the problems file while it executes.
  • Make It Fast so feedback takes seconds, not minutes. If feedback takes too long, the AI (and you) lose context.
  • Enforce 100% Code Coverage but not by writing tests to cover everything — by writing tests AND using code commenting for exclusions WITH reasons provided in the code. 100% code coverage ensures the agent only writes the code that is needed.

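The requirements above can be sketched as a small Node script. Treat this as a minimal illustration rather than a drop-in implementation: the step commands (yarn lint, yarn build, yarn test), the problem-matching patterns, and the problems.txt format are all assumptions you should adapt to your own tooling.

```typescript
// problems.ts -- sketch of a problems script.
// Assumed commands: yarn lint / yarn build / yarn test; adapt to your project.
import { execSync } from "child_process";
import { appendFileSync, writeFileSync } from "fs";

// Keep only lines that look like actionable problems.
// These patterns are a heuristic; tune them to your tools' output.
function extractProblems(output: string): string[] {
  const patterns = [/error TS\d+/, /\berror\b/i, /\bFAIL\b/];
  return output.split("\n").filter((l) => patterns.some((p) => p.test(l)));
}

// Run one step: minimal terminal output, full problem details to the file.
function runStep(name: string, command: string, file: string): boolean {
  process.stdout.write(`${name}... `);
  const start = Date.now();
  try {
    execSync(command, { stdio: "pipe" });
    console.log(`completed in ${((Date.now() - start) / 1000).toFixed(2)}s`);
    appendFileSync(file, `${name.toUpperCase()} RESULTS\nNo problems\n\n`);
    return true;
  } catch (err) {
    const e = err as { stdout?: Buffer; stderr?: Buffer };
    console.log(`${name} problems found. See ${file} for details.`);
    const output = `${e.stdout ?? ""}\n${e.stderr ?? ""}`;
    appendFileSync(
      file,
      `${name.toUpperCase()} RESULTS\n${extractProblems(output).join("\n")}\n\n`
    );
    return false;
  }
}

// Scope lint and test to the target folder; stop at the first failing step
// so the agent sees one class of problem at a time.
function runProblems(target: string): void {
  const file = "problems.txt";
  writeFileSync(file, `PROBLEMS ANALYSIS REPORT\nTarget: ${target}\n\n`);
  console.log(`Running problems for ${target}`);
  const steps: Array<[string, string]> = [
    ["Linting", `yarn lint ${target}`],
    ["Building", "yarn build"],
    ["Testing", `yarn test --coverage ${target}`],
  ];
  for (const [name, cmd] of steps) {
    if (!runStep(name, cmd, file)) {
      console.log(`Please fix the issues and then rerun "yarn problems ${target}".`);
      process.exit(1);
    }
  }
  console.log("✓ All checks passed");
}
```

To wire it up, add a call like runProblems(process.argv[2] ?? ".") at the bottom and expose it as a "problems" script in package.json so the agent can invoke yarn problems <path>.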
Usage

Once you have this script, make it the last thing your agent is required to do before completing a task, finishing a phase, or submitting a PR. Teach it to start by running the command scoped to the folder where it made changes, then back out to run it against the whole project before considering the work done.

For example, given changes to a task-list component, start with yarn problems src/features/task-list. Fix the problems found there, then run yarn problems . to make sure nothing else is broken.

Implementation

I don't have a canned problems script for you. Every project is different. You should work with your agent to create one.

Give your agent this article and ask it to create one for you.

Example Terminal Output

The terminal shows only what step is running and whether it passed:

Running problems for packages/core
Linting... completed in 2.81s
Building... completed in 1.41s
Testing... completed in 25.19s
✓ All checks passed in 29.42s
TEST PASS: PASSED (Tests - Total: 1144, Passed: 1143, Failed: 0, Skipped: 1, Files Covered: 168, Files With Gaps: 0)
        

When something fails, it's equally clear:

Running problems for apps/better
Linting... completed in 1.17s
Building... Build problems found. See problems.txt for details.
Please fix the issues and then rerun "yarn problems apps/better".
        

No guessing. No scrolling through noise. The AI knows immediately: build failed, problems are in this file.

Example problems.txt File Content

All problems (and only problems) get written to a problems.txt file:

================================================================================
PROBLEMS ANALYSIS REPORT
Target Directory: apps/better
================================================================================

LINT RESULTS
Command: yarn lint --fix
No problems (1.17s)

BUILD RESULTS
Command: yarn build-test
Command failed: yarn build-test

../../packages/roles-ui/src/EditRole.tsx(89,16): error TS2741: Property 'label' 
is missing in type '{ value: string | undefined; }' but required in type 'IconPickerProps'.

../../packages/roles-ui/src/EditRole.tsx(92,39): error TS2322: Type '{ children: 
Element; title: string; flex: number; }' is not assignable to type 'FormSectionProps'. 
Property 'flex' does not exist on type 'FormSectionProps'.

ABORTED: Build failed. Stopping before tests.
        

This structure lets the AI:

  • See which phase failed
  • Get exact file paths and line numbers
  • Understand the error types
  • Plan remediation systematically

Code Coverage

Yes, 100% sounds dogmatic. Here's why it works for AI workflows specifically.

The problem with partial coverage targets (say, 80%) is that they require judgment about which 20% to skip. AI agents are bad at that judgment — they'll skip whatever's convenient, which is often the complex error-handling paths where bugs hide.

100% coverage removes the ambiguity. The rules are simple:

  1. All code must be covered by tests, OR
  2. Explicitly excluded with /* istanbul ignore next */ AND a reason

/* istanbul ignore next -- defensive error handling, would require mocking internal failure */
if (!response.ok) {
  throw new Error('Unexpected failure');
}
        

The AI learns to either write tests or justify exclusions. Both outcomes are good — you get coverage or documentation of why coverage isn't practical. And when you review the PR, you can quickly scan the istanbul ignore comments to verify the AI's judgment.
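If you use Jest, the 100% rule can be enforced mechanically with its coverageThreshold option, so the test step of your problems script fails whenever coverage drops. This config is a sketch assuming Jest; other coverage tools (nyc, Vitest) have equivalent threshold settings.

```typescript
// jest.config.ts (sketch): fail the test run unless every metric hits 100%.
// Code excluded via /* istanbul ignore */ comments does not count against this.
const config = {
  collectCoverage: true,
  coverageReporters: ["text-summary"],
  coverageThreshold: {
    global: { branches: 100, functions: 100, lines: 100, statements: 100 },
  },
};

export default config;
```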

The Meta-Pattern

This approach works because it treats the AI as a collaborator that needs good tooling — the same way a human developer does.

You wouldn't ask a new team member to debug issues by reading raw CI logs. You'd give them clear error messages, point them to the right files, and let them focus on fixing rather than finding.

AI deserves the same consideration. Build the infrastructure that makes success easy, and you'll get PRs worth reviewing instead of messes worth discarding.

Try It Yourself

Point your agent at this article and ask it to build you a problems script. Execute and iterate. Have fun!
