The Problems Script: How to Get Clean Code from AI Agents
This covers the most important pattern for making agents generate quality code.
Problem
Your AI assistant just submitted a PR with 47 type errors and broken tests. Sound familiar?
While Opus 4.5 has changed the game, this situation still occurs. You can update your commands, build new skills, rewrite the Claude.md file, or add rules until the cows come home. Unfortunately, if you aren't doing one important thing, it will continue to happen.
Solution
Run the tests and enforce coverage. However, leaving the agent to do this with your existing tooling can be problematic. Tools like Jest and other testing frameworks are noisy, making it difficult for the agent to zero in on the issue before its context window is overloaded.
I found a simple solution: Build your own "problems" script. It's a script that executes your existing tooling to surface all problems in an efficient manner for the agent. The same code always produces the same feedback, so the agent can trust that fixing the reported issues will move it forward.
The Problems Script
The problems script should run your existing lint, build, and test tooling against a target directory, keep the terminal output to a bare pass/fail summary for each step, write every problem (and only problems) to a problems.txt file, and stop at the first failing step so later noise never buries the real issue.
Usage
Once you have this script, make it the last thing your agent is required to do before completing a task, finishing a phase, or submitting a PR. Teach it to start by running the command against the directory it changed, then back out and run it against the whole project before considering the work done.
For example, after changing a task-list component, start with yarn problems src/features/task-list. Fix the problems found there, then run yarn problems . to make sure nothing else has broken.
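One way to make this stick is to encode the rule in your agent instructions (the Claude.md file mentioned above, or whatever your agent reads). A sketch, with the paths as placeholders:

Before marking any task, phase, or PR complete:
1. Run yarn problems <directory you changed> and fix everything it reports.
2. Run yarn problems . and fix anything else that surfaced.
3. Only consider the work done when both runs report that all checks passed.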
Implementation
I don't have a canned problems script for you. Every project is different. You should work with your agent to create one.
Give your agent this article and ask the agent to create one for you.
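That said, a minimal sketch can make the shape concrete. The TypeScript below is a starting point, not a drop-in implementation: the step commands (yarn lint --fix, yarn build-test, yarn test --coverage) and the way the target directory is passed through are assumptions taken from the examples in this article, so adjust them to whatever your project actually runs.

// problems.ts - a sketch of a problems script.
// Commands and flags below are assumptions; swap in your project's own tooling.
import { execSync } from 'node:child_process';
import { writeFileSync, appendFileSync } from 'node:fs';

const target = process.argv[2] ?? '.';
const REPORT = 'problems.txt';

// Each step runs quietly; only its pass/fail status reaches the terminal.
const steps = [
  { label: 'Linting', command: `yarn lint --fix ${target}` },
  { label: 'Building', command: 'yarn build-test' },
  { label: 'Testing', command: `yarn test --coverage ${target}` },
];

writeFileSync(REPORT, `PROBLEMS ANALYSIS REPORT\nTarget Directory: ${target}\n\n`);
console.log(`Running problems for ${target}`);
const runStart = Date.now();

for (const step of steps) {
  process.stdout.write(`${step.label}... `);
  const start = Date.now();
  try {
    // Capture tool output instead of streaming it, so the terminal stays quiet.
    execSync(step.command, { stdio: 'pipe' });
    const seconds = ((Date.now() - start) / 1000).toFixed(2);
    console.log(`completed in ${seconds}s`);
    appendFileSync(REPORT, `${step.label.toUpperCase()} RESULTS\nCommand: ${step.command}\nNo problems (${seconds}s)\n\n`);
  } catch (error: any) {
    // Write the problems (and only the problems) to the report, then stop,
    // so later steps cannot bury the real issue in more noise.
    const output = `${error.stdout ?? ''}${error.stderr ?? ''}`;
    appendFileSync(
      REPORT,
      `${step.label.toUpperCase()} RESULTS\nCommand: ${step.command}\n` +
        `Command failed: ${step.command}\n${output}\nABORTED: ${step.label} failed. Stopping here.\n`,
    );
    console.log(`${step.label} problems found. See ${REPORT} for details.`);
    console.log(`Please fix the issues and then rerun "yarn problems ${target}".`);
    process.exit(1);
  }
}

console.log(`✓ All checks passed in ${((Date.now() - runStart) / 1000).toFixed(2)}s`);

Wire it up as a "problems" entry in your root package.json scripts so that yarn problems <directory> runs it, then let your agent refine the details from there.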
Example Terminal Output
The terminal shows only what step is running and whether it passed:
Running problems for packages/core
Linting... completed in 2.81s
Building... completed in 1.41s
Testing... completed in 25.19s
✓ All checks passed in 29.42s
TEST PASS: PASSED (Tests - Total: 1144, Passed: 1143, Failed: 0, Skipped: 1, Files Covered: 168, Files With Gaps: 0)
When something fails, it's equally clear:
Running problems for apps/better
Linting... completed in 1.17s
Building... Build problems found. See problems.txt for details.
Please fix the issues and then rerun "yarn problems apps/better".
No guessing. No scrolling through noise. The AI knows immediately: the build failed, and the details are in problems.txt.
Example Problems.txt File Content
All problems (and only problems) get written to a problems.txt file:
================================================================================
PROBLEMS ANALYSIS REPORT
Target Directory: apps/better
================================================================================
LINT RESULTS
Command: yarn lint --fix
No problems (1.17s)
BUILD RESULTS
Command: yarn build-test
Command failed: yarn build-test
../../packages/roles-ui/src/EditRole.tsx(89,16): error TS2741: Property 'label'
is missing in type '{ value: string | undefined; }' but required in type 'IconPickerProps'.
../../packages/roles-ui/src/EditRole.tsx(92,39): error TS2322: Type '{ children:
Element; title: string; flex: number; }' is not assignable to type 'FormSectionProps'.
Property 'flex' does not exist on type 'FormSectionProps'.
ABORTED: Build failed. Stopping before tests.
This structure lets the AI see at a glance which step failed, jump straight to the offending file and line, and ignore everything that already passed.
Code Coverage
Yes, 100% sounds dogmatic. Here's why it works for AI workflows specifically.
The problem with partial coverage targets (say, 80%) is that they require judgment about which 20% to skip. AI agents are bad at that judgment — they'll skip whatever's convenient, which is often the complex error-handling paths where bugs hide.
100% coverage removes the ambiguity. The rules are simple: every line is either covered by a test, or it carries an explicit ignore comment explaining why covering it isn't practical:
// istanbul ignore next - defensive error handling, would require mocking internal failure
if (!response.ok) {
  throw new Error('Unexpected failure');
}
The AI learns to either write tests or justify exclusions. Both outcomes are good — you get coverage or documentation of why coverage isn't practical. And when you review the PR, you can quickly scan the istanbul ignore comments to verify the AI's judgment.
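If your tests run through Jest, the 100% rule does not have to rely on discipline alone; a coverage threshold makes the Testing step fail automatically whenever coverage slips. A minimal jest.config.ts sketch, assuming your project already uses Jest for coverage:

// jest.config.ts - fail the test run whenever coverage drops below 100%.
// Anything deliberately uncovered must carry an explicit "istanbul ignore" comment.
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 100,
      functions: 100,
      lines: 100,
      statements: 100,
    },
  },
};

export default config;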
The Meta-Pattern
This approach works because it treats the AI as a collaborator that needs good tooling — the same way a human developer does.
You wouldn't ask a new team member to debug issues by reading raw CI logs. You'd give them clear error messages, point them to the right files, and let them focus on fixing rather than finding.
AI deserves the same consideration. Build the infrastructure that makes success easy, and you'll get PRs worth reviewing instead of messes worth discarding.
Try It Yourself
Point your agent at this article and ask it to build you a problems script. Execute and iterate. Have fun!