I spent a few weeks telling Claude Code to handle CI failures itself. Run gh pr checks, parse the JSON, find the error, fix the code.

It worked. In short sessions.

The longer the session ran, the worse it got. The GitHub API returns verbose JSON. A single check run log can be tens of thousands of lines. The agent ingests all of that to find the three lines that actually matter.

In a fresh context window, it manages. Three hours into a loaded development session? It starts hallucinating fixes for the wrong error or missing the signal entirely.

I noticed a pattern. Each failed CI parsing attempt added more noise to the context, making the next attempt less likely to succeed.

The agent wasn't getting dumber. Its context was getting noisier.

So I built bellwether. It's a TypeScript CLI that reads your PR state, CI status, review comments, and merge state, and returns compact, filtered output. The actual compiler error, not the ten thousand lines around it.

I'm not sure this is the final shape. But raw GitHub API output vs. filtered signal: that's the difference between an agent that degrades over a session and one that stays sharp.

#devtools #opensource #aiengineering
🔥
https://dub.sh/lu9tQU2
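
A rough sketch of the filtering idea, for illustration only. This is not bellwether's actual code: the function name filterLog, the SIGNAL patterns, and the context parameter are all assumptions made up for this example. It keeps the log lines that look like errors, plus a little surrounding context, and drops the rest.

```typescript
// Hypothetical sketch, NOT bellwether's real implementation.
// Assumed patterns that tend to mark the signal in a CI log; tune per project.
const SIGNAL: RegExp[] = [/\berror\b/i, /\bFAIL(ED)?\b/];

// Return only the lines that match a signal pattern, plus `context`
// lines on either side, each prefixed with its 1-based line number.
function filterLog(log: string, context: number = 1): string[] {
  const lines = log.split(/\r?\n/);
  const keep = new Set<number>();

  lines.forEach((line, i) => {
    if (SIGNAL.some((re) => re.test(line))) {
      const start = Math.max(0, i - context);
      const end = Math.min(lines.length - 1, i + context);
      for (let j = start; j <= end; j++) keep.add(j);
    }
  });

  return Array.from(keep)
    .sort((a, b) => a - b)
    .map((i) => `${i + 1}: ${lines[i]}`);
}

// Usage: a five-line stand-in for a much longer CI log.
const sampleLog = [
  "Installing dependencies",
  "Compiling 412 files",
  "src/api.ts(14,7): error TS2322: Type 'string' is not assignable to type 'number'.",
  "Uploading artifacts",
  "Done",
].join("\n");

// With context 0, only line 3 (the compiler error) survives.
console.log(filterLog(sampleLog, 0));
```

The point of the design is that the agent only ever sees the short array that comes back, so a failed attempt adds a handful of lines to the context instead of the whole log.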