Gerald Versluis’ Post

GitHub Copilot CLI just shipped something clever: Rubber Duck. The idea is simple but powerful: when a coding agent drafts a plan, a second model from a different AI family reviews it before execution.

Why a different family? Because a model reviewing its own work has the same blind spots: same training data, same biases. A model from a different family catches different things.

How it works:
→ You select a Claude model as your orchestrator
→ Rubber Duck uses GPT-5.4 as the reviewer
→ It activates at key checkpoints: after planning, after complex implementations, after writing tests

The results are compelling: Claude Sonnet + Rubber Duck closes 74.7% of the performance gap between Sonnet alone and Opus. On the hardest problems (3+ files, 70+ steps), it scores 4.8% higher.

Real examples of what it catches:
• A scheduler that would start and immediately exit
• A loop silently overwriting the same dict key every iteration
• Three files reading from a Redis key that new code stopped writing

What I like about this: it's not about replacing human review. It's about catching the confident mistakes that compound before you even see them.

Available now in experimental mode with /experimental.

#GitHubCopilot #AI #developer #programming
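
For readers who want to picture the loop, here is a minimal Python sketch of the cross-family checkpoint-review pattern the post describes. It is an illustration under stated assumptions, not Copilot CLI's actual code: call_model, review, and the APPROVED convention are all hypothetical stand-ins for real provider SDK calls.

# Minimal sketch of a cross-family "rubber duck" review loop. NOT the
# actual Copilot CLI implementation; call_model and the APPROVED
# convention are hypothetical stand-ins for real Anthropic/OpenAI calls.

def call_model(family: str, prompt: str) -> str:
    """Stand-in for a provider-specific API call ("claude", "gpt", ...)."""
    raise NotImplementedError("wire up the real client for each family here")

def review(artifact: str, checkpoint: str, reviewer: str = "gpt") -> str:
    # The reviewer is deliberately from a different model family than the
    # orchestrator, so it does not share the orchestrator's blind spots.
    prompt = (
        f"Review another agent's {checkpoint}. List concrete defects "
        "(dead loops, keys read but never written, processes that exit "
        "immediately) or reply APPROVED.\n\n" + artifact
    )
    return call_model(reviewer, prompt)

def run_checkpoint(task: str, artifact: str, checkpoint: str) -> str:
    # Checkpoints mirror the post: after planning, after complex
    # implementations, after writing tests.
    verdict = review(artifact, checkpoint)
    if verdict.strip() == "APPROVED":
        return artifact
    # Objections go back to the orchestrator *before* execution, which is
    # where the compounding mistakes would otherwise slip through.
    return call_model(
        "claude",
        f"Task: {task}\nRevise this {checkpoint} to address the review.\n"
        f"{checkpoint}:\n{artifact}\n\nReview:\n{verdict}",
    )

def orchestrate(task: str) -> str:
    plan = call_model("claude", f"Draft a step-by-step plan for: {task}")
    return run_checkpoint(task, plan, "plan")

The design point worth noting: the reviewer's verdict is routed back to the orchestrator before anything runs, which is exactly where the "confident mistakes" the post mentions would otherwise compound.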

KISS. I like it. The coder doesn't test her own code.

Hernanda Muhammad

.NET 10 Years+ | .NET Legacy Code Modernization | Web3 | Solidity | Smart-Contract | AI MCP | RAG | Semantic Kernel

3w

Orchestrator and sub-agents? Damn! 😮

