GitHub Copilot CLI just shipped something clever: Rubber Duck.

The idea is simple but powerful: when a coding agent drafts a plan, a second model from a different AI family reviews it before execution.

Why a different family? Because a model reviewing its own work has the same blind spots: same training data, same biases. A model from a different family catches different things.

How it works:
→ You select a Claude model as your orchestrator
→ Rubber Duck uses GPT-5.4 as the reviewer
→ It activates at key checkpoints: after planning, after complex implementations, after writing tests

The results are compelling: Claude Sonnet + Rubber Duck closes 74.7% of the performance gap between Sonnet alone and Opus. On the hardest problems (3+ files, 70+ steps), it scores 4.8% higher.

Real examples of what it catches:
• A scheduler that would start and immediately exit
• A loop silently overwriting the same dict key every iteration
• Three files reading from a Redis key that new code stopped writing

What I like about this: it's not about replacing human review. It's about catching the confident mistakes that compound before you even see them.

Available now in experimental mode with /experimental.

#GitHubCopilot #AI #developer #programming
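The checkpoint-review pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the actual Copilot CLI internals: the names (`run_with_rubber_duck`, `Review`, `toy_reviewer`) and the checkpoint list are my assumptions; the only idea taken from the post is that a second, independent reviewer inspects the orchestrator's output at fixed checkpoints before execution continues.

```python
# Hypothetical sketch of cross-family plan review.
# All names here are illustrative, not real Copilot CLI APIs.

from dataclasses import dataclass, field
from typing import Callable

# Assumed checkpoints, mirroring the three named in the post.
CHECKPOINTS = ("after_planning", "after_implementation", "after_tests")


@dataclass
class Review:
    approved: bool
    notes: list[str] = field(default_factory=list)


def run_with_rubber_duck(
    step: str,
    artifact: str,
    reviewer: Callable[[str, str], Review],
) -> Review:
    """At key checkpoints, hand the artifact to an independent reviewer
    (in the real feature, a model from a different AI family)."""
    if step not in CHECKPOINTS:
        return Review(approved=True)  # no review at intermediate steps
    return reviewer(step, artifact)


# Toy stand-in for the second model: flags unresolved TODOs in a plan.
def toy_reviewer(step: str, artifact: str) -> Review:
    issues = [line for line in artifact.splitlines() if "TODO" in line]
    return Review(approved=not issues, notes=issues)


plan = "1. start scheduler\n2. TODO: keep process alive\n3. write tests"
result = run_with_rubber_duck("after_planning", plan, toy_reviewer)
print(result.approved)  # False: the reviewer flagged the TODO line
```

The design point is that the reviewer is a separate callable with no access to the orchestrator's state, so it can only judge the artifact itself, which is the whole reason a different model family catches different mistakes.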
KISS. I like it. The coder doesn't test her own code.
Orchestrator and sub-agents? Damn! 😮
Full blog post: https://github.blog/ai-and-ml/github-copilot/github-copilot-cli-combines-model-families-for-a-second-opinion/