⚠️ Addictive tech warning for developers. Once you add a 🦆rubber duck to your AI agent pipeline, you’ll start feeling uncomfortable without it.

This is exactly what happened to me. I no longer want to rely on a single model’s opinion for important technical decisions, and I definitely don’t want extra manual steps just to get a second perspective. That’s where “Rubber Duck”, an experimental feature in the GitHub Copilot CLI, really worked for me:

- Enable it with "copilot --experimental" (Rubber Duck is the 1000th reason for you to switch to terminal-first development)
- Watch one LLM actively criticise another’s decisions right at the moments where it matters most, pushing towards a better solution
- Everything happens automatically: no extra friction, no context switching

It is a targeted reviewer that steps in at high-value moments, such as after drafting a plan, after a complex implementation, and after writing tests but before running them. That feels like a very practical way to reduce compounding errors early, especially in long-running or multi-file tasks.

So having AI challenge AI has quietly become part of how I build now.

Would you trust critical technical decisions to a single model, or is multi-model critique the new baseline for serious AI-assisted development?

Ready to try Rubber Duck? I warned you :)

More details: https://msft.it/6044Q4Zs2

Morten Stange Bye, Haakon Hasli, Christian Tryti, Else Tefre, Francesco Manni, Jaime De Mora, Martin Woodward, Lee Stott, Christoffer Noring, Daniel Meppiel, Joel Norman, Ömür Sert, Adil I., Sebastien Le Calvez, 🥑 Aaron Powell, Nick McKenna, Burke Holland, Cornelia Bjørke-Hill

#GitHubCopilot #GitHubCopilotCLI #CopilotCLI #DeveloperTools #AIAgents #CopilotRubberDuck #msftadvocate
I would trust critical technical decisions to experienced senior developers, perhaps with the help of AI suggestions. AI makes so many simple, basic mistakes and opens so many loopholes and security concerns that I wouldn't let it make decisions. I would only let it suggest, and trusted developers should then decide.
GitHub Copilot PR reviews are the best; it's a stickler and is always throwing a smackdown at the author (Claude or Codex) 😅
Brilliant use of the adversarial agent pattern! Having one model automatically critique another is exactly like a senior engineer reviewing a PR before it ever reaches production.
It appends every prompt for me now. No going back!
A good starting point, but I am curious whether "rubber duck" pulls a lot of the codebase into its context before critiquing the coding model's solution. If the project is a complicated mess (as is the case most of the time in the real world), it is very probable that it won't catch the subtle issues and will just consume more tokens (because you have less of a human-in-the-loop and more of AI arguing with AI). I believe this is a nice idea, though, but for it to work properly, agents must be able to access a complete, compact, and "digestible" database of understanding for the codebase. If you know of any existing solutions that actually work, or have any ideas on this, let me know. Thanks for the tip!