GitHub Copilot CLI just shipped something clever: Rubber Duck.

The idea is simple but powerful: when a coding agent drafts a plan, a second model from a different AI family reviews it before execution.

Why a different family? Because a model reviewing its own work has the same blind spots: same training data, same biases. A model from a different family catches different things.

How it works:
→ You select a Claude model as your orchestrator
→ Rubber Duck uses GPT-5.4 as the reviewer
→ It activates at key checkpoints: after planning, after complex implementations, after writing tests

The results are compelling: Claude Sonnet + Rubber Duck closes 74.7% of the performance gap between Sonnet alone and Opus. On the hardest problems (3+ files, 70+ steps), it scores 4.8% higher.

Real examples of what it catches:
• A scheduler that would start and immediately exit
• A loop silently overwriting the same dict key every iteration
• Three files reading from a Redis key that new code stopped writing

What I like about this: it's not about replacing human review. It's about catching the confident mistakes that compound before you even see them.

Available now in experimental mode with /experimental.

#GitHubCopilot #AI #developer #programming
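The checkpoint-review pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the actual Copilot CLI internals: the names (`run_with_rubber_duck`, `Review`, `toy_reviewer`) and the checkpoint list are my assumptions; the only idea taken from the post is that a second, independent reviewer inspects the orchestrator's output at fixed checkpoints before execution continues.

```python
# Hypothetical sketch of cross-family plan review.
# All names here are illustrative, not real Copilot CLI APIs.

from dataclasses import dataclass, field
from typing import Callable

# Assumed checkpoints, mirroring the three named in the post.
CHECKPOINTS = ("after_planning", "after_implementation", "after_tests")


@dataclass
class Review:
    approved: bool
    notes: list[str] = field(default_factory=list)


def run_with_rubber_duck(
    step: str,
    artifact: str,
    reviewer: Callable[[str, str], Review],
) -> Review:
    """At key checkpoints, hand the artifact to an independent reviewer
    (in the real feature, a model from a different AI family)."""
    if step not in CHECKPOINTS:
        return Review(approved=True)  # no review at intermediate steps
    return reviewer(step, artifact)


# Toy stand-in for the second model: flags unresolved TODOs in a plan.
def toy_reviewer(step: str, artifact: str) -> Review:
    issues = [line for line in artifact.splitlines() if "TODO" in line]
    return Review(approved=not issues, notes=issues)


plan = "1. start scheduler\n2. TODO: keep process alive\n3. write tests"
result = run_with_rubber_duck("after_planning", plan, toy_reviewer)
print(result.approved)  # False: the reviewer flagged the TODO line
```

The design point is that the reviewer is a separate callable with no access to the orchestrator's state, so it can only judge the artifact itself, which is the whole reason a different model family catches different mistakes.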
KISS. I like it. The coder doesn't test her own code.
Orchestrator and sub-agents? Damn! 😮
Full blog post: https://github.blog/ai-and-ml/github-copilot/github-copilot-cli-combines-model-families-for-a-second-opinion/