The hardest problem in running parallel AI coding agents is not the coding. It is the merging.

Five agents finish their tasks at roughly the same time. Five pull requests target main. That is ten potential merge conflicts, and git's text-based merge cannot resolve most of them because they are structural, not textual. Two agents adding different imports to the same file. Two agents extending the same configuration object. Two agents creating similar utility functions.

Before we solved this, merge conflicts were the actual bottleneck in our agent fleet. Not API rate limits. Not context windows. Not model capability. Merge conflicts.

The solution was a sequential merge queue. Agent PRs enter a queue and are processed one at a time: rebase onto latest main, run an AST-aware merge driver that understands code structure (not just text lines), regenerate lock files, run the full test suite, then merge. If any step fails, the PR goes back to the end of the queue with a fresh rebase.

The AST-aware merge driver is the key insight. Traditional git sees two agents adding import lines to the same file and calls it a conflict. A driver that understands TypeScript syntax sees two non-overlapping additions to an import block and merges them automatically.

This runs locally, with no dependency on GitHub's paid merge queue feature. It is a Redis-backed sidecar that auto-starts with the agent fleet.

The lesson: scaling AI agents is not about spawning more of them. It is about building the infrastructure that lets their work converge cleanly. The merge queue became the constraint before CPU, memory, or API limits.

What infrastructure bottlenecks have you hit when scaling AI-generated code?

#AIAgents #SoftwareEngineering #DevTools
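A minimal sketch of the import-merge idea, with nothing assumed about the author's actual driver (which targets TypeScript): treat an import block as a set of declarations rather than a run of text lines, and a three-way merge of two agents' additions becomes a set union instead of a conflict. The function name and the Python setting are illustrative only.

```python
def merge_import_blocks(base, ours, theirs):
    """Three-way merge of an import block as line *sets*, not line runs.

    Illustrative only: a real AST-aware driver parses the syntax tree,
    but the set view is enough to show why two non-overlapping
    additions need not conflict.
    """
    base_s, ours_s, theirs_s = set(base), set(ours), set(theirs)
    added = (ours_s - base_s) | (theirs_s - base_s)    # either side's new imports
    removed = (base_s - ours_s) | (base_s - theirs_s)  # either side's deletions
    return sorted((base_s - removed) | added)

# Two agents each add a different import to the same block: a textual
# merge flags a conflict, the structural merge simply keeps both.
merged = merge_import_blocks(
    base=["import os"],
    ours=["import os", "import json"],
    theirs=["import os", "import re"],
)
# merged == ["import json", "import os", "import re"]
```

In real git, a custom driver like this is wired up via a `.gitattributes` `merge=` entry plus a `merge.<name>.driver` config line; the sketch above covers only the decision logic.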
Merging AI Agent Code: The Hardest Problem
More Relevant Posts
-
🚀 Built an AI-powered Code Review System (OpenEnv)

I developed a production-style environment where AI agents perform automated code reviews, similar to how developers review pull requests in real teams.

🔍 What it does:
• Detects bugs, security vulnerabilities, and performance issues
• Assigns severity levels (low → critical)
• Suggests fixes with explanations
• Uses a reward-based system to evaluate AI performance

⚙️ Tech Stack:
• Python, FastAPI
• Pydantic
• OpenEnv framework
• Hugging Face Inference APIs
• Docker

🧠 Key Improvements:
• Replaced keyword matching with F1-score evaluation (precision + recall)
• Added a hallucination penalty to reduce false positives
• Designed a multi-step feedback loop for iterative improvement
• Built REST API endpoints (/reset, /step, /state)
• Structured evaluation logs ([START] / [STEP] / [END])

🎯 Why this matters: Code review is critical in software engineering. This system simulates how AI can assist in CI/CD pipelines, developer tools, and automated quality checks.

🔗 Live Demo (Hugging Face): https://lnkd.in/g_BQVz8m
💻 GitHub Repository: https://lnkd.in/ggWHWdmv

💡 Moving toward building intelligent developer systems powered by AI. Would love feedback from developers and AI engineers!

#AI #MachineLearning #SoftwareEngineering #BackendDevelopment #Python #FastAPI #HuggingFace #OpenSource
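The F1-plus-hallucination-penalty evaluation described above can be sketched as follows. The weights and the function name are my own illustration, not the project's actual scoring code:

```python
def review_score(predicted, actual, hallucination_penalty=0.5):
    """F1 over predicted vs. ground-truth issues, minus a penalty for
    findings that match nothing real. Weights are illustrative."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # fraction of findings that are hallucinated (match no real issue)
    hallucination_rate = len(predicted - actual) / max(len(predicted), 1)
    return max(0.0, f1 - hallucination_penalty * hallucination_rate)

# Two real findings plus one hallucinated one:
# F1 = 0.8, minus a penalty of 0.5 * (1/3), giving roughly 0.63.
score = review_score(
    predicted=["sql-injection", "off-by-one", "style-nit"],
    actual=["sql-injection", "off-by-one"],
)
```

Unlike keyword matching, this rewards both completeness (recall) and restraint (precision), and the extra penalty term makes confident false positives costly.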
-
I was trying to build a multi-agent system in Go. Not a toy: a real pipeline with multiple agents, tool calls, dependency ordering, and failure handling.

I looked at what existed. There were Python frameworks I could have wrapped. There were suggestions to use the OpenAI SDK directly and wire everything manually. There were some Go repos doing pieces of it (scheduling, or tool calling, or LLM client wrappers), but nothing that handled the full runtime problem: dependency-aware scheduling, parallel execution, failure policies, and MCP integration together in one place.

So I started building what I needed for my specific use case. The result is 𝗥𝗼𝘂𝘁𝗲𝘅, an open-source YAML-driven multi-agent runtime for Go, shipped as both a library and a CLI.

Here's what it does:
- You define your agents, tools, models, and dependencies in YAML.
- You declare which agents depend on which.
- You configure retry policies, timeouts, and MCP tool server connections per agent.

Then you run the runtime, and it handles everything: parsing the dependency graph, scheduling agents in the correct order, running independent agents in parallel, executing tool calls concurrently within each agent, retrying failures according to your policy, and passing outputs between agents.

No magic. No hidden state. No framework code you can't read and understand.

It's on GitHub. It's early, rough edges and all, but it works. If you're building AI systems in Go, or curious why you should be, I'd love your feedback. And if you find it interesting, a star on the repo goes a long way. More to come.

https://lnkd.in/gK-bXDZQ
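The core of dependency-aware scheduling is grouping agents into batches whose dependencies are already satisfied; each batch can run in parallel. A language-agnostic sketch using Python's standard `graphlib` (Routex itself is Go, and the agent names and graph here are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical agent graph: each agent maps to the set of agents it depends on.
deps = {
    "fetch": set(),
    "analyze": {"fetch"},
    "summarize": {"fetch"},
    "report": {"analyze", "summarize"},
}

ts = TopologicalSorter(deps)
ts.prepare()
batches = []
while ts.is_active():
    ready = list(ts.get_ready())  # all deps satisfied -> safe to run in parallel
    batches.append(sorted(ready))
    ts.done(*ready)               # mark the batch finished, unlocking dependents
print(batches)  # [['fetch'], ['analyze', 'summarize'], ['report']]
```

`analyze` and `summarize` land in the same batch because neither depends on the other, which is exactly the "running independent agents in parallel" behavior described above.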
-
We had an interesting engineering problem building Migratowl.

When you upgrade 20 dependencies at once and the test suite fails, you have an attribution problem: which package caused the failure?

You could test each package in isolation. Accurate, but that's 20 separate sandbox runs, each taking minutes. Or you could trust the bulk output and guess. Fast, but one failure can look like it's caused by the wrong package entirely.

We landed on a hybrid:

1. Run everything upgraded at once first. If tests pass, every package is safe, confidence 1.0. Done.
2. If tests fail, score confidence per package:
   - Does the error message name this package directly? High confidence.
   - ImportError or AttributeError for a known API? High confidence.
   - Major version jump (e.g. 2.x → 3.x)? Confidence boost.
   - Generic failure? Low confidence.
3. High confidence (≥ 0.7): fetch the changelog and generate the report immediately. Low confidence: spawn an isolated AI subagent that tests only that package.

The common case (most upgrades pass, or one obvious culprit fails) stays fast. The hard case (ambiguous multi-package failures) gets accurate attribution.

The subagent delegation is the part I'm proudest of: recursive LangGraph agents, each running inside its own K8s pod workspace, merging results back into a single structured report.

Repo: https://lnkd.in/dWwZd5Aq

Curious if anyone else has tackled multi-agent confidence scoring differently.

#kubernetes #python #ai #devops
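A toy version of the per-package scoring in step 2. The signals and the 0.7 threshold come from the post, but the specific weights and the function itself are illustrative, not Migratowl's code:

```python
def attribution_confidence(package, error_text, old_major, new_major):
    """Score how likely `package` caused a failing bulk-upgrade run.
    Weights are illustrative; only the signals and the >= 0.7
    threshold mirror the post."""
    score = 0.2                                   # generic failures start low
    if package in error_text:
        score += 0.5                              # error names the package directly
    if "ImportError" in error_text or "AttributeError" in error_text:
        score += 0.2                              # classic broken-API signature
    if new_major > old_major:
        score += 0.2                              # major version jump
    return min(score, 1.0)

# Hypothetical error strings for illustration.
err = "ImportError: cannot import name 'urljoin' from 'somepkg'"
high = attribution_confidence("somepkg", err, old_major=2, new_major=3)          # 1.0
low = attribution_confidence("otherpkg", "1 test failed", old_major=1, new_major=1)  # 0.2
# high >= 0.7: report immediately; low < 0.7: spawn an isolated subagent.
```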
-
Claude Code's source was accidentally published to npm. So I studied every prompt in the codebase using Claude. Here's what I found, and I'm open-sourcing all of it.

Claude Code uses 26 distinct prompts to function:
> 1 system prompt (identity, safety, code style, tool routing)
> 11 tool prompts (shell, file ops, search, web, planning)
> 5 agent prompts (explorer, architect, verifier, docs, general)
> 4 memory prompts (summarization, session notes, extraction)
> 1 coordinator prompt (multi-agent orchestration)
> 4 utility prompts (titles, recaps, suggestions)

The patterns that stood out:
1. Anti-over-engineering rules: "don't add features beyond what was asked"
2. Tiered risk assessment: freely edit files, but confirm before force-pushing
3. Adversarial verification: a dedicated agent whose job is to TRY TO BREAK the implementation
4. Memory compression: 9-section summarization that preserves every user message
5. Never delegate understanding: "write prompts that prove you understood"

I have rewritten every prompt from scratch for legal compliance: same behavioral intent, without copying any text verbatim.

The repo includes:
> Every prompt, ready to copy into your own agent
> 9 pattern analyses with commentary
> 3 Claude skills you can drop in today
> An MIT license, so you can fork and reuse it as is

If you're building AI coding agents, this will save you months of prompt engineering.

Link: https://lnkd.in/gNizmf6T

#PromptEngineering #ClaudeCode #AI #AIAgents #LLM #OpenSource
-
If you're building AI agents and haven't seen this yet: it's every prompt Claude Code uses, rewritten and open-sourced. The verification agent pattern alone is worth the click.
-
This is one of those tools that makes you rethink how AI coding should actually work.

Just discovered rtk, an open-source CLI proxy designed for AI coding agents. The idea is simple, but powerful:

👉 Most of the tokens we burn in LLM workflows are NOT in prompts…
👉 They're in noisy command outputs (logs, git, docker, tests, etc.)

rtk sits between your terminal and your AI agent and compresses that noise into structured, minimal output, without losing meaning.

The result?
• Up to 60–90% token reduction (per the GitHub repo)
• Faster responses
• Longer agent sessions
• Lower costs
• Cleaner reasoning loops

Think about this for a second: in agent-based systems (Cursor, Claude Code, Codex…), every command execution feeds back into the model.

If that feedback is noisy → you waste context
If it's structured → you unlock scale

rtk is basically introducing a new layer in the AI stack: an "output optimization layer."

And this is where things get interesting: as agents become more autonomous, token efficiency stops being an optimization and becomes architecture.

This is the kind of tooling that will define the next wave of AI-native development. If you're building with AI agents, this is definitely worth a look: https://lnkd.in/dD_qrwic
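A toy of the compression idea (not rtk's actual algorithm): filter command output down to the lines that carry failure signal before they reach the agent's context. The marker list and sample output are hypothetical.

```python
def compress_output(raw, markers=("fail", "error", "traceback")):
    """Keep only lines carrying failure signal (case-insensitive).
    A toy of the idea behind rtk, not its actual algorithm."""
    lines = raw.splitlines()
    kept = [ln for ln in lines if any(m in ln.lower() for m in markers)]
    # Always return something, even for all-noise output.
    return "\n".join(kept) if kept else (lines[-1] if lines else "")

# A hypothetical 124-line test run: 120 passing tests are pure noise.
noisy = "\n".join(
    ["collecting 124 items ..."]
    + [f"test_{i} PASSED" for i in range(120)]
    + ["test_login FAILED", "E   AssertionError: bad token", "120 passed, 1 failed"]
)
compact = compress_output(noisy)  # 3 lines survive out of 124
```

A real implementation would also need structure-aware parsers per tool (git, docker, test runners) so that meaning, not just failure keywords, survives the compression.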
-
Your app code is clean. Your CI/CD config is a disaster.

We spend hours debating variable naming in our logic, but we treat our .github/workflows and gitlab-ci.yml files like a junk drawer. In 2026, AI code review isn't just for finding bugs; it's for cleaning up the infrastructure plumbing.

Here is what AI-driven review actually catches (and why it saves your team days of frustration):

1. The "Duplicate Step" Trap
I see this in 90% of legacy pipelines: three different stages running the exact same dependency install or environment setup.
The AI fix: it identifies redundant logic across 50 files and suggests a single, reusable action or template.
The result: faster builds and 50% less YAML to maintain.

2. The Caching Gap
If your pipeline takes 10 minutes but 8 of those are spent downloading the same npm or Go modules every time, you're burning money.
The AI fix: it notices missing actions/cache steps or mount points and tells you exactly where to inject the cache keys.
The result: build times drop from "coffee break" to "instant."

3. Semantic Naming (No More "Step 1")
"Step 1," "Build-final-v2," "Test-3." Poor naming makes debugging a nightmare when things fail.
The AI fix: it looks at the command being run (e.g. go test ./...) and suggests clear, semantic names like "Unit Tests: Golang."
The result: you can actually read your logs without a translator.

4. The "Risky Secret" Pattern
This is the big one. Developers often pass secrets as plain-text environment variables or, worse, hardcode a "test" token "just for a minute."
The AI fix: it catches patterns that look like keys and identifies where you're passing secrets into non-secure steps.
The result: you stay out of the headlines for a data breach.

AI is a "linter on steroids" for your infrastructure. Don't waste senior engineering time reviewing YAML syntax. Let the AI clean the pipes so your team can build the house.

#DevOps #Engineering #SoftwareEngineering #CICD #YAML #PlatformEngineering #SRE #AIinTech
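Point 1's duplicate-step detection boils down to indexing every run command across workflow files and flagging commands that appear more than once. A sketch; the data shape here is a hypothetical stand-in for parsed CI YAML, not a real linter:

```python
from collections import defaultdict

def find_duplicate_steps(workflows):
    """Map each run command to the (file, step) locations that repeat it.

    `workflows` is {filename: [(step_name, run_cmd), ...]} -- a
    simplified stand-in for parsed CI YAML.
    """
    seen = defaultdict(list)
    for fname, steps in workflows.items():
        for step_name, cmd in steps:
            seen[cmd.strip()].append((fname, step_name))
    # Only commands that occur in more than one place are candidates
    # for extraction into a shared, reusable action or template.
    return {cmd: locs for cmd, locs in seen.items() if len(locs) > 1}

dups = find_duplicate_steps({
    "build.yml": [("setup", "npm ci"), ("build", "npm run build")],
    "test.yml":  [("setup", "npm ci"), ("test", "npm test")],
})
# "npm ci" appears in both files -> candidate for a shared setup action.
```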
-
Cursor, Claude Code, and Codex are merging into one AI coding stack nobody planned.

OpenAI just shipped an official plugin that runs inside Anthropic's Claude Code. Not a workaround. Not a community hack. An Apache 2.0-licensed plugin from OpenAI, installed directly into a competitor's terminal.

Same week, Cursor 3 launched a rebuilt interface that treats the code editor as secondary. The default view is now an Agents Window for managing fleets of coding agents across repos and environments. Google's Antigravity reached the same conclusion with its Manager Surface.

I wrote about what this means for developers: https://lnkd.in/g8QgDDhh

Three layers are forming. Orchestration on top, where you manage and route agents. Execution in the middle, where coding agents write, test, and commit code. Review at the bottom, where a different model from a different provider challenges the code the first one wrote.

The interesting part is the review layer. When Claude writes code and Codex reviews it, you get independent scrutiny. Different training data, different blind spots. You are no longer asking someone to grade their own homework.

Nobody designed this stack. Developers assembled it because no single tool covers everything. Claude for precision on complex refactors. Codex for throughput on parallel tasks. Cursor as the control plane on top.

We went through the same thing with infrastructure. Terraform, Docker, Kubernetes. Not one tool to rule them all. Composable layers that got better together.

Are you already running multiple coding agents in the same workflow, or still picking one and hoping it covers everything?

#AIcoding #DevTools #CodingAgents #SoftwareEngineering
-
As the adage in software engineering goes: never test your own code. In the AI-driven world of coding, the new rule is: never use the model that wrote the code to review it.

I just finished reading an article by #Janakiram on how the AI coding market is not consolidating into one "winner", but is instead evolving into a specialized, three-layer stack:

1️⃣ The Orchestration Layer: Cursor 3 (specifically the new "Glass" interface) is moving beyond the editor to become a control plane for managing fleets of parallel agents.

2️⃣ The Execution Layer: This is the engine room. While Claude Code is winning on nuanced reasoning, OpenAI Codex is being tapped for high-throughput, asynchronous tasks.

3️⃣ The Review Layer: This is the game-changer. Using the new codex-plugin-cc, developers are now having Codex provide independent, "adversarial" reviews of code written by Claude.

#Janakiram makes a compelling point: we are moving away from walled gardens toward interoperability. The biggest players in AI have started building plugins for their competitors; the future seems to be about composition, not competition.
-
Developers are getting more value from LLMs and coding agents, but one problem still shows up everywhere: bad context.

Most agents do not fail because they cannot generate code. They fail because they read the wrong files, miss the real flow, or suggest edits without enough evidence.

That is why hieuchaydi/RepoBrain stands out.

RepoBrain is a local-first codebase memory engine for AI coding assistants. It helps agents understand a repository before they generate or modify code. Instead of relying on guesswork, it indexes the repo into symbols, chunks, and dependency edges, then combines retrieval, tracing, and ranking to surface grounded evidence.

A few things that make it interesting:
- hybrid retrieval with BM25, embeddings, and reranking
- flow tracing across route → service → job
- edit target ranking with explicit rationale
- confidence scoring and warnings when evidence is weak
- local-first workflow with CLI, browser UI, and MCP support

What I like most is the direction behind it: make AI coding agents less reckless, not just more powerful. That feels like the real next step for agent tooling.

Repo: https://lnkd.in/ggAjSMGY

#GitHub #OpenSource #AI #LLM #CodingAgents #DeveloperTools #Python
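A generic sketch of hybrid score fusion (not RepoBrain's actual ranking): min-max normalize the sparse (BM25) and dense (embedding) scores so they are comparable, then blend them with a mixing weight. All numbers are made up for illustration.

```python
def fuse_scores(bm25, dense, alpha=0.5):
    """Min-max normalize each signal, then blend with weight `alpha`.
    A generic hybrid-retrieval sketch; RepoBrain's ranking may differ."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        # Guard against a constant score list (hi == lo).
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    return [alpha * b + (1 - alpha) * d for b, d in zip(norm(bm25), norm(dense))]

# Three candidate chunks: chunk 0 leads on both signals, so it stays on top.
fused = fuse_scores(bm25=[12.0, 3.0, 7.5], dense=[0.91, 0.42, 0.60])
best = max(range(len(fused)), key=fused.__getitem__)  # -> 0
```

Normalization matters because raw BM25 scores are unbounded while cosine similarities live in [-1, 1]; without it, one signal silently dominates the blend. A reranker would then re-score only the top fused candidates.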
Interesting! I ran into similar issues and agree: a git merge needs more context than the diff itself. It needs to know what the intent of the task was. What kind of context do you keep in the AST?