GitHub silently deleted your merged code. And you'd never know.

No error. No conflict. No warning. Just a clean merge that quietly rewrote your main branch.

Here's what happened on April 23rd: a bug in GitHub's merge queue caused PRs to build on the wrong base commit.

You reviewed: +29 lines added, -34 removed.
What landed on main: +245 added, -1,137 removed.

Thousands of lines of shipped code. Gone.

CI passed. Branch protection ran. The PR showed "Merged." Everything looked fine.

2,092 PRs. 658 repos. 4.5 hours. No public outage banner. Ever.

The recovery? Manual. Comb through commit graphs. Reconstruct history by hand. Re-merge closed PRs. Some teams had dozens of corrupted commits before anyone noticed.

This wasn't an outage. It was an integrity failure. And it exposes something bigger 👇

We've delegated trust to automation without verifying the contract it's keeping. A merge queue has one job: the commit CI tested = the commit that lands. When that breaks silently, everything downstream is suspect. Builds. Deployments. Compliance audits. All of it.

GitHub is also dealing with a capacity crisis: they planned for 10x growth, realized they need 30x, and have had no CEO since mid-2025. The cracks are showing.

Trust in tooling is built over years. It can crack in an afternoon.

#GitHub #SoftwareEngineering #DevOps #EngineeringLeadership
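One cheap way to check that contract after the fact is to compare what a merge actually introduced on main against the diff you reviewed. A minimal sketch with plain git, assuming a local clone and that MERGE_COMMIT is a placeholder for the SHA your PR page reports as the merge:

    # Fetch the current main before inspecting anything.
    git fetch origin main

    # Show the change the merge commit introduced relative to its first parent.
    # For a regular or squash merge this is exactly what landed on main;
    # compare it with the +29 / -34 you approved in the PR.
    git diff --shortstat "${MERGE_COMMIT}^1" "${MERGE_COMMIT}"

    # A large mismatch (say, -1,137 lines you never reviewed) is the red flag.

It won't catch everything, but a check like this would have flagged the mismatch described above within minutes instead of days.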
GitHub's merge queue silently rewrote main branch history on April 23rd.

The pattern: a PR shows a +29 / -34 diff. Reviewed, approved, queued. What lands is +245 / -1,137 — thousands of lines of already-shipped code quietly removed. Every merge after that stacks on the broken history. The UI shows nothing wrong.

GitHub says 2,800 PRs out of 4 million. One company reported 200+ on its own. Pick a number.

The part nobody's saying out loud: for history to get overwritten like this, something is force-pushing to main behind the scenes. Branch protection apparently doesn't apply to GitHub itself. Worth thinking about what else moves through that path silently.

The deeper issue isn't the bug. Bugs happen. The issue is that "distributed version control" became a single vendor's merge button for most of the industry, and the merge button lied for a day.

Git itself was fine the whole time. It always is. I run my own Gitea. Recommend it.

#GitHub #Git #DevOps #Gitea #SelfHosted #SoftwareEngineering
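If the worry is history being rewritten underneath you, a blunt detector is to record the tip of main you last saw and verify it is still an ancestor of the current tip. A rough sketch, assuming you persist that SHA somewhere between runs (the .last-known-main file name is just a placeholder):

    # Load the last tip of main we recorded (placeholder file name).
    KNOWN_TIP=$(cat .last-known-main)

    git fetch origin main

    # If the old tip is no longer an ancestor of the new tip, history was
    # rewritten: a force-push, a reset, or a tool doing either on your behalf.
    if ! git merge-base --is-ancestor "$KNOWN_TIP" origin/main; then
      echo "ALERT: origin/main no longer contains $KNOWN_TIP - history was rewritten"
    fi

    # Record the current tip for the next run.
    git rev-parse origin/main > .last-known-main

Run it from a cron job or a CI step; it costs one fetch and catches exactly the failure mode described above.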
GitHub just had one of the worst weeks in its history. And as engineers, we need to talk about it. Here's what happened 👇

🔴 Incident #1 - The Silent Code Killer
On April 23, GitHub's merge queue silently reverted previously merged code across 658 repos and 2,092 PRs - during a 4-hour window. The scariest part? Their automated monitoring caught nothing. They found out via support tickets, 3.5 hours later. The root cause? A change to an unreleased feature that was supposed to be behind a feature flag - but wasn't. The broken code shipped to everyone.

🔴 Incident #2 - The Botnet Blackout
On April 27, a suspected botnet overwhelmed GitHub's Elasticsearch cluster. PR lists, issue lists, project views - all blank. For 4+ hours. Data was fine. You just couldn't see any of it.

🔴 Incident #3 - The Uptime Nobody Talks About
A developer built an unofficial GitHub status tracker that actually counts degraded performance as downtime (wild concept, right?). Current uptime: 85.51%. Industry standard: 99.9%. GitHub's official page classifies broken search, PRs not loading, and slowdowns as "Degraded Performance" - technically up, practically unusable.

The CTO has now issued a public apology. The reason? Agentic AI workflows pushed GitHub way past its designed limits. They planned for 10x capacity growth. By February, they realized they needed 30x.

Three lessons every engineering team should take from this:

1️⃣ Feature flags only work if they're actually enforced - at the infrastructure level, not just in code review.
2️⃣ Monitor for correctness, not just availability. A system can be "up" and completely broken. (A toy probe of this idea is sketched after this post.)
3️⃣ How you report incidents is a trust signal. GitHub is now rolling out a 3-tier status system (Degraded / Partial / Major outage) with per-service uptime. That's the right move - just years late.

AI-driven workloads are scaling faster than anyone predicted. If it caught GitHub off guard, ask yourself: is your infrastructure ready?

♻️ Repost if your team uses GitHub. They need to see this.

#GitHub #SoftwareEngineering #DevOps #Engineering #IncidentResponse #FeatureFlags #WebDevelopment
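On lesson 2: monitoring for correctness can be as simple as a synthetic probe that checks the data coming back, not just the status code. A toy sketch using the gh CLI against a hypothetical canary repository (OWNER/canary is a placeholder you would replace with a repo you control):

    # Run from a clone of the canary repo whose true tip we know locally.
    EXPECTED=$(git rev-parse HEAD)

    # Ask the platform what it thinks the tip of main is.
    ACTUAL=$(gh api repos/OWNER/canary/commits/main --jq .sha)

    # "Up" is not enough: the answer has to be the right answer.
    if [ "$EXPECTED" != "$ACTUAL" ]; then
      echo "ALERT: API reports $ACTUAL but the canary's real tip is $EXPECTED"
    fi

The design choice is to keep the canary boring: a repo nobody touches except the probe, so any disagreement between local truth and the API is a signal, not noise.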
GitHub had a rough week. Three separate events, each significant on its own. Read together, they are harder to dismiss.

1. The outage
April 27. GitHub down for roughly 4.5 hours. Search degraded, Actions jobs delayed on larger runners, traced back to an internal Elasticsearch problem. The downtime itself was not the painful part. The ripple effect was. CI/CD pipelines failed to trigger. PR reviews stalled. Issue comments lost. npm installs that pulled from github.com timed out. Production deploys via Actions queued up. A lot of teams realized how many of their workflows were anchored to a single platform.

2. Mitchell Hashimoto pulled Ghostty off GitHub
On April 28, Hashimoto (HashiCorp founder, creator of Ghostty) published a post titled "Ghostty Is Leaving GitHub." He is GitHub user 1299. Joined February 2008. Used the platform every day for 18 years. For the past month, he had been keeping a journal, marking every day a GitHub outage blocked his work. Almost every day had an X. In his own words: "I want to ship software and it doesn't want me to ship software." The migration plan had been in the works for months. The April 27 outage was coincidental timing, not the trigger.

3. CVE-2026-3854
A critical RCE affecting GitHub.com and GitHub Enterprise Server. CVSS 8.7. The bug itself looked simple. During git push, push option values were not sanitized before being inserted into internal service headers. The result was command injection. A single push to a single repository let an authenticated attacker execute arbitrary commands on GitHub's backend. Given the multi-tenant architecture, code execution on one node could expose millions of repositories sitting on shared storage. Discovered by Wiz Research on March 4. GitHub.com was patched the same day. GHES required an upgrade to 3.19.3 or later. At the time of public disclosure, 88% of GHES instances were still unpatched.

Three different stories. One thing in common. A lot of teams have wired their entire delivery pipeline through a single platform that, for the past few weeks, has been less reliable than the people who depend on it would like.

Migration is not always realistic. But the question is worth asking out loud: if GitHub goes down for 4 hours next week, can your team still ship?

#GitHub #SoftwareEngineering #DevOps #OpenSource #Cybersecurity
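One practical answer to that closing question is to stop treating GitHub as the only copy of the repository. A minimal sketch of a second remote; the backup URL is a placeholder, and Gitea, GitLab, or a bare repo on a box you control all work:

    # Add a fallback remote alongside origin (URL is a placeholder).
    git remote add backup git@git.example.com:team/project.git

    # Keep it in sync - a cron job or a CI step after each push is enough.
    git push --mirror backup

    # If github.com is down, clones and deploys can point at the backup.
    git clone git@git.example.com:team/project.git

This doesn't replace Actions, Issues, or PR review, but it keeps the one thing git is actually good at - the history - shippable from somewhere else.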
Even the giants have "off" days: lessons from GitHub’s Merge Queue regression.

GitHub recently confirmed a bug where roughly 2,800 pull requests were merged from the wrong base state, unintentionally reverting previous changes. While 0.07% sounds small, in production, "small" percentages can mean major downtime.

Key takeaways for teams:

1) Automated testing is king: GitHub is already expanding test coverage for merge operations.
2) Trust, but verify: always keep an eye on your branch history after a merge, especially when using automated queues (a minimal check is sketched after this post).
3) Transparency wins: kudos to Kyle Daigle and the GitHub team for the quick RCA (Root Cause Analysis) and direct outreach to affected users.

Have you ever encountered a "silent revert" in your workflow? How does your team guard against tool-level regressions?

#GitHub #DevOps #SoftwareEngineering #CI/CD #TechNews
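For takeaway 2, "keep an eye on your branch history" can be scripted rather than eyeballed. A small sketch with the gh CLI, where 1234 is a placeholder for the PR you just merged:

    # Ask GitHub which commit it created for the merge.
    MERGE_SHA=$(gh pr view 1234 --json mergeCommit --jq .mergeCommit.oid)

    git fetch origin main

    # The merge commit should be reachable from main, and the diff it introduced
    # should match what was reviewed - if either fails, investigate a silent revert.
    git merge-base --is-ancestor "$MERGE_SHA" origin/main \
      && git diff --shortstat "${MERGE_SHA}^1" "$MERGE_SHA" \
      || echo "ALERT: merged PR 1234 is not part of origin/main"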
GitHub’s recent incident is every engineer’s nightmare:

✅ CI passed
✅ PR approved
✅ Merge successful

…and the wrong code still landed in main.

On April 23, GitHub confirmed that a merge queue issue affected 2,092 pull requests across 658 repositories, producing incorrect commits and silently reverting code in some cases.

Good reminder that “all checks passed” doesn’t always mean “everything is correct”.

#GitHub #SoftwareEngineering #DevOps #CodingLife
"I want to code. And I can't code with GitHub anymore." That's Mitchell Hashimoto. GitHub user #1299. The man who built Terraform and Vagrant. After 18 years — he's moving Ghostty off GitHub. And I don't blame him. For a month he kept a journal. Every day GitHub disrupted his work, he marked an X. Almost every day had one. Then April 23: a squash merge bug corrupted 658 repos and 2,092 PRs. That's not downtime. That's data loss. Then April 27: All of GitHub — search, Issues, PRs, Projects — went completely dark. GitHub's CTO apologized. Said they now need 30× capacity. February alone had 37 platform incidents. Here's what nobody's saying: GitHub is bending under the weight of agentic AI. Copilot sessions. Parallel agents. Millions of automated calls per minute. The platform was never designed for this. And it's cracking. When the person who defined modern DevOps infrastructure says GitHub is "no longer for serious work" — that's not a hot take. That's a warning. Where do you go when GitHub goes down? 👇 #GitHub #OpenSource #Ghostty #Developers #DevTools #SoftwareEngineering #Tech #BuildInPublic
Last week, someone wiped our entire codebase. The whole Bitbucket repository — replaced with a single commit: "Repository cleared."

Every branch. Every commit. Every line of history. Gone.

For about 30 minutes, I just stared at the screen. Then I got to work.

Step 1: I found an old commit hash that was still cached on Bitbucket's servers.
Step 2: git fetch origin [that hash] — 2,059 objects came back.
Step 3: Force-pushed the recovered code to a new repo. (A rough sketch of these commands follows this post.)

Full codebase? Recovered. We went from "everything is gone" to "everything is back" in the same day.

But here's the lesson that actually matters. After the recovery, I sent the team a list of 5 changes we need to make:

1. Branch protection rules — no one pushes directly to main.
2. Pull request reviews before any merge.
3. Minimum 2 admins on every platform.
4. Regular backups — not "we should do this someday," but scheduled.
5. Access review across all platforms.

Recovery is great. But prevention is the actual job.

The scariest moment wasn't discovering the code was gone. It was realizing we had no safeguards to stop it from happening in the first place.

#DevOps #Git #Bitbucket #IncidentResponse #CodeRecovery #SecurityByDesign #IntegrationEngineering #TechLeadership
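Roughly what those three steps look like as commands. This is a sketch, not a script: the SHA, repo URLs, and names are placeholders, and it only works if the server still has the objects for that old commit (many hosts allow fetching an arbitrary SHA, some restrict it):

    LOST_SHA=0123abc   # placeholder: the old hash recovered from logs, a PR page, or a teammate

    # Fetch just that commit and everything reachable from it.
    git init recovered && cd recovered
    git remote add origin git@bitbucket.org:team/project.git   # placeholder URL
    git fetch origin "$LOST_SHA"

    # Put a branch on the recovered history and push it somewhere safe.
    git checkout -b main "$LOST_SHA"
    git remote add rescue git@bitbucket.org:team/project-rescue.git   # placeholder URL
    git push rescue main

Pushing to a fresh repo first, rather than force-pushing over the wiped one, keeps the crime scene intact while you figure out what happened.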
Everyone jokes about rm -rf *… until it actually happens.

A while back, a GitHub engineer accidentally ran a destructive command on the wrong repository. Not a fork. Not a personal project. The company’s main GitHub repo.

Within seconds… pipelines failed. Services broke. Data disappeared. Panic kicked in. And this wasn’t a small startup. This was at the scale where even minutes of downtime matter.

But here’s the part no one talks about 👇

The system came back. Why? Because great engineering isn’t about never making mistakes. It’s about designing systems that survive mistakes.

-> GitHub backups saved them
-> Branch protections prevented even worse disasters
-> Teams jumped in and fixed things fast

Within hours, everything was restored.

💡 The lesson? If you’ve ever broken something in code, accidentally deleted a branch, or messed up production… you’re not alone. Even the best engineers have done it.

The difference isn’t perfection. The difference is how fast you recover and what you learn.

So next time you make a mistake… don’t panic. Improve your system. Because in tech, mistakes are not the end. They’re part of the process.

#github #programming #softwareengineering #devlife #learning #growth #tech
Hard freeze: the system won't let you merge. Soft freeze: "please don't merge." Guess which one works.

Every "Slack-message-and-hope" freeze I've seen eventually gets violated. Sometimes by a well-meaning engineer who missed the thread. Sometimes by a contractor who isn't even in the channel. Sometimes by the merge queue itself, which doesn't read Slack at all.

The fix isn't better communication. It's a required status check that says no.

NoShip turns your freeze into a GitHub check that blocks merges at the source — across every repo, every branch, every environment. Policy becomes control. No honor system required.

#CodeFreeze #DevOps #GitHub #SRE #PlatformEngineering #DeploymentSafety #EngineeringLeadership #ChangeControl
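Under the hood, a hard freeze like this is the generic required-status-check pattern: branch protection requires a named context, and during a freeze that context is set to failure on every PR head. This is not NoShip's implementation, just a sketch of the mechanism with the gh CLI, where OWNER/REPO and HEAD_SHA are placeholders:

    # Post a failing "freeze" status on a PR's head commit (gh api issues a POST
    # because fields are supplied). Branch protection that lists "freeze" as a
    # required status check will now refuse to merge that PR.
    gh api "repos/OWNER/REPO/statuses/$HEAD_SHA" \
      -f state=failure \
      -f context=freeze \
      -f description="Change freeze in effect until Monday 09:00"

    # Lifting the freeze is the same call with state=success.

The point of doing it at the status-check layer is that it binds on everyone and everything that merges through the branch, including automation that never reads Slack.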
Back in November I looked at a problem and thought "that's going to be fun to solve."

GitHub Copilot CLI running inside a Docker sandbox needs Docker access. Testcontainers, integration tests, build pipelines. They all need a working Docker socket. The obvious answer? Mount /var/run/docker.sock into the container. The obvious answer is also terrifying. That socket is root access to your host machine. Any image, privileged containers, host filesystem mounts. For a human dev, you trust yourself. For Copilot running autonomously... not so much.

Last year I built an Airlock feature that hardens network traffic, routing everything through an allowlist-enforcing proxy. That was step one. The Docker socket broker was the piece I kept putting off because the problem was harder.

The broker sits between the container and the real Docker daemon. Every API call goes through it. 65 endpoints explicitly allowed, everything else blocked. When Copilot tries to create a container, the broker inspects the body: checks the image against an allowlist (empty by default, you name what you trust), blocks privileged mode, blocks host namespace sharing, blocks mounts to /etc, /root, /var, and the socket itself. Combine it with the Airlock I built last year and sibling containers spawned by Copilot get auto-joined to the isolated network too. Network-level and API-level lockdown at the same time.

It wasn't one of those "throw a single prompt at it and it's solved" problems. In standard mode, everything works: Testcontainers, docker builds, multi-service setups. Through Airlock, some scenarios like Testcontainers port connectivity still need work. The feature I built first is ironically the part holding up the last 10%.

copilot_here is growing in ways I didn't expect for a tool I built because I was too paranoid to give GitHub Copilot full shell access. 6 external contributors. 81 stars on GitHub. 24.9k container image downloads in the last 30 days (according to GitHub Packages stats).

If you're running GitHub Copilot CLI and want Docker access without the "hope nothing goes wrong" approach, the deep dive on how the broker works is linked in the comments. And if you find it useful, a star on GitHub helps more than you'd think.

#Docker #DevOps #OpenSource #GitHubCopilot #Security
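If you just want the general flavour of this pattern without copilot_here, the community image tecnativa/docker-socket-proxy does a coarser version of the same thing: an HTTP proxy in front of the real socket that only forwards allowlisted sections of the Docker API. A sketch, with the caveat that the broker described above goes much further (request-body inspection, image allowlists, mount blocking):

    # Run the proxy with read access to containers and images only;
    # POST=0 blocks all mutating calls outright.
    docker run -d --name docker-socket-proxy \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -e CONTAINERS=1 -e IMAGES=1 -e POST=0 \
      -p 127.0.0.1:2375:2375 \
      tecnativa/docker-socket-proxy

    # Point the sandboxed tool at the proxy instead of the raw socket.
    export DOCKER_HOST=tcp://127.0.0.1:2375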