Your CI went down last week. A platform team I talked to lost three deploys to it. Not because the deploys broke. Because nothing could run tests. Every commit, every PR, every release gate was wired through CI. When CI dropped, validation dropped with it.

This is what people miss when they say "CI is our test runner." Your test infrastructure is only as reliable as the system you bolted it onto. If your testing strategy goes dark every time GitHub Actions degrades or Jenkins agents flake, that's not a CI problem. That's an architecture problem.

Tests should run on the same infrastructure your apps run on. If your apps live in Kubernetes, tests should run in Kubernetes. If your apps survive a CI outage, tests should too. If your apps scale to 1,000 pods, tests should match.

One team I work with pulled their tests out of CI entirely. Tests now run in the cluster alongside the workload. Result: ~100 engineering hours per week reclaimed. CI outages stopped being a release event.

The lesson isn't "switch CI providers." It's "stop coupling testing to CI in the first place." Worth a conversation if your last release was held up by something you don't actually own.

#Kubernetes #DevOps #PlatformEngineering #SRE #Testkube
Don't couple testing to CI, run tests in the cluster
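To make "tests run in the cluster alongside the workload" concrete, here is a minimal sketch using nothing but a plain Kubernetes Job; the image name, namespace, and in-cluster service URL are placeholders, not taken from the post:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: api-smoke-tests
  namespace: payments              # hypothetical namespace, same one the app runs in
spec:
  backoffLimit: 0                  # a failing suite should fail the Job, not retry
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tests
          image: registry.example.com/payments/api-tests:latest   # hypothetical test image
          command: ["npm", "test"]
          env:
            - name: TARGET_URL     # hit the service over the cluster network, not an external URL
              value: "http://payments-api.payments.svc.cluster.local:8080"
```

Because the Job lives in the cluster, an external CI outage only delays triggering it: anything that can create a Job and read its status (a cron, an operator, or CI when it is healthy) can still run the suite.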
A pattern I keep seeing in enterprise platform eng conversations:

A lot of large enterprises have acquired 3–5 companies in the last decade. Each one brought its own testing stack. Jenkins here. GitHub Actions there. A team still running Cypress locally because nobody wired up their CI. The platform team inherits the mess. Maybe 30% of engineers are on Kubernetes. The rest are on VMs and legacy apps.

"Consolidate the testing stack" sounds rational in the boardroom. In practice, it's the riskiest migration a platform team could run, because the blast radius of breaking a test framework is production bugs.

What I tell the Directors of Platform Eng I talk to: Don't consolidate the frameworks. Consolidate the orchestration layer underneath. Let every team keep the tool that fits their stack: Playwright, K6, Postman, JUnit, whatever. Put one control plane on top that tracks execution, flakiness, RBAC, and audit across all of them.

That's the path acquisition-heavy orgs actually ship. What's worked for platform leaders dealing with this?

#PlatformEngineering #Kubernetes #DevOps #CICD
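One hypothetical way to picture "consolidate the orchestration layer, not the frameworks": every team keeps its own tool inside its own image, but all suites run through the same substrate with a shared labeling scheme, so execution, ownership, and audit can be tracked in one place. The label keys and image are illustrative only:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: checkout-playwright-suite
  labels:
    testing.example.com/framework: playwright   # another team might set k6, postman, junit...
    testing.example.com/owner: checkout-team    # used for RBAC and audit reporting
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: runner
          image: registry.example.com/checkout/playwright-suite:latest  # team-owned image, team-chosen tool
```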
I once deployed a Node.js service to production with zero pipeline. Just git pull on the server. Manual. Every. Time. It worked fine — until a teammate pulled mid-deploy on a Friday night and took down an API serving 5,000+ users. Nobody told us. We found out because users stopped reaching us. Two days later, I had a GitHub Actions pipeline running — automated builds, zero-downtime deploys, Slack notifications on every push. Deployment time dropped 60%. Downtime went to zero. Don't wait for the Friday night incident to take CI/CD seriously. If your deploy process is still "SSH and pray" — that's the sign. #MERN #FullStackDeveloper #DevOps #CICD #BackendDevelopment
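A minimal sketch of the kind of workflow the post describes, assuming a Node.js app deployed to a single server running PM2 in cluster mode and a Slack incoming-webhook secret; the host, paths, and secret names are placeholders:

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test
      - name: Zero-downtime deploy over SSH
        uses: appleboy/ssh-action@v1.0.0        # pin whichever release you trust
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: deploy
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            cd /srv/api && git fetch && git checkout ${{ github.sha }}
            npm ci --omit=dev
            pm2 reload api                       # cluster-mode reload restarts workers one at a time
      - name: Notify Slack
        if: always()
        run: |
          curl -s -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"Deploy ${{ github.sha }} finished: ${{ job.status }}\"}" \
            "${{ secrets.SLACK_WEBHOOK_URL }}"
```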
Your Kubernetes cluster is lying to you. And you won't find out until prod breaks.

Here's a problem most platform engineers don't talk about enough: config drift across environments.

Everything looks identical — dev, staging, prod. Same Helm charts. Same GitOps repo. Same manifests. Then prod goes down. And you spend 3 hours figuring out why staging never caught it.

Here's what actually happened: someone patched a ConfigMap directly on the prod cluster with "kubectl edit" during last month's incident. Just a quick fix. "I'll raise a PR later." They didn't. Now prod is running a config that exists nowhere in Git.

Your GitOps tool (ArgoCD, Flux — doesn't matter) shows everything as Synced, because drift detection only works when the live state diverges from what's currently in Git. But the patch was never in Git to begin with.

This is the gap nobody warns you about:
- GitOps doesn't protect you from changes that never entered Git
- kubectl diff only compares against what's applied, not what should exist
- Multi-cluster setups multiply this problem — 5 clusters, 5 different "versions of truth"
- The longer it goes undetected, the bigger the blast radius when it surfaces

The fix isn't just "don't use kubectl edit" — that battle is already lost in most orgs. The real fix is drift detection as a first-class concern:
- Enable ArgoCD's self-heal and prune flags so live state is continuously reconciled
- Run kubectl diff in your CI pipeline before every deploy, not just locally
- Set up audit logging on your clusters — who ran kubectl commands, and when
- Tools like Kyverno or Datree can flag live-state mismatches proactively
- Treat your cluster state like a database — no manual writes, ever

The hardest part isn't the tooling. It's the culture shift of making "I'll fix it in Git later" completely unacceptable. Because in a fast-moving team, "later" is when prod burns.

Been burned by config drift before? Drop it in the comments.

#Kubernetes #DevOps #PlatformEngineering #GitOps #K8s #SRE #CloudNative
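For the "kubectl diff in CI" recommendation, a minimal sketch of a pre-deploy drift gate, assuming the runner already has a kubeconfig for the target cluster and the manifests live under k8s/ (both assumptions, not from the post):

```yaml
- name: Fail the pipeline if live state has drifted from Git
  run: |
    # kubectl diff exits non-zero when live objects differ from the manifests,
    # which surfaces out-of-band "kubectl edit" changes before the deploy proceeds
    if ! kubectl diff -f k8s/; then
      echo "Cluster state has drifted from Git; reconcile before deploying."
      exit 1
    fi
```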
100MB Files in Git: A Hidden Risk to Repository Performance

Large files rarely create immediate issues, but over time they slow repositories, impact developer productivity, and introduce unnecessary complexity. Addressing this isn't just about deletion. It requires a controlled approach to rewriting history without disrupting teams or delivery pipelines.

This blog outlines how to safely remove 100MB+ files at scale, ensuring cleaner repositories and more reliable development workflows.

Read more: https://lnkd.in/g8UKj55V
------------------
Shankar Prasad Jha Sandeep Rawat Yogesh Baatish Arpit Jain Vedant K. Khalid Ahmed Jinesh Koluparambil Buildpiper - By OpsTree
------------------
#Git #DevOps #VersionControl #PlatformEngineering #TechLeadership #EngineeringExcellence #ScalableSystems #DeveloperProductivity
GitOps: Why I Stopped Running kubectl Manually

A while back I made a rule for myself: no more manual kubectl apply in production. Ever.

It felt uncomfortable at first. Like giving up control. But the reality is — it was the opposite.

Once we moved to a full GitOps workflow with ArgoCD, every change became:
— Versioned in Git
— Reviewed via pull request
— Automatically synced to the cluster
— Fully auditable

Rollbacks went from a 30-minute fire drill to a simple git revert. Deployment confidence went through the roof. And the best part? Teams that previously depended on the "infra guy" could now self-serve their own deployments safely.

GitOps is not just a deployment strategy. It's a cultural shift — from "who did what and when" to "the repo is the single source of truth."

If you're still doing manual deployments, try this: pick one non-critical service and move it to GitOps. See how it feels. You probably won't go back.

#GitOps #ArgoCD #Kubernetes #DevOps #ContinuousDelivery #SRE
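A minimal sketch of the ArgoCD side of such a workflow, with automated sync plus the self-heal and prune flags so manual edits get reverted and files deleted from Git get pruned; the repo URL, path, and namespace are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/gitops.git   # hypothetical GitOps repo
    targetRevision: main
    path: apps/payments
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true        # delete resources that were removed from Git
      selfHeal: true     # revert manual kubectl edits back to the Git state
```

With this in place, "rollback" really is just a git revert on the manifests: ArgoCD reconciles the cluster back to whatever the repo says.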
Just shared a new post on my blog. A practical look at how I design CI/CD pipelines with GitHub Actions — prioritizing clarity, fast feedback cycles, and maintainability over unnecessary complexity. These are patterns that have worked well for me in real projects, especially when scaling workflows and keeping deployments predictable. If you're refining your pipeline strategy, this might be worth a read :) https://lnkd.in/dKbd6zEa #DevOps #CICD #GitHubActions #SoftwareEngineering
CI/CD is more than just faster releases…

Most people hear CI/CD and think "automated deployments". That's part of it, but it's not the full picture. CI/CD is what separates fragile, manual release processes from engineering workflows that scale.

Here's how the full pipeline breaks down:

CI (Continuous Integration) - catch problems before they ship:
➡️ Code: developers push to GitHub or GitLab, pipeline kicks off automatically.
➡️ Build: tools like Gradle, Webpack, or Bazel package the code.
➡️ Test: Jest, Playwright, and JUnit run against every change before it goes anywhere near prod.
➡️ Release: Jenkins or Buildkite orchestrate the pipeline from start to finish.

CD (Continuous Delivery/Deployment) - ship reliably every time:
➡️ Deploy: Kubernetes, Docker, Argo, or AWS Lambda push changes live.
➡️ Operate: Terraform keeps infrastructure consistent so environments don't drift.
➡️ Monitor: Prometheus and Datadog watch for issues so your team catches them before users do.

The real value isn't just speed. CI/CD reduces human error, tightens feedback loops, and builds systems resilient enough to handle change at scale. The manual deployment process that works fine for a small team becomes a liability the moment things grow. Done right, your team stops dreading release day.

What's one tool you can't live without in your pipeline?

#devops #cicd #automation #cloudnative #kubernetes
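A minimal skeleton of how those stages can map onto pipeline jobs, here in GitHub Actions; the build/test commands, manifest path, deployment name, and cluster access are stand-ins for whatever your stack actually uses:

```yaml
name: ci-cd
on: [push]
jobs:
  ci:                                  # Code -> Build -> Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build                # stand-in for Gradle / Webpack / Bazel
      - run: make test                 # stand-in for Jest / Playwright / JUnit
  cd:                                  # Deploy (Operate and Monitor live outside the pipeline)
    needs: ci
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: kubectl apply -f k8s/     # assumes the runner has cluster credentials
      - run: kubectl rollout status deployment/my-app
```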
Direct git pulls in production = guaranteed downtime. Use staging directories and atomic deployments for zero-downtime updates. Your users will thank you. #WebDev #DevOps #HostMyCode
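A minimal sketch of what "staging directories and atomic deployments" can mean in practice: upload the new release to its own directory, then switch a symlink with one atomic rename so requests never see a half-updated tree. The host and paths are placeholders, and SSH access from the pipeline is an assumption:

```yaml
- name: Atomic release switch
  run: |
    ssh deploy@app.example.com "
      set -e
      # the release was already synced to /var/www/releases/<sha> in an earlier step
      ln -sfn /var/www/releases/${GITHUB_SHA} /var/www/current.tmp
      mv -T /var/www/current.tmp /var/www/current   # rename is atomic; old code serves until this instant
    "
```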
Built a Canary Deployment in Kubernetes!

Key points:
- Gradual rollout: only a small portion of traffic sees the new version initially.
- Monitoring: metrics, logs, and user feedback are observed for errors or performance issues.
- Rollback capability: if problems are detected, the update can be rolled back without affecting the majority of users.
- Safe testing in production: real users test the new version without risking a full-scale failure.

#DevOps #Kubernetes #CI_CD #LearningJourney
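A minimal sketch of the simplest form of this in plain Kubernetes: two Deployments sharing one Service selector, so the replica ratio decides roughly what fraction of traffic hits the canary. Names and images are placeholders; real setups often use Argo Rollouts or a service mesh for finer-grained traffic splitting:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable
spec:
  replicas: 9
  selector:
    matchLabels: {app: web, track: stable}
  template:
    metadata:
      labels: {app: web, track: stable}
    spec:
      containers:
        - name: web
          image: registry.example.com/web:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1                    # ~10% of traffic sees the new version
  selector:
    matchLabels: {app: web, track: canary}
  template:
    metadata:
      labels: {app: web, track: canary}
    spec:
      containers:
        - name: web
          image: registry.example.com/web:v2
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                     # matches both stable and canary pods
  ports:
    - port: 80
      targetPort: 8080
```

Promotion is then just scaling the canary up (or rolling its image to stable), and rollback is scaling the canary back to zero.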
Ever faced this in Kubernetes? 👇

Everything was working fine yesterday… Today, something feels off. No crashes. No alerts. But things are breaking.
👉 Requests failing
👉 Latency increasing
👉 Random issues showing up

And the worst part? No one knows what changed.

This is what I call ⚙️ configuration drift. Small changes like:
• Env variable updates
• ConfigMap tweaks
• Secret rotations
• Partial deployments
Individually harmless… but together → production issues.

💬 Curious - how do you debug this today? Because most teams:
→ Compare configs manually
→ Check logs (no clear answer)
→ Spend hours guessing

That's exactly why I built KubeGraf:
👉 Tracks every config & deployment change
👉 Correlates it with system issues
👉 Pinpoints what changed & why it broke
👉 Suggests safe rollback or fix

Instead of "what went wrong?" you get → "this change caused the issue."

💡 https://kubegraf.io

#Kubernetes #DevOps #CloudNative #K8s #SRE #Debugging #Observability #IncidentResponse #RootCauseAnalysis #Microservices #KubeGraf #DevTools