When the Cloud Shakes
Today, the internet briefly reminded us how fragile it really is. A major global outage brought down platforms like Docker, Canva, Perplexity, and several others — sending ripple effects across CI/CD pipelines, design tools, and AI platforms worldwide.
For DevOps teams, it wasn’t just another “downtime day.” It was a real-time lesson on dependency risk, resilience, and what “high availability” truly means in 2025.
The Hidden Dependency Problem
Most teams didn’t directly “use” Docker Hub in production — yet their builds failed. Why? Because every image pull, build, or deployment in their CI/CD pipeline depended on Docker’s registry being online.
This event exposed a reality many organizations overlook: Modern infrastructure isn’t fragile because of bad engineering — it’s fragile because we’ve built everything on shared third-party dependencies.
Even small services can grind to a halt if one critical external system — a registry, DNS provider, or authentication API — goes offline.
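One common mitigation for the registry dependency above is a pull-through cache: builds pull through a mirror you control, so recently used images keep resolving from cache even while Docker Hub is down. A minimal sketch of the daemon-side setting in /etc/docker/daemon.json — the hostname registry-mirror.internal is a placeholder, and the mirror itself can be the open-source registry:2 image started with REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io:

```json
{
  "registry-mirrors": ["https://registry-mirror.internal"]
}
```

Note that registry-mirrors applies only to Docker Hub pulls, and a cache only shields you for images it has already seen — it narrows the blast radius rather than eliminating it.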
Multi-Cloud Sounds Nice. But It’s Rarely Practical.
When outages like this happen, “multi-cloud” suddenly trends again — as if running workloads across AWS, GCP, and Azure is a silver bullet.
But in practice? For small services or single Kubernetes clusters, multi-cloud deployments are usually more pain than protection.
Here’s why:
- Every provider has different networking, IAM, and managed-service APIs, so each workload effectively has to be built, secured, and operated twice.
- Cross-cloud data replication and egress traffic add real, recurring cost.
- Your team now needs tooling, expertise, and on-call coverage for two or three platforms instead of one.
Unless you’re a global-scale enterprise with compliance or data sovereignty constraints, multi-cloud often adds complexity without improving uptime proportionally.
Multi-Region Deployment Is the Sweet Spot
Instead of going multi-cloud, go multi-region within a single cloud. It’s a more practical and cost-effective way to achieve resilience and reduce blast radius.
For example:
- Run identical Kubernetes clusters in two regions of the same provider.
- Replicate critical data and artifacts (container images, databases, object storage) across those regions.
- Route traffic through health-checked DNS or a global load balancer, so a regional outage triggers failover automatically.
This approach keeps your stack homogeneous and manageable while still protecting you from regional or zonal outages.
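The failover idea behind multi-region can be sketched in a few lines: prefer the primary region, shift to the secondary when its health check fails. This is a minimal client-side illustration, not a production load balancer — the endpoint URLs and the health-check function are hypothetical stand-ins for a real probe against something like a /healthz route:

```python
from typing import Callable, Sequence


def pick_endpoint(endpoints: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> str:
    """Return the first endpoint whose health check passes.

    Endpoints are ordered by preference (primary region first);
    raises only if every region is down.
    """
    for url in endpoints:
        if is_healthy(url):
            return url
    raise RuntimeError("all regions unavailable")


# Example: the primary region is down, so traffic shifts to the secondary.
endpoints = [
    "https://api.us-east-1.example.com",  # hypothetical primary
    "https://api.us-west-2.example.com",  # hypothetical secondary
]
down = {"https://api.us-east-1.example.com"}
chosen = pick_endpoint(endpoints, lambda url: url not in down)
print(chosen)  # the us-west-2 endpoint
```

In practice the same decision is usually made by health-checked DNS or a global load balancer rather than in application code, but the logic is identical: ordered preference plus automatic fallback.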
Multi-Cloud Readiness Still Matters
Being ready for multi-cloud is not the same as running multi-cloud.
You should design your systems so they can move easily — portability over placement.
That means:
- Containerizing workloads and keeping them free of provider-specific assumptions.
- Defining infrastructure as code, so environments can be recreated elsewhere on demand.
- Preferring open interfaces (Kubernetes, Postgres-compatible databases, S3-compatible storage) over proprietary lock-in.
- Keeping configuration and secrets externalized rather than hard-wired to one platform.
That’s how you stay ready for tomorrow’s unknowns — without tripling your operational burden today.
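“Portability over placement” often comes down to coding against a narrow interface instead of a provider SDK. A hedged sketch of the pattern: an object-store protocol with an in-memory stand-in, where an S3- or GCS-backed adapter could later implement the same two methods (the class and function names here are illustrative, not from any real SDK):

```python
from typing import Protocol


class ObjectStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class MemoryStore:
    """In-memory stand-in; a boto3- or GCS-backed adapter would
    implement the same put/get interface."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_build_log(store: ObjectStore, build_id: str, log: bytes) -> str:
    """Application code depends only on ObjectStore, so changing
    clouds means swapping one adapter, not every call site."""
    key = f"logs/{build_id}.txt"
    store.put(key, log)
    return key


store = MemoryStore()
key = archive_build_log(store, "1234", b"build ok")
print(store.get(key))  # b'build ok'
```

The design choice is the point: the day you need to move providers, only the adapter changes — which is exactly the “ready for multi-cloud without running multi-cloud” posture described above.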
The DevOps Takeaway
Today’s outage wasn’t just Docker’s problem — it was a wake-up call for everyone building in the cloud.
Resilience isn’t about avoiding failure. It’s about designing for continuity when failure happens.
As DevOps engineers, our goal shouldn’t be to eliminate downtime entirely — that’s impossible. Our goal should be to limit the blast radius, shorten recovery time, and loosen our coupling to external dependencies.
So next time someone says “let’s go multi-cloud,” ask instead:
“How well are we doing in just one cloud?”
Because if a single regional outage can still take us down, multi-cloud won’t save us — but multi-region resilience and multi-cloud readiness will.