Reducing Software Deployment Risk
The cult classic Galaxy Quest featured the “Omega-13 Device,” which let you jump back in time 13 seconds to correct a mistake. It was essentially a big “undo” button. Imagine that: if you could undo any mistake, you would drive the cost of error down to zero. You would be free to innovate and make mistakes, because the mistakes wouldn't hurt you.
The global server outages of the last few weeks have me thinking more generally about the commercial risks of software updates. What happens if an update goes wrong? Blaming the vendor doesn’t heal your customers. Just ask anyone stuck in an airport while the servers were down.
Mistakes happen, so we need to find a way to drive down the cost of errant software updates. One technique that I’m fond of is “immutable deployment.”
In days of yore, you had a server. You installed software on it. You updated it. You patched it. You maintained it. And if something broke, you faced downtime. It was like a pet: if it became ill, you nursed it back to health. To deploy software updates, you would schedule downtime, say, a low-usage window like Sunday morning. If the update worked, all was well. Otherwise, you wouldn’t see daylight until it was fixed.
The traditional fallback was “disaster recovery,” and that toolbox offers many ways to address a bad update. If you’ve created and tested your backup processes, you can install a fresh server from the most recent backup. More sophisticated approaches include point-in-time recovery, where you roll back to the moment before things went haywire. Those are good tools, but what if we could avoid disaster recovery altogether?
Virtualization, containerization, and the cloud have opened more options. Immutable deployment is my favorite. You never patch or update a server. Instead, you build a new image containing the software updates. You put that image through your automated testing suite. If it passes, you spin up new servers using the new image and then retire the older servers. If a bug escapes and you have to roll back to the previous release, you simply spin up new servers using the previous image and retire the servers running the errant update. All of this happens with the flick of a switch or through automation.
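To make that concrete, here is a minimal sketch of a roll-forward and roll-back, assuming AWS EC2 via the boto3 library. The AMI IDs, the `release` tag, and the instance type are hypothetical placeholders; a real rollout would also health-check the new servers and shift traffic before retiring the old ones.

```python
import boto3

ec2 = boto3.client("ec2")  # assumes AWS credentials are already configured


def running_release_instances(release_tag):
    """Find running instances tagged with a given release (tag name is hypothetical)."""
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:release", "Values": [release_tag]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]


def deploy(image_id, release_tag, count=2):
    """Spin up fresh servers from a pre-tested image; never patch in place."""
    resp = ec2.run_instances(
        ImageId=image_id,         # the new (or previous) image
        MinCount=count,
        MaxCount=count,
        InstanceType="t3.micro",  # placeholder instance type
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "release", "Value": release_tag}],
        }],
    )
    return [inst["InstanceId"] for inst in resp["Instances"]]


def retire(release_tag):
    """Terminate the servers running an old (or errant) release."""
    ids = running_release_instances(release_tag)
    if ids:
        ec2.terminate_instances(InstanceIds=ids)


# Roll forward: bring up v42, then retire v41 once traffic has shifted.
deploy("ami-0123456789abcdef0", "v42")
retire("v41")

# Roll back: the "Omega-13" move is just the same operation in reverse.
deploy("ami-0fedcba9876543210", "v41")
retire("v42")
```

Notice that rollback is just another deployment from a known-good image, which is why it takes minutes instead of days.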
Contrast this with disaster recovery and restoring from backup. As we saw with recent events, it can take days of manual reconfiguration to bring systems back to their previous state.
Immutable deployment requires a couple of things. You need a virtualized or containerized workload. And you have to separate your compute from your data; entangled compute and data is a key area of technical debt that prevents some workloads from employing the technique. Of course, building those image recipes can be a lot of work, but you have to do some version of that anyway. Your workload needs a recipe. It can’t simply be the accumulation of all updates and changes ever applied to it.
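For a sense of what a recipe can look like, here is a hedged sketch using Docker and the Python Docker SDK. The base image, the `myapp` package, and the tag are made-up examples; the point is that every layer of the image is declared up front, so rebuilding from the recipe always yields the same result.

```python
import io
import docker  # Python Docker SDK; assumes a local Docker daemon is running

# A hypothetical recipe. Every layer is declared here, so the image is
# never the accumulation of manual changes; it can always be rebuilt.
DOCKERFILE = """\
FROM python:3.12-slim
RUN pip install --no-cache-dir myapp==4.2.0
CMD ["python", "-m", "myapp"]
"""

client = docker.from_env()
image, build_log = client.images.build(
    fileobj=io.BytesIO(DOCKERFILE.encode("utf-8")),
    tag="myapp:4.2.0",  # an immutable, versioned tag, never a mutable "latest"
    rm=True,            # discard intermediate containers after the build
)
print(image.id)
```

Check that image into your registry under a versioned tag and it becomes the unit you deploy, test, and roll back to.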
Immutable deployment works especially well when combined with continuous integration/continuous deployment (CI/CD), but CI/CD is not a prerequisite. The trick is that you install your latest software and your latest patches to a reusable image, not the live machine.
People move to the cloud to be more resilient and more nimble. Agility requires reducing the cost of errors and the penalty for choices that don't work out. Otherwise, you're just treating the cloud like colocation space…like someone else's computer.
Creating immutable deployments takes some work. But the peace of mind it brings makes it all worthwhile.
-- Carter
Interesting and timely article, Carter, thanks! I agree it's a good time for some folks to reevaluate the perceived costs and risk mitigation of immutable deployments. Ironically, from my experience on a Windows DevSecOps containerization project, CrowdStrike was the easiest of the handful of third-party security tools to image.