The Patch Problem
A number of organisations have recently asked for help with patching, specifically in response to vulnerabilities. They've found patching to be not just difficult but very, very difficult - a wicked problem, in fact - and have turned to the industry for help in solving it.
Vulnerability scanning itself is an old technology, one that has been through the hype cycle. When we first used it, we were very excited to finally be able to enumerate the vulnerabilities in the attack surface. Not so our colleagues in IT Ops, whose jobs suddenly went from fixing everything that went wrong to fixing everything that went wrong and also remediating the millions of vulnerability instances that had suddenly been enumerated. They viewed vulnerability scanning as a plague on all of mankind. The problem of vulnerabilities felt too big to tackle: hundreds of thousands of machines, each with thousands of different vulnerabilities and different software stacks, and each critical to some portion of the business. That becomes millions of unique regression tests, one for each combination of software stack and patch set, to find an order of application that patches the vulnerabilities while keeping the service stable. It feels like drinking from a firehose.
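To see why the test matrix balloons so quickly, here's a back-of-the-envelope sketch; the stack and patch-set counts are made up purely for illustration:

```python
# Back-of-the-envelope: numbers are illustrative, not drawn from any real estate.
software_stacks = 2_000   # distinct OS/application builds across the estate
patch_sets = 500          # patch bundles that need validating against each build

regression_tests = software_stacks * patch_sets
print(f"{regression_tests:,} unique stack-and-patch combinations to test")  # 1,000,000
```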
But as organisation after organisation has discovered, the fact that a problem feels too big to tackle isn't a good reason to do nothing. Maersk, the NHS, WPP, Rosneft, Deutsche Post, Merck and Saint-Gobain can all attest to varying levels of pain from failing to patch vulnerable portions of the IT estate. The costs of failing to patch run to hundreds of millions of pounds across those companies alone.
So, what to do?
First, don't shoot the messenger. Keep the scanning. It at least lets you know the shape and size of the problem, and you can begin tracing losses and outbreaks back to vulnerabilities you knew about, which makes the case for greater budget and investment in software currency for IT Ops. Vulnerability management means scanning the estate, discovering new boxes and images, identifying them correctly, placing them in the correct scan schedules, determining which scans apply to those schedules, scanning as often as is technically feasible, and then informing the platform teams just how totally under the hammer they really are. Technology such as Tenable is great for performing the scan, but how to go about patching?
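As a starting point, a short script can at least quantify the shape and size of the problem from the scanner's own export. The sketch below assumes the pyTenable library and its vulnerability export iterator; the field names are illustrative and should be checked against what your export actually returns.

```python
from collections import Counter

from tenable.io import TenableIO  # assumes the pyTenable library is installed

# Credentials and field names are placeholders; adapt to your own tenant and export schema.
tio = TenableIO(access_key="ACCESS_KEY", secret_key="SECRET_KEY")

severity_counts = Counter()
hosts = set()
for finding in tio.exports.vulns():  # streams vulnerability findings from the scanner
    severity_counts[finding.get("severity", "unknown")] += 1
    hosts.add(finding.get("asset", {}).get("hostname", "unknown"))

print(f"{len(hosts)} hosts with findings")
for severity, count in severity_counts.most_common():
    print(f"{severity:>10}: {count:,}")
```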
Second, prioritise patches by risk to the organisation, so you're taking care of the most important patches first. There are new modules from places such as ServiceNow that can help organise the information, which is a start. The ServiceNow SecOps offering includes a handy Vulnerability Response application that will take the full list of vulnerabilities from a scanner such as Tenable and allow your organisation to make a stab at prioritising. You can turn the handle on a vulnerability scoring system and, provided you have good criticality information from the business, start to prioritise which patches will give you the best bang for your buck.
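A minimal sketch of that "turn the handle" prioritisation, assuming you already have a technical severity score per finding and a business criticality rating per asset; the records, CVE identifiers and scores below are made up for illustration:

```python
# Hypothetical records joining scanner output with CMDB criticality (1 = low, 5 = critical).
findings = [
    {"host": "payments-db-01", "cve": "CVE-2024-0001", "cvss": 9.8, "criticality": 5},
    {"host": "intranet-wiki",  "cve": "CVE-2024-0002", "cvss": 7.5, "criticality": 2},
    {"host": "build-agent-07", "cve": "CVE-2024-0003", "cvss": 8.8, "criticality": 1},
]

def risk_score(finding):
    # Simple weighting: technical severity scaled by how much the business depends
    # on the asset. Real schemes also factor in exploit availability, exposure, etc.
    return finding["cvss"] * finding["criticality"]

for finding in sorted(findings, key=risk_score, reverse=True):
    print(f"{risk_score(finding):5.1f}  {finding['host']:<16} {finding['cve']}")
```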
Now at least if you're only going to patch a tiny portion of your estate and leave the rest vulnerable, you can take care of the biggest risks first.
Third, start to look at mechanisms for connecting patching with your automation and orchestration tools. Here at Adarma, we're fans of using generic automation execution engines, such as Tachyon, to do this.
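What that hand-off can look like is sketched below: take the prioritised list and raise patch jobs in whatever execution engine you use. The endpoint and payload shape here are entirely hypothetical, not Tachyon's (or anyone else's) real API.

```python
import requests

AUTOMATION_API = "https://automation.example.internal/api/jobs"  # hypothetical endpoint

def raise_patch_job(host: str, patch_id: str, change_window: str) -> str:
    """Create a patch deployment job in a generic automation engine.

    The payload shape is illustrative only; map it onto the schema your
    orchestration tool (Tachyon, Ansible, SCCM, etc.) actually expects.
    """
    response = requests.post(
        AUTOMATION_API,
        json={
            "action": "deploy_patch",
            "target": host,
            "patch": patch_id,
            "window": change_window,
            "rollback_on_failure": True,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["job_id"]

# Feed in the highest-risk findings first, straight from the prioritisation step, e.g.:
# job_id = raise_patch_job("payments-db-01", "PATCH-1234", "2024-06-01T02:00Z")
```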
Finally, rethink your lean pre-prod strategy. Lean pre-production is the idea that, since changes in production are normally relatively rare, you don't need a full copy of each production system, or even a scaled-down copy of one (frequently, the disaster recovery and business continuity overprovisioning is scaled back for the pre-prod system). Instead, some enterprises have adopted a policy that, when you need a pre-prod environment, you spin one up in a VM and work on it there. That probably isn't going to be wide enough to regression test the millions of patch and application stack combinations you need to cover to be sure there won't be production outages from all the patches you're about to deploy.
One thing you can be sure of, though, is that if you don't measure your vulnerabilities and don't apply security patches, you'll end up in the news.
Great original article and comment Carric. Right there with you that it’s a bit nuts how little has changed since 1998 when I started at ISS. Tenable is my third stop in the VM space in my career and we are taking a run at addressing some of these fundamental issues. We are applying a new degree of data sciences in our stuff and have an integration/strategic partnership with ServiceNow to help bridge the gap between Security and IT and make the whole process smarter and more efficient. No magic bullet, but honestly pretty stoked about what we can accomplish together. I am also seeing a resurgence of interest in proactively addressing issues vs just trying to detect bad guys. Interesting times as always...
Hey Nathan! It's a great post on some of the more subtle realities that make us [old guys] start to ask fundamental questions like: "Wait - given the magnitude of this issue and how little progress we have made in 20 years, does anyone really care about solving this?!". But seriously - I would add another facet to this discussion based on some projects I have been on.

I had a client years ago with hundreds of thousands of reported vulnerabilities. They seemed to have mastered patching the OS, but all the installed apps were understandably A MESS. This was 2009, but I remember something like 328k Highs, 441k Mediums, and 1M Lows. It was truly daunting, and this was something like 16k hosts, so not even a really big environment. So - I grabbed all the data in CSV and started trying to put a fresh look on this problem so we could define a plan of action that might actually yield more than: "DONE! You focused all your effort on patching 1.6m issues, but unfortunately, while you were patching, more vulnerabilities were announced, so you have 1.4m new action items to start on right away!".

What I noticed was huge swaths of issues linking back to a fundamental root cause. In fact, I was almost stunned to realize 99% of the problem distilled down to 6 applications: Altiris, IE, Office, Adobe, and two others that have simply faded from memory. Here's the thing:

- Licensing had expired for Altiris 2 years prior. Recommendation: Uh... don't patch that. Rip it out! 120k issues gone... How's that not a good thing!?
- Various versions of IE, MS Office, Adobe et al, but really - only the latest revision + secure configuration will do, no? Why not focus time and effort on creating SCCM packages of managed "latest and greatest" (does anyone remember Seagate's Desktop Management Suite? We devised a staging process in 1997 for a company that would blow down the base OS in 2 minutes, then you could "tick box select" the apps you wanted, and we would manage all that centrally ongoing. EZ PZ with an up-front investment!!), and just rip out every other non-sanctioned version/configuration?

Outcomes: Ripping out Altiris not only eliminated all the associated high risks, but the software-associated Meds and Lows too. Any pen tester can tell you we often find scenarios where we can combine 2 Meds (or even Lows) into a CRITICAL (SNMP read + telnet enabled got me QSECOFR one time [brownie points to anyone who remembers this platform!! It was a big deal]). With a single root cause action, you address attack surface holistically and permanently.

I'm not saying I am selling any magic formula, but it's something else to consider when staring blankly at the 'insurmountable mountain of patches' issue. If you think a little outside the box about WHY you have millions of little problems, you might stumble on a root cause fix and address whole swaths of pain in a much more effective way that doesn't leave you with the same brute force 800lb deadlift [again] next year...
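[Editor's note: a rough sketch of the kind of CSV triage described in the comment above - grouping findings by product to surface the handful of root causes behind most of the volume. The file name and column names are hypothetical; most scanners can export something similar.]

```python
import csv
from collections import Counter

# Column names ("product") are hypothetical; adjust to your scanner's CSV export.
by_product = Counter()
with open("vuln_export.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        by_product[row["product"]] += 1

total = sum(by_product.values())
running = 0
for product, count in by_product.most_common(10):
    running += count
    print(f"{product:<20} {count:>8,}  ({running / total:5.1%} cumulative)")
```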
Comparing vulnerability patching with gutting a fish is hilarious.
Some useful advice in here, but it misses the fundamental point that regular, scheduled disruption (from regular, scheduled patching) is much cheaper than irregular, unscheduled disruption (from irregular, panic-led patching or, worse, a security event).
Sage advice!