Meltdown/Spectre – the security/performance tradeoff
It is axiomatic in the InfoSec realm that keeping patch levels current is one of the most critical steps for any IT shop. In the performance realm, it is a rule of thumb (a rule of thumb being something just short of axiomatic) that any update causing significant performance degradation should be thoroughly investigated, evaluated and, where possible, ameliorated before deployment into production. The Meltdown/Spectre security flaws now present a clear case where closing the security risk may be in direct opposition to the performance requirements.
In my previous post I spoke of the necessity of quantifying the performance impact. This must be done for ALL enterprise applications. The risk of not patching systems must also be quantified. In applications having a multi-level architecture, both the risk and the performance effects must be quantified at each level. (As an aside, I'd do this one level at a time, starting with the level furthest from the workload source: measure each level separately, then add levels one by one from the far end back to the source.) The choice between that which is axiomatic and that which is merely a rule of thumb can only be made at the business owner level. I do not envy them the choice.
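As a rough illustration of the level-at-a-time measurement described above, here is a minimal Python sketch. The function names, the stage labels and every number in it are hypothetical placeholders rather than measurements, and it assumes response time in milliseconds is the metric you care about.

```python
def degradation_pct(baseline_ms, patched_ms):
    """Percent increase in response time after patching (positive = slower)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (patched_ms - baseline_ms) / baseline_ms * 100.0


def report(stages):
    """stages: (label, baseline_ms, patched_ms) tuples, furthest level first."""
    return [(label, round(degradation_pct(b, p), 1)) for label, b, p in stages]


if __name__ == "__main__":
    # Hypothetical placeholder values: start with the level furthest from the
    # workload source, then add levels back toward the source.
    stages = [
        ("database only", 40.0, 44.0),
        ("database + app server", 120.0, 138.0),
        ("database + app server + web", 200.0, 236.0),
    ]
    for label, pct in report(stages):
        print(f"{label}: {pct:.1f}% slower")
```

Measuring the far level alone first makes it easier to see which level contributes most of the hit as the others are added back in.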
For my personal computing environment, where the risk is near 0 and the probability of a performance hit is near 1, the choice seems clear. In a mission-critical business application, the decision will be more difficult. Even in an environment where the security risk is low and the probability of missed SLAs is high, this won't be an easy choice. Quantifying both risk and performance will at least provide a sound basis for decision making.
Being mostly a Performance Engineer, I inhabit a world of tradeoffs so let me propose a few Rules of Thumb:
· If the performance degradation is low (< 5%) or otherwise acceptable – patch.
· If the performance hit is between 10% and 25% and the risk is low to medium, then:
o If you will not miss SLAs – patch and begin tuning to gain back what you lost.
o If you will miss SLAs – don’t patch and start tuning efforts so you can patch.
· If the performance hit is > 25%, the risk is above medium and missing SLAs is not a viable option, then:
o Delay patching
o Begin vigorous tuning efforts across the architecture to decrease execution/response times.
o Investigate alternative deployments on unaffected chipset architectures.
o Manage the risk while increasing your intrusion prevention (IP) and intrusion detection (ID) capabilities to shrink the risk window until such time as the patch will not torpedo your business.
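The rules of thumb above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Risk` scale, the function name `recommend` and its parameters are hypothetical, and the thresholds are taken literally from the list. Combinations the list does not cover (for example a 5% to 10% hit, or a hit above 25% with low risk) are deliberately left undecided rather than guessed at.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Hypothetical three-step risk scale; real assessments may be finer."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def recommend(hit_pct, risk, will_miss_slas,
              hit_acceptable=False, sla_miss_tolerable=False):
    """Map a measured performance hit and an assessed risk to a rule of thumb."""
    # Rule 1: low (< 5%) or otherwise acceptable degradation.
    if hit_pct < 5 or hit_acceptable:
        return "patch"
    # Rule 2: hit between 10% and 25%, risk low to medium.
    if 10 < hit_pct < 25 and risk <= Risk.MEDIUM:
        if will_miss_slas:
            return "don't patch; tune so you can patch"
        return "patch; tune to gain back what you lost"
    # Rule 3: hit above 25%, risk above medium, missing SLAs not an option.
    if hit_pct > 25 and risk > Risk.MEDIUM and not sla_miss_tolerable:
        return ("delay patching; tune across the architecture, investigate "
                "unaffected architectures, manage the risk")
    # Anything else is not covered by the rules of thumb.
    return "no rule of thumb applies; decide case by case"


if __name__ == "__main__":
    print(recommend(3, Risk.HIGH, will_miss_slas=False))
    print(recommend(15, Risk.LOW, will_miss_slas=True))
    print(recommend(30, Risk.HIGH, will_miss_slas=True))
    print(recommend(7, Risk.MEDIUM, will_miss_slas=False))
```

Writing the rules down this way also makes the uncovered cases visible, which is where the business owner's judgment has to take over.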