About the tree and the forest
In this article I will share some of the observations we made while analyzing many AWS bill files, including common patterns people should be aware of and consider when looking at their compute, storage, data center, and network spend.
I've always been intrigued by the number of businesses that have sprouted up making a living off a singular problem in a singular market served by a singular provider. You guessed it: the myriad of tools that have emerged promising to help you understand and optimize your AWS utilization and the resulting bill. It is especially striking when you realize AWS is just a small part of overall spend on compute, storage, data center, and network. It is our opinion that 95% of this expenditure still occurs in-house, in data centers and other managed services. And for valid reasons.
Let me give you an example: a while back we helped a consultant who had received data on the complete compute and storage infrastructure of an important customer. The consultant was asked to provide an analysis and present possible optimization opportunities. However, he was also asked to look only at the AWS portion of the infrastructure. We helped him use the Burstorm platform to model the possible savings from a reduction in compute sizing.
Here are some of the findings, generalized and augmented with additional observations from other, similar analyses.
- Many bills show 80% or more of the infrastructure deployed in a single region. This is a significant business continuity risk, as the AWS S3 outage a few weeks ago, and the widespread service interruptions it caused, unfortunately confirmed.
- 25% or more resides in what we at Burstorm call "Misc Cost": costs such as usage penalties, 3rd-party service costs, support costs, one-time reserved instance fees, etc. Anything higher than 20% raises a yellow flag. Changing the business terms is often the most effective way to reduce cost here.
- We find obsolete instance types still being grandfathered in that will need to be replaced at some point. Most often people don't realize these are at risk of becoming unsupported or unnecessarily expensive.
- Optimizing compute utilization often has far less impact than anticipated. In this case the potential was between 5% and 10% of the 20% that compute represented in the AWS bill.
- Large amounts of storage are attached to single instances for no apparent reason. While compute usage goes up and down over time, storage tends to keep growing. Storage optimization has great potential, yet it is often not considered due to the associated complexities. Invest in creating smart retention (or, more importantly, deletion) policies.
- When we model against other deployment methods (private, hybrid, and other providers), there are almost always significant parts that could be done at much lower cost (often ~50% less). Although much harder to attain, one should keep an eye on these opportunities and consider building apps in an infrastructure-agnostic way to capture them more easily in the future.
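The first pattern above, single-region concentration, is easy to check for yourself. Here is a minimal sketch assuming a simplified billing export with "region" and "cost" columns; the column names and sample figures are illustrative assumptions, not the actual AWS Cost and Usage Report schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical billing export; real AWS reports use different headers.
sample_bill = """region,cost
us-east-1,8200.00
us-east-1,1300.00
eu-west-1,900.00
ap-southeast-1,600.00
"""

# Sum spend per region.
cost_by_region = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample_bill)):
    cost_by_region[row["region"]] += float(row["cost"])

total = sum(cost_by_region.values())
top_region, top_cost = max(cost_by_region.items(), key=lambda kv: kv[1])
share = top_cost / total

print(f"{top_region}: {share:.0%} of spend")
if share >= 0.80:
    # The 80% threshold matches the pattern described above.
    print("Warning: heavy single-region concentration (business continuity risk)")
```

The same aggregation works on a real export once the column names are adjusted; the point is simply that the concentration check is a few lines, not a project.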
As you can see, focusing only on compute has a limited impact. This becomes even more obvious when you put it in perspective of the rest. In this case the total compute and storage estate represented ~7,000 servers. The AWS portion of that was ~150 servers, or 2.14%. Of that 2.14%, compute represented 20%, or 0.43% of the total. Under those circumstances, a 10% optimization represents a minuscule 0.043% of total cost.
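The arithmetic above is worth spelling out, since the shrinking percentages are the whole point. A quick sketch using the figures from the example:

```python
# Worked arithmetic from the example above: how a 10% compute
# optimization on the AWS slice shrinks relative to the total estate.

total_servers = 7000   # total compute and storage estate
aws_servers = 150      # AWS portion of the estate

aws_share = aws_servers / total_servers                    # ~2.14% of total
compute_share_of_aws = 0.20                                # compute = 20% of the AWS bill
compute_share_of_total = aws_share * compute_share_of_aws  # ~0.43% of total

optimization = 0.10                                        # a 10% compute optimization
impact_on_total = compute_share_of_total * optimization    # ~0.043% of total

print(f"AWS share of estate:     {aws_share:.2%}")
print(f"Compute share of total:  {compute_share_of_total:.2%}")
print(f"Impact of 10% reduction: {impact_on_total:.3%}")
```

Each step multiplies by a fraction well under one, which is why the headline "10% savings" collapses to 0.043% of total spend.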
For me, this not-so-uncommon example clearly illustrates how people can get caught up in the details and fail to see the forest. If you want to have a larger impact than 0.043%, you need a platform that can model not only AWS but every other major cloud provider, while also allowing you to consider your in-house and other private infrastructure solutions.
Check out how you can leverage the Burstorm platform here.