The Perfect Storm Of Spiralling Cloud Costs
1. Before the Cloud
If you ever had the opportunity to venture into a corporation’s own 'pre-Cloud' datacenter, it would have seemed overwhelming. These rooms, sometimes as large as a football pitch, roared with the noise of a thousand servers. In reality, these huge rooms were quite manageable. Everything was physical and visible, and had an implicit lifecycle: when a server reached the end of its life, it was pulled out to make way for something else. This meant costs were predictable and known up front.
2. The first phase of the shift to the Cloud wasn’t that big a change
It has been argued that the Cloud is really just the same servers in someone else’s datacenter. In many cases, during the first phase of Cloud adoption, this was true. Many companies simply migrated the software running on servers in their own datacenter to new servers in the 'Cloud'.
3. The second phase and no more servers!
This second phase of Cloud adoption involves using the Cloud as a platform of services rather than as another way to host your old server software. When we use the Cloud as a platform of services, often referred to as ‘Platform as a Service’ or 'Serverless', we break the link between the functionality we consume and the server it sits on. Before the Cloud, and even during the first phase of the Cloud, the performance of a database was significantly influenced by the specification of the server it sat on. The infrastructure and DBA teams would use their specialist knowledge to set up and manage multiple servers that could meet the performance and availability requirements. In the new Serverless Cloud world, all the application team needs to do to provision the same database is select the performance level on a dashboard and tick a box to ensure there is a redundant copy and a backup in the event of a disaster. Physical network infrastructure and countless other services are now just listings on a screen. Software applications that would once have been installed on a handful of servers are now being written with a serverless architecture. This means the component parts of an application, right down to individual functions, are separately written, deployed, managed and billed. Taking servers out of the picture is a significant change. Without servers to manage, the roles of the infrastructure teams who once oversaw them, and who ensured they were used efficiently, are significantly diminished.
4. The explosion in the number of components and complexity
In this Serverless second phase of Cloud adoption, the delivery teams, including app developers, business intelligence specialists and data scientists, all separately provision and manage the Cloud resources they need. Every time you log into the AWS, Azure or Google Cloud dashboards, new services are available. Sometimes provisioning a single service automatically deploys tens of obscurely named, expensive components. Keeping track of what’s running, and whether it is still relevant, is a significant challenge. It can be almost impossible for support teams or traditional infrastructure personnel to understand what is going on.
5. New methodologies now available for managing this complexity are not the whole answer
New methodologies such as 'DevOps' and 'Infrastructure as Code' do help to manage this new complexity, but it is not practical to use them for everything that’s deployed. Proofs of concept and other non-production components are often deployed manually and easily forgotten. Systems also sometimes just stop being used, because the business has found a better way to do things or the requirement no longer exists, and the business users may not tell IT. In addition, the people who knew the DevOps configuration, and who could make sense of it all, may have moved on. If just one new person deploys a component directly, outside the DevOps process, the whole configuration is now stale and you are back to relying on individuals for knowledge of every aspect. The chances of ending up with countless redundant resources running in dark corners of your Cloud subscription have increased significantly.
When these zombie workloads existed in the old world, the lifecycle of the servers they ran on meant they would not remain zombies forever. The other reason zombies were less of a problem in the pre-Cloud era was that the infrastructure they ran on had generally already been paid for.
6. The Cloud’s second phase of adoption
So what makes this a perfect storm?
7. The second phase of Cloud adoption is both an opportunity and a risk to be managed
Well, this all sounds very gloomy! So, should we retreat from the Cloud, return to the comfort of our own racks of servers and reinstate the traditional infrastructure gatekeepers?
Absolutely not!
The speed with which IT can now meet requests from the business represents a real uplift in productivity and, therefore, in business opportunity and competitive advantage. We're not just talking about new application requests; we're talking about throwaway pilot initiatives, big data analytics and IoT platforms, to name but a few. But 'buyer beware': we need to recognise the cultural shift away from the simple ‘server’ and the implicit support that came with it, and make sure that long-term oversight, visibility and control of the multitude of new Cloud resources are front and center of all we do in the new Cloud world.