The Perfect Storm Of Spiralling Cloud Costs

1. Before the Cloud

If you ever had the opportunity to venture into a corporation’s own 'Pre-Cloud' datacenter, it would have seemed overwhelming. These rooms, sometimes as large as a football pitch, roared with the noise of a thousand servers. The reality was that these huge rooms were quite manageable. Everything was physical and visible, and had an implicit lifecycle: when a server reached the end of its life, it would be pulled out to make way for something else. This meant costs were predictable and known up front.

2. The first phase shift to the Cloud wasn’t that big a change

It has been argued that the Cloud really is just the same servers but in someone else’s datacenter. In many cases during the first phase of Cloud adoption this was true. Many companies simply migrated software on servers in their datacenter to new servers in the 'Cloud'.

3. The second phase and no more servers!

This second phase of Cloud adoption involves using the Cloud as a platform of services rather than another way to host your old server software. When we use the Cloud as a platform of services, often referred to as ‘Platform as a Service’ or 'Serverless', we break the link between the functionality we consume and the server it sits on. Before the Cloud, and even during the first phase of the Cloud, the performance of a database was significantly influenced by the specification of the server it sat on. The infrastructure and DBA teams would use their specialist knowledge to set up and manage multiple servers that could meet the performance and availability requirements. In the new Serverless Cloud world, all the application team needs to do to provision the same database is select the performance level on a dashboard and tick a box to ensure there is a redundant copy and a backup in the event of a disaster.

All sorts of other physical network infrastructure and countless other services are now just listings on a screen. Software applications that would once have been installed on a handful of servers are now being written with a serverless architecture. This means the component parts of an application, right down to individual functions, are separately written, deployed, managed and billed. Not running all this functionality on servers is a significant change. Without the servers to manage, the roles of the infrastructure teams who once oversaw them, and ensured efficiency, are significantly diminished.

4. The explosion in the number of components and complexity

In this Serverless second phase of Cloud adoption, the delivery teams, including app developers, business intelligence specialists and data scientists, all separately provision and manage the Cloud resources they need. Every time you log into the AWS, Azure or Google Cloud dashboards, new services are available. Sometimes provisioning a single service automatically deploys tens of cryptically named, expensive components. The challenge of keeping track of what’s running and relevant is significant. It can be almost impossible for the support teams or traditional infrastructure personnel to have an appreciation of what is going on.

5. New technologies now available for managing this complexity are not the answer

New methodologies around 'DevOps' and 'Infrastructure as Code' do aid in managing this new complexity, but they are not practical to use for everything that’s deployed. Proofs of concept and other non-production components are often manually deployed and easily forgotten. Sometimes systems simply stop being used, because the business has found a better way to do things or the requirement no longer exists, and the business users may not tell IT. The people who knew the DevOps configuration, and who could make sense of it all, may also have moved on. If just one new person deploys a component directly, outside the DevOps process, the whole configuration is now stale and you are back to relying on individuals for knowledge of every aspect. The chances of ending up with countless redundant things running in dark corners of your Cloud subscription have increased significantly.
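As a minimal sketch of the 'stale configuration' problem, the check below compares the resources declared in an Infrastructure as Code definition with what is actually running. The function name, the resource names and the two input sets are all hypothetical; in practice they would come from your own IaC state and your Cloud provider's inventory export.

```python
def find_drift(declared: set[str], running: set[str]) -> dict[str, set[str]]:
    """Compare declared IaC resources with what is actually deployed."""
    return {
        # Running but never declared: deployed by hand, outside the process.
        "unmanaged": running - declared,
        # Declared but not running: removed by hand or never deployed.
        "missing": declared - running,
    }


# Hypothetical resource names, for illustration only.
declared = {"orders-db", "orders-api", "orders-queue"}
running = {"orders-db", "orders-api", "poc-ml-cluster", "test-vm-2"}

drift = find_drift(declared, running)
print("Unmanaged:", sorted(drift["unmanaged"]))
print("Missing:", sorted(drift["missing"]))
```

Anything in the "unmanaged" set is a candidate for the forgotten proof of concept described above, and is worth tracing back to an owner.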

When these zombie workloads existed in the old world, the lifecycle of the servers they were running on ultimately meant they would not be zombies forever. The other reason zombies were not such a problem in the pre-Cloud era was that the infrastructure they were running on would generally already have been paid for.
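To make the recurring cost of a Cloud zombie concrete, here is a rough sketch that flags resources with no recent activity and estimates what they have cost since they were last used. The `Resource` fields, the 90-day idle threshold and the inventory itself are assumptions made for illustration, not figures from any real bill.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Resource:
    name: str
    monthly_cost: float  # recurring charge, in whatever currency the bill uses
    last_used: date      # last day any real activity was observed


def find_zombies(resources, today, idle_days=90):
    """Return resources with no observed activity for at least idle_days."""
    return [r for r in resources if (today - r.last_used).days >= idle_days]


def cost_since_last_use(resource, today):
    """Approximate spend since last use, treating a month as 30 days."""
    return resource.monthly_cost * (today - resource.last_used).days / 30


# Hypothetical inventory, for illustration only.
inventory = [
    Resource("orders-db", 400.0, date(2024, 5, 30)),
    Resource("poc-ml-cluster", 1200.0, date(2023, 12, 3)),
    Resource("old-report-job", 50.0, date(2023, 6, 1)),
]
today = date(2024, 6, 1)

for zombie in find_zombies(inventory, today):
    print(zombie.name, round(cost_since_last_use(zombie, today), 2))
```

Unlike a paid-for server that is eventually decommissioned, nothing in this model stops the cost from growing; the figure simply increases every month until someone switches the resource off.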

A perfect storm

6. The Cloud's second phase of adoption

So what makes this a perfect storm?

  • More and more businesses derive their competitive advantage from their IT, as more companies become some sort of software company. Coupled with this, new IT initiatives no longer need to navigate the traditional onboarding process or face prohibitive upfront costs. For both these reasons we are seeing a ramp-up of business initiatives requiring IT.
  • At the same time our move to the second phase of Cloud adoption, where we manage components and services not servers, is leading to an explosion of disparate moving parts.
  • The move away from managing servers means the infrastructure personnel who were traditional ‘gate guardians’ who could manage long term efficiency are no longer involved.
  • Because of the number and complexity of components, there are many more opportunities for zombie workloads to accumulate into an ever-increasing expense that must be paid for as long as they keep running.
  • These zombies are often just line items on a bill that finance and the business have to accept and allocate.
  • Front line delivery teams who now create all this Cloud infrastructure as part of project delivery are focused on meeting their next delivery deadline. They generally have no budget for ensuring the long-term efficiency of previous projects.
  • The support teams who now inherit the management of the complex solution will often not have the knowledge of the multitude of components, and therefore the confidence, to tweak performance plans to maintain long-term efficiency.
  • The first course of action during a production application issue is often to dramatically increase the performance level of the failing component. After a fix, Change Management and Service delivery teams can be reluctant to revert the performance level and the expense is no longer a project budget issue!
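The last point in the list above, emergency performance increases that are never reverted, lends itself to a simple register. The sketch below is one possible way to record each scale-up with an agreed review date and report the ones that are overdue; the `ScaleUp` fields, component names and costs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScaleUp:
    component: str
    normal_monthly_cost: float
    raised_monthly_cost: float
    review_by: date        # date agreed at the time of the incident
    reverted: bool = False


def overdue_reviews(scale_ups, today):
    """Scale-ups still in place after their agreed review date."""
    return [s for s in scale_ups if not s.reverted and today > s.review_by]


def extra_monthly_cost(scale_ups):
    """Extra recurring spend caused by the given scale-ups."""
    return sum(s.raised_monthly_cost - s.normal_monthly_cost for s in scale_ups)


# Hypothetical incident log, for illustration only.
log = [
    ScaleUp("orders-db", 400.0, 1600.0, date(2024, 3, 1)),
    ScaleUp("search-index", 200.0, 500.0, date(2024, 7, 1)),
    ScaleUp("orders-api", 100.0, 300.0, date(2024, 2, 1), reverted=True),
]

overdue = overdue_reviews(log, date(2024, 6, 1))
print([s.component for s in overdue], extra_monthly_cost(overdue))
```

The point of the register is less the arithmetic than the ownership: someone is named against a date, so the increase does not quietly become the new baseline.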

7. The second phase of Cloud adoption is both an opportunity and a risk to be managed

Well, this all sounds very gloomy! So, should we retrench from the Cloud and return to the comfort of our own racks of servers and reinstate the role of the traditional infrastructure gate guardians?

Absolutely not!

The speed with which requests to IT from the business can now be met represents a real uplift in productivity, and therefore in business opportunity and competitive advantage. We're not just talking about new application requests; we're talking about throwaway pilot initiatives, big data analytics and IoT platforms, to name but a few. But 'buyer beware': we need to recognise the cultural shift away from the simple ‘server’ and the implicit support that came with it, and make sure that long-term oversight, visibility and control of the multitude of new Cloud resources are front and center of all we do in the new Cloud world.


