The COGS Problem
This post is for engineering leaders specifically, but anybody looking for tactics and strategies to reduce their cloud spend should be able to pull out some nuggets from here.
Run an engineering department for a while, and it’s a bit astounding what becomes interesting. While I was once fascinated by new technologies and software, eventually my interests moved more into making sure the technology we were building was meshing and fitting with the rest of the goals of the company. I still love to hack on tech and still write lots of POCs or fun side projects, but for the day job, I spend the time on the intersection of business and technology – and it can actually be quite enjoyable.
People outside of engineering generally care about your technology implementation to a very small degree, like almost none. Customers don’t buy your programming language choice, your platform choice or even your org chart makeup. In the end, somebody is measuring the return on investment (ROI) for engineering, and ideally the first person doing that is the engineering leader.
A key metric that contributes to ROI is Cost of Goods Sold (COGS). COGS is the cost component of Gross Margin (GM), which also can be manipulated by pricing/packaging and even by the way revenue is recognized. So, while total GM is a collaboration between Finance and Engineering, COGS can usually stay a little more within just the realm of engineering. However, there are certainly some rules around support staff costs, tooling budgets and some miscellaneous items that need to be accounted for (and will not be covered here).
For an online business, the biggest component of COGS is likely cloud costs. At AWS re:Invent this year, that was a hot topic. The market was certainly beginning to change, and the methods of measuring and benchmarking companies (particularly those in the tech sector) transitioned from growth at all costs to efficient growth, meaning costs need to be under control. The expo floor had around a dozen vendors who promised to save me huge amounts on cloud costs.
Before you jump into spending money on cost control, take a step back. It’s not often that spending money to save money is the best first step. Here are some thoughts to effectively review and improve your cloud expenditures.
Stop Dumb Spending
First, stop with dumb spending. This may seem obvious, but for many organizations I’ve talked with, it’s just not being done. It can come off like “eat your vegetables,” but in reality you may have some really easy things to clean up to save a bunch. Even if the savings aren’t sizable, good hygiene in these areas will yield improved discipline for further steps. Dumb spending includes things like
Yeah, that stuff seems easy. Cool. Just do it. Maybe you’ll see some impact.
Unit Costs
Next is where the real work starts. You need to know what your unit cost is. It could be what it costs to host a customer, what it costs per mobile app user, per transaction on the platform, per minute spent on the software, etc. I can’t tell you exactly what your unit cost is, but you need to figure it out. At my last role, we knew the unit cost for a credit, a job, a build, and several other dimensions. At another company, the VP of Product knew the cost per agent deployed and knew that if sales sold something below that cost, there was no way to make up that delta – even at high volume.
To illustrate this a bit:
Let’s say an agent costs $76 a year for a customer. That would likely be an average: some agents may cost more and some less, depending on the region of the world it’s installed in, the amount of data/usage going through the agent, the complexity of usage, or other factors. But, say that $76 is basically agreed upon by finance, accounting, engineering and product. Cool. Now if you have Gross Margin goals that you’ll get benchmarked against (and your company does have these), you can know how you stack up.
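To make "how you stack up" concrete, here is a minimal sketch of the gross margin math for a single agent. Only the $76 unit cost comes from the example above; the $380 annual price and the 85% margin target are hypothetical numbers picked for illustration, and real GM also includes the other COGS items mentioned earlier.

```python
def gross_margin(price: float, unit_cost: float) -> float:
    """Gross margin as a fraction of revenue: (price - cost) / price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - unit_cost) / price


def max_unit_cost(price: float, target_margin: float) -> float:
    """Highest unit cost that still meets the target gross margin."""
    return price * (1.0 - target_margin)


UNIT_COST = 76.0           # annual cost per agent, from the example above
ASSUMED_PRICE = 380.0      # hypothetical annual price per agent
ASSUMED_GM_TARGET = 0.85   # hypothetical gross margin goal

margin = gross_margin(ASSUMED_PRICE, UNIT_COST)
ceiling = max_unit_cost(ASSUMED_PRICE, ASSUMED_GM_TARGET)

print(f"current margin: {margin:.1%}")
print(f"unit cost ceiling for target: ${ceiling:.2f}")
print(f"cost to remove per agent: ${max(UNIT_COST - ceiling, 0.0):.2f}")
```

With these assumed inputs, the agent sits below the target margin, and the ceiling tells you how far the unit cost would have to fall if the price stays fixed.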
If you’re out of line with your GM goals, you have two options.
In reality, you’ll probably end up doing a combination of both.
Moving from $76 an agent to $71 an agent may be pretty simple. Things like the ideas outlined in Stop Dumb Spending could plausibly get you that much savings. Maybe you can even get to $70 or $69 annually for that agent cost.
Now, if your finance team says the goal really needs to be about $55 annually per agent, you have some serious work to do.
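The gap between those targets is easier to feel as a percentage. A quick sketch using only the dollar figures above (the function name is mine, not from any tool):

```python
def reduction_needed(current: float, target: float) -> float:
    """Fraction of the current unit cost that must be removed to hit the target."""
    if current <= 0:
        raise ValueError("current cost must be positive")
    return (current - target) / current


CURRENT_COST = 76.0  # annual cost per agent, from the example

for target in (71.0, 69.0, 55.0):
    pct = reduction_needed(CURRENT_COST, target) * 100
    print(f"${CURRENT_COST:.0f} -> ${target:.0f}: {pct:.1f}% reduction")
```

Getting to $71 or $69 is a single-digit percentage cut, which hygiene work can plausibly cover. Getting to $55 means removing more than a quarter of the cost, which is a different kind of project.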
When working on this unit cost, you must know how you scale. Do you get cheaper when you add more units of work? Is it cheaper per unit to run 300 than 50? If so, getting more customers is a win-win move and should be prioritized. If you scale your costs basically linearly, then you have no real economies of scale and need to work on the individual unit or change the design/architecture of the system to gain economies of scale.
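One rough way to check how you scale is to model total cost as a fixed part plus a per-unit part and compare the unit cost at two volumes. This is a deliberately simple sketch: the fixed/variable split and every dollar figure below are hypothetical, and real cost curves are rarely this clean. The 50 and 300 volumes are the ones from the question above.

```python
def unit_cost(units: int, fixed_cost: float, variable_cost: float) -> float:
    """Cost per unit when total cost = fixed_cost + variable_cost * units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return (fixed_cost + variable_cost * units) / units


def scale_ratio(small: int, large: int, fixed_cost: float, variable_cost: float) -> float:
    """Unit cost at the larger volume divided by unit cost at the smaller one.

    Below 1.0 means each unit gets cheaper as you grow; 1.0 means costs
    scale linearly and there are no economies of scale.
    """
    return unit_cost(large, fixed_cost, variable_cost) / unit_cost(small, fixed_cost, variable_cost)


# Hypothetical system with a meaningful fixed platform cost.
with_fixed = scale_ratio(50, 300, fixed_cost=6000.0, variable_cost=40.0)

# Hypothetical system where every unit brings its own full cost.
linear = scale_ratio(50, 300, fixed_cost=0.0, variable_cost=40.0)

print(f"fixed + variable: ratio {with_fixed:.3f}")
print(f"purely linear:    ratio {linear:.3f}")
```

A ratio well under 1.0 suggests adding customers helps the unit cost on its own. A ratio near 1.0 points back at the individual unit or the architecture.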
Unit cost should be measured and tracked over time. This will show you everything you need moving forward in terms of progress on the savings mission. Also, don't forget to account for static costs in your unit cost calculation (like that shared k8s cluster, shared DBs, backups, baseline network costs, and more).
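A minimal sketch of tracking this over time, with shared static costs folded into the calculation. The record shape, the month labels, and all figures are made up for illustration; in practice the inputs would come from your billing exports and usage data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyCosts:
    month: str
    direct_cost: float   # spend attributable to units of work
    shared_cost: float   # static costs: shared cluster, shared DBs, backups, network
    units: int           # units of work served that month

    def unit_cost(self) -> float:
        """Fully loaded cost per unit, including the shared static costs."""
        if self.units <= 0:
            raise ValueError("units must be positive")
        return (self.direct_cost + self.shared_cost) / self.units


# Hypothetical history, oldest first.
history = [
    MonthlyCosts("month-1", direct_cost=9000.0, shared_cost=3000.0, units=2000),
    MonthlyCosts("month-2", direct_cost=9200.0, shared_cost=3000.0, units=2200),
    MonthlyCosts("month-3", direct_cost=9300.0, shared_cost=3000.0, units=2460),
]

previous = None
for record in history:
    cost = record.unit_cost()
    change = "" if previous is None else f" ({cost - previous:+.2f} vs prior month)"
    print(f"{record.month}: ${cost:.2f} per unit{change}")
    previous = cost
```

Leaving `shared_cost` out would make every month look cheaper than it is and would hide the effect of spreading static costs over more units.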
Infrastructure Optimization
Once you have your unit costs, look at infrastructure first. Often it’s faster to move/change infrastructure than redesign an entire application (or dozens/hundreds of microservices). Things that can be advantageous for infrastructure:
Sidebar: Don't go multi-cloud for a savings play. It brings the overhead of multiple clouds, including security and the technology understanding each one requires. To stay portable, you also often leave several of the best and most efficient cloud resources off the table. If you're in multiple clouds, consolidating down to one should allow some leverage for a steeper discount. Cloud providers will usually offer some competitive or migration credits if the workloads are sizable.
Architectural Changes
You can absolutely make architecture changes earlier or in parallel to any other efforts. Depending on your product maturity and market-fit, this could be a bigger or smaller win. Often these get weighed against a product roadmap, so the trade-offs can be more complex. Major things I have seen cause extra spend:
Conclusion
Tools exist to help with cloud cost management. They often help first with things like buying reserved instances, savings plans, or committed use discounts. Those are great and you should be doing that, but it’s just the tip of the iceberg in terms of a real understanding of the costs of your cloud applications. Some will dig in and find buckets without retention policies or oversized instances. Some do this with awesome tech like eBPF under the hood and even profile the application-specific calls of a JVM or Python. Other providers have secondary markets for reserved instances and capacity.
The mindset of COGS is important for software leaders. It’s also great for engineers. If you can show a saving or improvement in 24-48 hours, that’s amazing feedback. Developers love feedback. Show them that the work they’ve done to retire a service or optimize a query matters. They’ll do more of it for that dopamine hit of knowing the work had a measurable impact.
As a leader, understand your costs and how they flow into the rest of the business for benchmarks and measurements. The other members of your executive team will thank you.
Good article, however one point I would contend with: "Don't go multicloud." In fact, if you are multicloud you have much better negotiating power with your cloud provider. You also have options and resiliency in your architecture. Finally, by having multicloud capabilities, you are not tied to services from a single cloud provider. Microsoft is raising prices 10% on Azure in EU (generally speaking) on April 1st. If you are on Microsoft only, you are eating a 10% cost increase. If you are multicloud you can shift resources to avoid the possible impact. The other angle aside from the RIs is how you manage your commit contracts with cloud providers. I am shocked when I see users spending significant amounts of money with a provider after negotiating either a poor commit contract or none at all, and without leveraging savings plans or CUDs. Nice read, thanks for sharing!
This is amazing - spot on Michael Stahnke. One thing that I learned in the last year - is that if you can identify and control a COGS number - make it an SLO for a product engineering team - balancing product features and "cloud cost debt paydown" to keep the unit economics within the SLO. Great way to keep engineering/product motivated on costs