Managing Network Transit Costs in the Cloud
Introduction
In the early days of cloud computing, we would examine cloud costs (often comparing them to internal data center costs). In this simple IaaS world, we tended to focus on compute (e.g. EC2 on AWS) and storage (e.g. S3 and EBS on AWS). However, network costs (ingress and egress) were often ignored, because a) that data was harder to model and b) figuring out how to allocate network costs was a huge ball of complexity that few organizations had a precise line of sight into.
Now here in 2019, the cloud cost estimating world has become several degrees more complicated with the explosion in the number of services offered by cloud providers. While services like AWS Lambda, Amazon Redshift, AWS Greengrass, and AWS Glue have made AWS an incredibly powerful platform, they have also made estimating costs more complex. We’ve improved at estimating compute (even with its inherent elasticity), but for most enterprises, network estimating is still somewhat of a black art. Many development teams have a good idea of their compute and storage needs but are incapable of estimating their network needs.
In this article, we will cover the makeup of network costs in the cloud and techniques that help to manage (i.e. reduce) your spend on this very hard to predict cloud resource. Note, this article will largely use AWS terminology; however, most of the concepts described here are applicable to the other major cloud service providers (e.g. Microsoft Azure, Google Cloud Platform, Alibaba Cloud).
The Basics
So how do the cloud providers charge for network usage? There are a number of factors that determine your network costs:
- The first factor is “who is talking to who”:
- Network traffic between AWS and the internet (or your private network)
- Network traffic between different AWS services all within AWS (transferring data across AWS regions, or even availability zones, generally has a cost)
- Public/Elastic IP vs Private IP – generally, it costs more to transfer data between AWS services that are using public IP addresses
- The second factor is the direction data is moving:
- Inbound traffic (e.g. EC2 inbound traffic is free)
- Outbound traffic (e.g. EC2 outbound traffic is not)
- The third factor is the type of cloud service:
- Some products have network charges already built in (e.g. DNS (Route 53) or data streaming (Kinesis))
- Some products and services are network services themselves (e.g. Transfer Acceleration, Direct Connect)
- Many services require a bandwidth charge to get to the service (e.g. EC2, S3)
- The fourth factor is the region or area of the world your traffic is in:
- Different regions often have different prices for traffic, reflecting the underlying costs of the global infrastructure (e.g. traffic out of EC2 is $0.05 in US-East but $0.19 in São Paulo)
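To make the factors above concrete, here is a minimal Python sketch of a back-of-the-envelope estimate. The rate table and the function name are hypothetical: the table simply reuses the two illustrative figures quoted above, treats them as flat per-GB rates, and assumes inbound traffic is free. Real pricing is typically tiered and changes over time, so treat this as a modeling aid rather than a price list.

```python
# Hypothetical per-GB outbound rates, keyed by region. These reuse the
# illustrative figures from the text and are NOT current AWS prices.
OUTBOUND_RATE_PER_GB = {
    "us-east": 0.05,
    "sao-paulo": 0.19,
}


def estimate_transfer_cost(region: str, gb_in: float, gb_out: float) -> float:
    """Estimate a monthly transfer bill for one region.

    Assumes inbound traffic is free and outbound traffic is billed at a
    single flat per-GB rate (real pricing is usually tiered).
    """
    if gb_in < 0 or gb_out < 0:
        raise ValueError("traffic volumes must be non-negative")
    rate = OUTBOUND_RATE_PER_GB[region]
    inbound_cost = 0.0  # assumption: inbound traffic is free
    return inbound_cost + gb_out * rate


if __name__ == "__main__":
    # Same workload, two regions: only the outbound volume drives the bill.
    for region in OUTBOUND_RATE_PER_GB:
        cost = estimate_transfer_cost(region, gb_in=500, gb_out=1000)
        print(f"{region}: ${cost:.2f}")
```

Even a model this crude shows why direction and region matter: the same workload costs several times more when it is served out of the pricier region.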
Here’s a great diagram (courtesy of Cloudability) describing the charges of a simple application setup.
The Levers
Ok, so how can I reduce costs of my network spend? Here are several pointers to review with regard to your cloud solutions.
- Make sure you are watching and measuring your costs. Use the AWS Billing and Cost Management dashboard as well as Cost Explorer. Consider 3rd party cost management tools such as Cloudability or Teevity. Transferize is a product targeted specifically at optimizing cloud transfer/network costs. All of these tools have modeling capabilities – use them!
- Set Billing Alarms (either in AWS or your 3rd party tool), especially when you are just starting out or deploying brand new workloads.
- Watch out for usage of public/elastic IP addresses when you can use private IPs. This mistake is both common and costly.
- Leverage a content delivery network like AWS CloudFront or a 3rd party CDN like Akamai. If you have a lot of common traffic (e.g. webpages) these savings can add up quickly. Note that you are still charged to move the content from your VPC to the CDN.
- Stay within an availability zone (or a region) where you do not need the extra availability. Needless region-to-region costs are a killer.
- Leverage data compression. Whether it’s web pages or video files, compression makes a ton of sense where you pay by the byte.
- Look at your interface topology. With the popularity of hybrid and multi-cloud, where solutions running on the cloud are hooked to solutions on premises (or on other clouds), map the interfaces. Clearly there are performance implications of the network topology as well, especially for real-time interfaces.
- Send deltas. No, not Delta Force. When replicating data, wherever possible, send only the changes in data, rather than forcing new, full copies.
- Leverage Direct Connect. By establishing a direct connection to the cloud, you can typically lower your costs. The pricing model is different (lower bandwidth charges but you pay for active ports).
- Reduce local processing. In certain circumstances where you need to download data locally to your workstation, you may be able to leverage Amazon WorkSpaces as your workstation, thereby keeping data in the cloud.
- Review automated multi-region replication (e.g. DynamoDB). There are inter-region network transfer charges to consider. As part of a larger DR/BCM strategy, understand what data is truly required to be multi-region. In particular, abandoned test/dev environments that replicate needlessly should be terminated.
- Have an effective tagging strategy. With instances tagged and cost allocation tags enabled, network charges can be associated with instances and ultimately the accountable application/solution teams.
- Use the rich ecosystem of AWS Big Data products and services to move data processing to the cloud, reducing the volume of transfers out of AWS.
- Consider AWS Import/Export or Snowball for very large data transfers (a.k.a. “never underestimate the bandwidth of FedEx”).
- Accountability. Move the accountability (and budget) for network charges to your application/solution teams (Actually, do this for all non-shared infrastructure costs). People act differently when it’s “their” money.
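Two of the levers above, compression and sending deltas, are easy to try on your own data before committing to a design. The following is a minimal sketch using only the Python standard library; the function names and the sample dataset are hypothetical, and the delta format (a dict of upserts plus a list of deleted keys) is just one simple choice, not a standard.

```python
import gzip
import json


def compressed_size(payload: bytes) -> int:
    """Return the size in bytes of the gzip-compressed payload."""
    return len(gzip.compress(payload))


def compute_delta(old: dict, new: dict) -> dict:
    """Return only what changed between two snapshots.

    'upserts' holds keys that were added or whose value changed;
    'deletes' lists keys present in old but missing from new.
    """
    upserts = {k: v for k, v in new.items() if k not in old or old[k] != v}
    deletes = [k for k in old if k not in new]
    return {"upserts": upserts, "deletes": deletes}


def apply_delta(old: dict, delta: dict) -> dict:
    """Rebuild the new snapshot on the receiving side."""
    result = {k: v for k, v in old.items() if k not in delta["deletes"]}
    result.update(delta["upserts"])
    return result


if __name__ == "__main__":
    # Hypothetical dataset: 1,000 records, of which 10 change.
    old = {f"user-{i}": {"plan": "basic", "quota_gb": 10} for i in range(1000)}
    new = dict(old)
    for i in range(10):
        new[f"user-{i}"] = {"plan": "pro", "quota_gb": 100}

    full = json.dumps(new).encode()
    delta = json.dumps(compute_delta(old, new)).encode()

    # Compare bytes on the wire for each strategy.
    print("full copy, raw bytes:    ", len(full))
    print("full copy, gzipped bytes:", compressed_size(full))
    print("delta, raw bytes:        ", len(delta))
    print("delta, gzipped bytes:    ", compressed_size(delta))
```

Multiplying the byte counts by your per-GB rate and replication frequency gives a rough sense of whether the extra engineering effort pays for itself.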
What about the other cloud providers?
As mentioned at the beginning, the product/service names are different, but almost all of these concepts and themes apply across all of the major cloud providers. There are clearly some differences (e.g. GCP doesn't charge for peering of VPCs in the same region), but the management strategies above largely apply.
Hope this helps.
Ken Corless Solid.
Does it make sense to hire an actuary to do a thorough cost calculation? Most people working on cloud are engineers, and they may not be very cost conscious. In some organizations, cloud engineers are not authorized to look into cost and spending information.
What was your experience with Transferize? Did it help identify network cost savings?
Thanks Ken. Great points. If I can add, one may also leverage ElastiCache more wherever it's appropriate?