AWS at Scale - Hub model and Control Tower
What do you do when you have a single AWS account with a small number of workloads? You might just manage it simply, with very little structure. After all, if it is only one account with one VPC and some infrastructure, maybe it doesn't matter as much?
But what if you have 100 or 1000 accounts?
At that scale you need a landing zone.
A landing zone is a centralised governance model that acts as an overlay across a set of AWS accounts and their infrastructure. It puts a consistent governance stamp over things that traditionally operated separately.
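One way to picture the "overlay" is as a tree of organisational units (OUs) in which a control attached high up applies to every account beneath it. The sketch below is a minimal model under stated assumptions: the `OrgNode` class and every OU and guardrail name are hypothetical, and it treats guardrails as deny-style rules that simply accumulate down the tree (how SCP denies behave), ignoring the allow-list side of SCPs.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class OrgNode:
    """A root, OU or account in a hypothetical organisation tree."""
    name: str
    guardrails: Set[str] = field(default_factory=set)
    parent: Optional["OrgNode"] = None

    def effective_guardrails(self) -> Set[str]:
        """Collect guardrails from this node up to the root.

        A deny-style control attached higher in the tree applies to
        everything below it, so an account inherits the union of its
        ancestors' guardrails.
        """
        rules: Set[str] = set()
        node = self
        while node is not None:
            rules |= node.guardrails
            node = node.parent
        return rules


# Hypothetical tree and guardrail names, for illustration only.
root = OrgNode("Root", {"deny-disable-cloudtrail"})
workloads = OrgNode("Workloads", {"deny-unapproved-regions"}, parent=root)
prod = OrgNode("Production", {"require-ebs-encryption"}, parent=workloads)
app_a_prod = OrgNode("app-a-prod", parent=prod)

print(sorted(app_a_prod.effective_guardrails()))
# ['deny-disable-cloudtrail', 'deny-unapproved-regions', 'require-ebs-encryption']
```

The point of the model is that governance is applied once, at the right level, rather than account by account, which is what lets the same team look after 10 or 1000 accounts.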
Datacom have been building landing zones on AWS for about five years (for context, AWS themselves only released their first landing zone product in 2018, followed by AWS Control Tower in 2019), so we have a strong mindset for management at scale, and in recent years we have worked closely with AWS to help shape improvements to landing zones in general.
How did we start thinking in this way?
We started articulating what would make it possible to manage 100 or 1000 AWS accounts with roughly the same level of human effort and interaction it would take to manage 10-20. And ideas formed:
Funnily enough, fast forward to 2021 and a lot of these ideas still have a place, but AWS have made the job easier. In fact, thanks to contributions from partners (including Datacom) over the life of the product, Control Tower has become a great foundation for a landing zone. (I know it is positioned as a landing zone in its own right, but I strongly believe it is much stronger when combined with solutions such as ours.)
What has come through strongly in the last few years (and I think AWS have led the way on some of it) is centralisation. We, along with many others, have adopted a hub-and-spoke model of management.
In our previous landing zone architecture, connectivity between VPCs was already hub and spoke using VPC peering. However, VPC peering is not transitive: a spoke cannot route through the hub to reach another spoke or the hub's internet path, so peering alone couldn't provide a centralised egress point. There were creative ways around that, but when you want to manage at scale you don't want to have to MacGyver together a solution.
The first solution to that issue came out a couple of years ago: AWS Transit Gateway. It is a fully managed network routing hub that removes much of the complexity of VPC peering topologies (partial mesh, full mesh, etc.) and avoids the per-VPC peering limits that caused problems for larger organisations.
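The scaling difference is easy to put into numbers. Because peering is not transitive, a full mesh needs one connection per pair of VPCs, n(n-1)/2, while a Transit Gateway hub needs one attachment per VPC. The sketch below is just that arithmetic; the function names are hypothetical and it ignores routing tables, Regions and other quotas.

```python
def full_mesh_peering_connections(vpc_count: int) -> int:
    """Connections needed for every VPC to reach every other via peering.

    Peering is not transitive, so every pair of VPCs needs its own
    connection: n * (n - 1) / 2.
    """
    return vpc_count * (vpc_count - 1) // 2


def peers_per_vpc_in_full_mesh(vpc_count: int) -> int:
    """In a full mesh, each VPC peers with all of the others."""
    return vpc_count - 1


def transit_gateway_attachments(vpc_count: int) -> int:
    """With a Transit Gateway hub, each VPC needs exactly one attachment."""
    return vpc_count


for n in (10, 100, 1000):
    print(
        f"{n} VPCs: {full_mesh_peering_connections(n)} peering connections "
        f"({peers_per_vpc_in_full_mesh(n)} per VPC) vs "
        f"{transit_gateway_attachments(n)} Transit Gateway attachments"
    )
```

For 10, 100 and 1000 VPCs a full mesh needs 45, 4,950 and 499,500 peering connections respectively. At 100 VPCs each VPC would also need 99 peers, well above the default per-VPC quota of 50 active peering connections that AWS has documented (adjustable; check Service Quotas for current values).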
With centralised networking we can also reduce the number of NAT Gateways (a definite cost saving) and put egress and ingress traffic properly behind firewall devices. AWS even brought out their own native offering, AWS Network Firewall, which I think is quite a step forward from the stateless protection of NACLs alone.
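A rough count shows where the NAT Gateway saving comes from. This sketch assumes each spoke VPC would otherwise run one NAT Gateway per Availability Zone (the usual pattern for zone-independent egress) and that a centralised egress VPC does the same; the spoke and AZ counts are illustrative, not from any real estate.

```python
def nat_gateways_decentralised(vpc_count: int, azs_per_vpc: int) -> int:
    # Assumption: every VPC runs its own NAT Gateway in each AZ it uses.
    return vpc_count * azs_per_vpc


def nat_gateways_centralised(azs_in_egress_vpc: int) -> int:
    # Spokes send internet-bound traffic via the Transit Gateway to a single
    # egress VPC, so only that VPC needs NAT Gateways (one per AZ).
    return azs_in_egress_vpc


spokes, azs = 100, 3  # illustrative numbers only
before = nat_gateways_decentralised(spokes, azs)
after = nat_gateways_centralised(azs)
print(f"{before} -> {after} NAT Gateways")  # 300 -> 3 NAT Gateways
```

This only counts gateways. Centralised egress adds Transit Gateway attachment and data processing charges, so the actual saving depends on how much traffic flows through the hub.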
Then there is identity. AWS solved some of the challenges in our previous federated architecture with AWS Single Sign-On (SSO), a fully centralised authentication mechanism that provides access to AWS accounts and applications from a single login. It is a much stronger offering, and combined with the code-based automation we have developed, it can federate with common identity sources such as Azure Active Directory automatically when new AWS accounts are provisioned.
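The automation side of this can be as simple as a lookup table: when an account is vended, assign the right directory groups to the right permission sets for that account's environment. The sketch below is a hypothetical illustration, not Datacom's implementation: the environment names, group IDs and permission set ARNs are placeholders, and it only builds request parameters whose keys mirror the AWS SSO Admin `CreateAccountAssignment` API.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: environment -> (directory group ID, permission set ARN).
# All IDs and ARNs are placeholders; real group IDs come from the AWS SSO
# identity store (e.g. groups synced from Azure AD).
ASSIGNMENTS_BY_ENVIRONMENT: Dict[str, List[Tuple[str, str]]] = {
    "production": [
        ("example-group-id-platform-admins",
         "arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLEADMIN"),
        ("example-group-id-app-support",
         "arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLEREADONLY"),
    ],
    "non-production": [
        ("example-group-id-platform-admins",
         "arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLEADMIN"),
        ("example-group-id-developers",
         "arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLEPOWERUSER"),
    ],
}


def account_assignments(
    instance_arn: str, account_id: str, environment: str
) -> List[Dict[str, str]]:
    """Build one assignment request per group/permission-set pair.

    The keys mirror the parameters of the SSO Admin CreateAccountAssignment API.
    """
    return [
        {
            "InstanceArn": instance_arn,
            "TargetId": account_id,
            "TargetType": "AWS_ACCOUNT",
            "PermissionSetArn": permission_set_arn,
            "PrincipalType": "GROUP",
            "PrincipalId": group_id,
        }
        for group_id, permission_set_arn in ASSIGNMENTS_BY_ENVIRONMENT[environment]
    ]


requests = account_assignments(
    "arn:aws:sso:::instance/ssoins-EXAMPLE", "111122223333", "production"
)
for request in requests:
    print(request["PrincipalId"], "->", request["PermissionSetArn"])
```

In a real pipeline each dict could be passed as keyword arguments to the boto3 `sso-admin` client's `create_account_assignment`, for example from a function triggered by Control Tower's `CreateManagedAccount` lifecycle event. That wiring is not shown here.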
And then there is account vending itself. I like the term because it makes accounts something you can order off a menu, and that's how it should be. In the old days, a new account (whether created through AWS Organizations or separately) came with a basic blueprint that created a default VPC in every region, and that wasn't a very secure structure: the default subnets are all public, so you needed to rework it into a proper private-subnet design. One of the cool things about AWS Control Tower is that you can prevent VPCs being created in regions where you don't want them. In our case we don't use that mechanism to create VPCs at all; instead, our own pipeline automation vends a new VPC only when one is requested through AWS Service Catalog. Account vending is great because you can use it to ensure every new account is built with the approved governance.
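Ordering an account "off the menu" ultimately means provisioning the Control Tower Account Factory product in Service Catalog. The sketch below builds such a request with a simple governance check in front of it. Assumptions: the `APPROVED_OUS` rule and OU names are hypothetical, the product and artifact IDs are placeholders, and the exact value format Control Tower expects for `ManagedOrganizationalUnit` (especially for nested OUs) should be checked against its documentation.

```python
from typing import Dict

# Hypothetical governance rule: accounts may only be vended into these OUs.
APPROVED_OUS = {"Workloads-Prod", "Workloads-NonProd", "Sandbox"}


def account_factory_request(
    account_name: str,
    account_email: str,
    sso_user_email: str,
    sso_first_name: str,
    sso_last_name: str,
    ou_name: str,
    product_id: str,
    artifact_id: str,
) -> Dict[str, object]:
    """Build a Service Catalog ProvisionProduct request for Account Factory.

    There is deliberately no VPC input: in the approach described in the
    text, a VPC is vended later by a separate pipeline, only on request.
    """
    if ou_name not in APPROVED_OUS:
        raise ValueError(f"OU {ou_name!r} is not an approved vending target")
    parameters = {
        "AccountName": account_name,
        "AccountEmail": account_email,
        "SSOUserEmail": sso_user_email,
        "SSOUserFirstName": sso_first_name,
        "SSOUserLastName": sso_last_name,
        "ManagedOrganizationalUnit": ou_name,
    }
    return {
        "ProductId": product_id,
        "ProvisioningArtifactId": artifact_id,
        "ProvisionedProductName": account_name,
        "ProvisioningParameters": [
            {"Key": key, "Value": value} for key, value in parameters.items()
        ],
    }


request = account_factory_request(
    "app-a-prod", "app-a-prod@example.com", "owner@example.com",
    "Example", "Owner", "Workloads-Prod", "prod-EXAMPLE", "pa-EXAMPLE",
)
print(request["ProvisioningParameters"])
```

The resulting dict matches the keyword arguments of the boto3 `servicecatalog` client's `provision_product`; rejecting unapproved OUs before the call is one way to make sure every vended account lands under the approved governance.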
And finally, one of my absolute FAVOURITE AWS innovations in network architecture: VPC subnet sharing. When AWS started to see that AWS Organizations was good for more than just consolidated billing, they built features that work across the accounts in an organisation. These include AWS Resource Access Manager (RAM), which lets resources owned by a central account be shared with and consumed by spoke accounts. Subnet sharing means you no longer have to provision a separate VPC, with its own architecture, for every account; instead, a single VPC can be shared across multiple similar accounts. For example, production accounts for different applications might share a production VPC with controls in place (and likewise for non-production).
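Sharing subnets from a central network account comes down to one RAM resource share listing the subnet ARNs and the consuming principals. This sketch only builds the request; the share name, Region, account IDs and subnet IDs are placeholders, and the helper functions are hypothetical.

```python
from typing import Dict, Iterable, List, Union


def subnet_arn(region: str, owner_account_id: str, subnet_id: str) -> str:
    """ARN of a subnet owned by the central network account."""
    return f"arn:aws:ec2:{region}:{owner_account_id}:subnet/{subnet_id}"


def subnet_share_request(
    share_name: str,
    region: str,
    network_account_id: str,
    subnet_ids: Iterable[str],
    principals: Iterable[str],
) -> Dict[str, Union[str, bool, List[str]]]:
    """Build the arguments for a RAM CreateResourceShare call.

    `principals` can be spoke account IDs or, with RAM sharing enabled for
    AWS Organizations, an OU or organisation ARN.
    """
    return {
        "name": share_name,
        "resourceArns": [subnet_arn(region, network_account_id, s) for s in subnet_ids],
        "principals": list(principals),
        # Keep the share inside the organisation.
        "allowExternalPrincipals": False,
    }


request = subnet_share_request(
    share_name="prod-shared-subnets",
    region="ap-southeast-2",
    network_account_id="111122223333",
    subnet_ids=["subnet-0example1", "subnet-0example2"],
    principals=["444455556666", "777788889999"],
)
print(request["resourceArns"][0])
# arn:aws:ec2:ap-southeast-2:111122223333:subnet/subnet-0example1
```

The dict can be passed as keyword arguments to the boto3 `ram` client's `create_resource_share`. Participant accounts can launch resources into the shared subnets but cannot modify the VPC, subnets, route tables or NACLs owned by the network account, which is what keeps the network centrally controlled.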
This brings AWS a bit more in line with how private clouds typically operate, with a single network architecture that is centrally managed and accessible across environments.
AWS Backup is their answer to multi-account backup (something AWS never used to provide, relying instead on partners and software vendors). Through its integration with AWS Organizations it can manage backups of EC2 instances and quite a few other services across accounts, and it integrates with CloudWatch.
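At the account level, AWS Backup is driven by a backup plan (schedule plus retention) and a selection that says which resources the plan covers. The sketch below builds both documents as plain dicts; the plan name, vault name, tag key/value, retention period and role ARN are hypothetical placeholders, and the schedule uses AWS cron syntax for 05:00 UTC daily.

```python
from typing import Dict


def daily_backup_plan(plan_name: str, vault_name: str, retention_days: int) -> Dict[str, object]:
    """Build the BackupPlan document for an AWS Backup CreateBackupPlan call."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                # AWS cron syntax: 05:00 UTC every day.
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }
        ],
    }


def tag_selection(selection_name: str, role_arn: str, tag_key: str, tag_value: str) -> Dict[str, object]:
    """Build a BackupSelection that covers any resource carrying a given tag."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": role_arn,
        "ListOfTags": [
            {"ConditionType": "STRINGEQUALS", "ConditionKey": tag_key, "ConditionValue": tag_value}
        ],
    }


plan = daily_backup_plan("example-daily-35d", "example-vault", 35)
selection = tag_selection(
    "example-tagged-resources",
    "arn:aws:iam::111122223333:role/example-backup-role",
    "backup", "daily",
)
print(plan["Rules"][0]["ScheduleExpression"])  # cron(0 5 ? * * *)
```

These could be passed as `BackupPlan=` to boto3's `backup` client `create_backup_plan` and as `BackupSelection=` (with the returned plan ID) to `create_backup_selection` in each account. Across an organisation, AWS Organizations backup policies express an equivalent plan in their own policy syntax, which is not shown here.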
AWS Security Hub and Amazon GuardDuty now integrate with automation across the organisation, so that, if requested, any newly vended account is enrolled in Security Hub (for compliance benchmark checks) and GuardDuty (for threat detection, including network monitoring). Previously we had to use some bespoke Lambda magic to make that happen, but now it is fully integrated with AWS Organizations.
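Even with auto-enrolment switched on, it is worth checking for drift. The sketch below compares the organisation's account list with each service's member list and reports accounts that are not enrolled. The function name and sample account IDs are hypothetical; in practice the member lists would come from the delegated administrator account (for example, each service's ListMembers API).

```python
from typing import Dict, Iterable, List


def enrolment_gaps(
    org_account_ids: Iterable[str],
    enrolled_by_service: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """For each service, list organisation accounts that are not yet enrolled."""
    org = set(org_account_ids)
    return {
        service: sorted(org - set(members))
        for service, members in enrolled_by_service.items()
    }


org_accounts = ["111111111111", "222222222222", "333333333333"]
enrolled = {
    "securityhub": ["111111111111", "222222222222", "333333333333"],
    "guardduty": ["111111111111", "222222222222"],
}
print(enrolment_gaps(org_accounts, enrolled))
# {'securityhub': [], 'guardduty': ['333333333333']}
```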
Whilst these are great new capabilities, what I think is still key here is that:
Whether an organisation is brand new to the cloud or has been in AWS for years, it is important that we keep moving forward, and I think there is a lot to like about the direction of centralised cloud governance.