AWS at Scale - Hub model and Control Tower

What do you do when you have a single AWS account with a small number of workloads? You might manage it simply, with very little structure. If it is only one account with one VPC and some infrastructure, structure might not matter as much.

But what if you have 100 or 1000 accounts?

At scale, you need a landing zone.

A landing zone is a centralised governance model that acts as an overlay over a series of AWS accounts and infrastructure. It puts a governance stamp over things that traditionally operated separately.

Datacom have been in the landing zone game in AWS for about 5 years (for context, AWS themselves only released their first landing zone product in 2018, followed by AWS Control Tower in 2019), so we have a strong mindset for management at scale. In recent years we have worked closely with AWS to help shape improvements in landing zones in general.

How did we start thinking in this way?

We started articulating what would make it possible to manage 100 or 1000 AWS accounts with a similar level of human resources and interaction as it takes to manage 10-20. And ideas formed:

  • Templates (CloudFormation then, AWS CDK now) for almost every infrastructure element you build
  • A centralised governance framework (federated identity, now made better with AWS Single Sign-On)
  • Rules to manage what users can do
  • Standard architecture for networking (a VPC structure that is consistent across every account we deploy it to)
  • Security Groups as code (enforce Security Groups, NACL permissions)
  • Cross Account Lambda Functions
  • VPC Peering and Internet Access via NAT Gateways
  • Backup through a 3rd Party product that managed AMI creation and snapshots of VMs and databases.
  • Puppet for compliance
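
The "standard architecture for networking" idea in the list above can be sketched with nothing more than the Python standard library. This is a minimal illustration rather than our actual template: the function name `plan_vpc_subnets`, the tier names and the choice of splitting a /16 into /20 blocks are all assumptions made for the example.

```python
import ipaddress


def plan_vpc_subnets(vpc_cidr, azs, tiers=("public", "private"), new_prefix=20):
    """Carve a VPC CIDR into one subnet per (tier, AZ), in a fixed order.

    Hypothetical helper: the same inputs always produce the same layout,
    so every account that receives the template gets an identical structure.
    """
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields equal-sized blocks in ascending address order.
    blocks = network.subnets(new_prefix=new_prefix)
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[f"{tier}-{az}"] = str(next(blocks))
    return plan


if __name__ == "__main__":
    # Example values only; the CIDR and AZ suffixes are placeholders.
    for name, cidr in plan_vpc_subnets("10.0.0.0/16", ["a", "b", "c"]).items():
        print(name, cidr)
```

Because the layout is derived from the inputs rather than typed by hand, the same plan can be fed into whichever templating tool builds the VPC.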

Funnily enough, fast forward to 2021 and a lot of these still have a place, but AWS have made the job easier. In fact, thanks to the contributions of partners (including Datacom) over the life of the product, Control Tower has become a great foundation for a landing zone. (I know it is listed as a landing zone in itself, but I strongly believe it is much stronger when combined with solutions such as ours.)

What has come out strongly in the last few years (and I think AWS have led the way on some of it) is centralisation. We, along with many others, have adopted a hub and spoke model of management.

In our previous landing zone architecture, interaction between VPCs was already hub and spoke using VPC peering. But VPC peering is not a transitive networking construct, so on its own it couldn't provide a centralised egress point (there were creative ways around that, but when you want to manage at scale you don't want to have to MacGyver together a solution).

The first solution to that issue came out a couple of years ago and was called AWS Transit Gateway. It is a fully managed network routing service that removes a lot of the complexity that came with VPC peering (partial mesh, full mesh, etc.), including the limits on how many peers a VPC could have, which caused problems for larger organisations.
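
To put a number on the mesh complexity, here is a small worked calculation. It assumes the simplest comparison: a full mesh in which every VPC peers with every other VPC, against a hub in which each VPC has a single attachment to one Transit Gateway. The function names are made up for the example.

```python
def full_mesh_peerings(vpc_count):
    """Peering connections needed so every VPC can reach every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count):
    """With a hub, each VPC needs one attachment to the Transit Gateway."""
    return vpc_count


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_mesh_peerings(n), transit_gateway_attachments(n))
```

For 100 VPCs that is 4,950 peering connections against 100 attachments, which is the difference between something you can manage at scale and something you cannot.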

In addition to that, we can take the centralised networking and reduce the number of NAT Gateways (a cost saving for sure) and move egress or ingress traffic properly behind firewall devices. AWS even brought out their own native offering (AWS Network Firewall), which I think is quite a step forward from the stateless protection of NACLs alone.

Then there is identity. AWS solved some of the challenges in our previous federated architecture with AWS Single Sign-On, a fully centralised authentication mechanism that provides access to AWS accounts and applications from a single login. It is a much stronger offering and, combined with code-based automation that we have developed, it can federate automatically with several common identity providers, such as Azure Active Directory, when new AWS accounts are provisioned.

And then there is AWS account vending itself. I like the description because it makes accounts easy to order off a menu, and that's how it should be. In the old days, a new account (whether created through AWS Organizations or separately) came with a basic blueprint that created a default VPC in every region, and that wasn't a very secure VPC structure (you needed to update it to have a proper private subnet approach). One of the cool things about AWS Control Tower is that you can prevent the creation of VPCs in regions where you don't want them. In our case we don't use that mechanism to create a VPC at all; instead, our own pipeline automation vends a new VPC only when one is requested through AWS Service Catalog. Account vending is great because you can use it to ensure that every new account is built with the approved governance.
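
One way a region restriction like this can be expressed is as a service control policy that denies requests outside an approved list of regions. The sketch below only builds the policy document; it is not the exact policy Control Tower applies. The function name, the `Sid` and the short list of exempted global services are assumptions for illustration (a production policy would normally exempt a longer list).

```python
import json


def region_deny_scp(allowed_regions,
                    exempt_actions=("iam:*", "organizations:*", "sts:*", "support:*")):
    """Build an SCP-style policy that denies requests outside allowed regions.

    Sketch only: exempt_actions is a short illustrative set of global
    services, not a complete list.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                # NotAction leaves the listed global services unaffected.
                "NotAction": list(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": list(allowed_regions)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example region only; substitute the regions your organisation approves.
    print(json.dumps(region_deny_scp(["ap-southeast-2"]), indent=2))
```

With a policy of this shape attached, a request to create a VPC in any other region is denied before it reaches the service.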

And finally, one of my absolute FAVOURITE AWS innovations in network architecture: VPC subnet sharing. When AWS started to see that AWS Organizations was good for more than just billing, they put a structure in place that allowed organisational features to be used across accounts. This included AWS Resource Access Manager, which allows resources to be shared from a central location and consumed by spoke accounts. Subnet sharing means that you no longer have to provision a VPC with its own architecture in every account; instead, a single VPC can be shared across multiple similar accounts. I would suggest that production accounts for different apps may share a production VPC with controls in place (and similarly for non-production).
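
As a rough sketch of what a subnet share looks like as code, the function below generates a CloudFormation template containing a single `AWS::RAM::ResourceShare` resource. The function name, the logical ID `SharedSubnets` and all account IDs, subnet IDs and names in the example are placeholders, not values from our environment.

```python
import json


def subnet_share_template(share_name, region, owner_account_id, subnet_ids, principals):
    """Return a CloudFormation template (as a dict) that shares subnets via AWS RAM.

    principals may be account IDs or AWS Organizations ARNs (for example an OU).
    """
    subnet_arns = [
        f"arn:aws:ec2:{region}:{owner_account_id}:subnet/{subnet_id}"
        for subnet_id in subnet_ids
    ]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "SharedSubnets": {
                "Type": "AWS::RAM::ResourceShare",
                "Properties": {
                    "Name": share_name,
                    "ResourceArns": subnet_arns,
                    "Principals": list(principals),
                    # Keep the share inside the organisation.
                    "AllowExternalPrincipals": False,
                },
            }
        },
    }


if __name__ == "__main__":
    template = subnet_share_template(
        share_name="production-network",          # placeholder name
        region="ap-southeast-2",                  # placeholder region
        owner_account_id="111111111111",          # placeholder hub account
        subnet_ids=["subnet-0123456789abcdef0"],  # placeholder subnet
        principals=["222222222222"],              # placeholder spoke account
    )
    print(json.dumps(template, indent=2))
```

The hub account owns the VPC and its subnets; the spoke accounts named as principals can launch resources into those subnets but cannot change the network itself.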

This brings AWS a bit more in line with how Private Clouds might operate in that they have a single Network architecture centrally managed and accessible across environments.

AWS Backup is their answer to a multi-account backup approach (something AWS never used to provide, relying instead on partners and software vendors). Through integration with AWS Organizations it can manage backups of EC2 instances and quite a few other services across accounts, and it also integrates with CloudWatch.

AWS Security Hub and Amazon GuardDuty now integrate with automation across the organisation, so that, if requested, any account that gets vended is enrolled in Security Hub (for compliance benchmark checks) and GuardDuty (for network monitoring). Previously we had to use some bespoke Lambda magic to make that happen, but now it is fully integrated with AWS Organizations.

Whilst these are great new things, I think the key principles are still these:

  • Manage as code if you want to manage at scale (if you have to touch a console as an engineer, you should get something in the DevOps backlog so you don't have to next time).
  • Centralise networking architecture (economies of scale, easier to secure).
  • Automate everything (whilst AWS Control Tower may still be maturing in its automation elements, there is substantial capability to automate everything from account provisioning to VPC architecture, Transit Gateway and security).
  • Define a benchmark for compliance and design accordingly.
  • Customers don't want to call IT teams for things they should be able to do themselves, so any landing zone should have a self-service element. AWS Service Catalog can be shared across an organisation and deliver a centralised product catalogue for each account.
  • Centralise Security Group policies and firewall policies as much as possible, and then code them.
  • AWS listen to feature requests, so if you are working with a partner like us, know that we are often working in the background to improve services, and partner feedback often feeds into the AWS features that come out (and they come out in massive quantities every year).

Whether an organisation is brand new to the cloud or has been in AWS for years, it is important that we keep moving forward, and I think there is a lot to like about the direction of centralised cloud governance.


More articles by Lee Murphy
