Running one pod per node
This is part of the series "Running Kubernetes at scale in AWS EKS". Links to all related articles are published there.
We have a very specialized ML application that was tuned for a specific EC2 instance type, and its CPU and memory profile matched our requirements very well, so we decided to go with one pod per node. You may wonder, "What can possibly go wrong with running one pod per node?"
A lot, if you combine it with where your Docker registry is located and how big your images are. (I use container/pod interchangeably, as we also run one container per pod.) When a new container is spun up on a node, the image is pulled from the registry; a second container on the same node will not need to pull the image again. However, if you run a single container/pod per instance, the image is pulled from the registry every time an autoscaling scale-out happens. The "imagePullPolicy: IfNotPresent" setting doesn't help here, because a newly autoscaled node has no cached image, so it has to be pulled anyway.
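For reference, here is a minimal sketch of where that setting lives in a pod spec; the pod name, container name, and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-inference                # hypothetical name
spec:
  containers:
    - name: model-server            # hypothetical name
      image: registry.example.com/ml/model-server:v1   # hypothetical image
      # IfNotPresent only skips the pull when the image is already cached
      # on the node. A freshly autoscaled node has an empty cache, so a
      # full pull happens regardless of this setting.
      imagePullPolicy: IfNotPresent
```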
Compound that with the Docker registry sitting in one AWS region and your cluster in another. Our image size was upwards of 3.5 GB (an NVIDIA GPU container), and with a daily average of 1,000 autoscaled servers we were looking at 3.5 TB per day, close to 100 TB a month, of data transfer cost just to move Docker images. Even the ECR registry comes out too expensive for that kind of network egress.
We also played with the "--max-concurrent-downloads" setting in dockerd, which controls how many image layers are downloaded in parallel. Raising it to 20 got us an additional 25% faster downloads, but beyond 20 concurrent downloads it showed diminishing returns, because our Dockerfile produced 20 layers, so there was nothing left to parallelize.
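A sketch of how this can be set persistently via the Docker daemon config at /etc/docker/daemon.json (the same flag can also be passed to dockerd on the command line); the daemon needs a restart afterwards:

```json
{
  "max-concurrent-downloads": 20
}
```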
In summary, what can go wrong:
- Higher image transfer cost
- More time taken to start pods
- Long pod startup times lead the cluster autoscaler on EKS to start more nodes than are really required
Solution
At the time of writing we were in a very tight crunch to come up with a solution and move quickly with the rollout. We didn't want to go back and spend time reducing the size of the image (which we could have) or localizing our Docker registry in each region, so we went with a very brute-force solution:
- Bake the Docker image into the AMI used by the cluster autoscaler/ASG to launch nodes (see the sketch below)
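A minimal sketch of the bake step, assuming a Packer-style shell provisioner running during the AMI build; the region, registry URL, and image tag are all hypothetical:

```bash
#!/usr/bin/env bash
# Runs during the AMI build. Pre-pulls the image so that freshly
# autoscaled nodes boot with it already in the local Docker cache.
set -euo pipefail

REGION="us-east-1"                                        # hypothetical
REGISTRY="123456789012.dkr.ecr.${REGION}.amazonaws.com"   # hypothetical
IMAGE="${REGISTRY}/ml/model-server:v1"                    # hypothetical

# Authenticate to ECR, then pull; the Docker image cache on disk is
# captured in the AMI snapshot along with everything else.
aws ecr get-login-password --region "${REGION}" \
  | docker login --username AWS --password-stdin "${REGISTRY}"
docker pull "${IMAGE}"
```

The trade-off is that every new image version requires rebuilding the AMI, which is why this is a brute-force fix rather than a long-term one.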
We reduced our pod startup time, measured from when the cluster autoscaler requested an instance in response to a scale-out event to when the readiness check passed, from 10-12 minutes to 5 minutes. Part of that 5 minutes is attaching the GPU to the instance; on average we are seeing AWS take 2.5 minutes to bring up a GPU instance.