Docker VS AWS Lambda for Microservices

Over the holidays, I had a chance to dive deep into both AWS Lambda and Docker. I wanted to see for myself which platform is best for building microservices.

Test Setup

I was hacking around with Zipkin, an open source distributed tracing system from Twitter. Thanks to Adrian Cole, Zipkin ran nicely in Docker. I wanted to see if I could get a portion of Zipkin (specifically tracegen) to run on AWS Lambda. Zipkin is written in Scala, which runs on the Java Virtual Machine (JVM).

AWS Lambda Gotchas:

No "System Monitoring" - Unlike Docker, AWS Lambda exposes very little "system-level" information about your running request: only the request execution time and the memory usage. If your request time is long and the memory usage is below the "limit", there is no way to know whether there was a network bottleneck, a CPU bottleneck, or something else.

Library Cache warming - The first API request always takes a long time because AWS Lambda has to load the underlying code libraries into memory. This is a real problem for AWS Lambda because of the vast number of servers that your request could potentially run on. Informal testing of Lambda showed that 1 out of 5 requests took a long time (I was sending a request every 2 seconds). Therefore, for very low traffic scenarios, AWS Lambda could have highly variable performance.
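To make the warm-up effect concrete, here is a minimal sketch of the pattern, not Lambda's actual internals. ColdStartDemo and its loader are hypothetical names; the 200 ms sleep is just a stand-in for loading libraries. A fresh instance pays the loading cost on its first request only, so any request routed to a new instance is slow.

```java
import java.util.function.Supplier;

/** Hypothetical model of one function instance that loads its libraries lazily. */
public class ColdStartDemo {
    private final Supplier<String> loader; // stand-in for expensive library loading
    private String libraries;              // null until the first request ("cold")
    private int loadCount = 0;

    public ColdStartDemo(Supplier<String> loader) {
        this.loader = loader;
    }

    /** The first call pays the loading cost; later calls reuse the loaded state. */
    public synchronized String handle(String request) {
        if (libraries == null) {
            libraries = loader.get();
            loadCount++;
        }
        return libraries + ":" + request;
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        ColdStartDemo instance = new ColdStartDemo(() -> {
            try {
                Thread.sleep(200); // assumed delay, only for illustration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "libs";
        });
        for (int i = 1; i <= 3; i++) {
            long start = System.nanoTime();
            instance.handle("req" + i);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("request " + i + " took ~" + ms + " ms");
        }
    }
}
```

With many instances behind the scenes, each one goes through its own first request, which would be consistent with seeing a slow request every so often at low traffic.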

Forget REST, think RPC - AWS Lambda allows you to execute only one function - handleRequest(Object input, Context context). The input is limited to a JSON document. No URL path is passed along, so in a real-world application you need a controller in front of your AWS Lambda function to translate REST to RPC. This could be Amazon API Gateway, or something else.
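As a rough illustration of what such a controller does, here is a sketch in plain Java with no AWS dependencies. RestToRpcController, toAction, the "action" field, and the /traces routes are all hypothetical; the Function stands in for a handler that only ever sees a payload, never a URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical front controller: folds "METHOD /path" into a single RPC-style payload. */
public class RestToRpcController {

    /** Stand-in for a Lambda-style handler that only receives a JSON-like payload. */
    private final Function<Map<String, Object>, Map<String, Object>> rpcHandler;

    public RestToRpcController(Function<Map<String, Object>, Map<String, Object>> rpcHandler) {
        this.rpcHandler = rpcHandler;
    }

    /** Translate a REST request into an RPC call: the route becomes an "action" field. */
    public Map<String, Object> handle(String method, String path, Map<String, Object> body) {
        Map<String, Object> payload = new LinkedHashMap<>(body);
        payload.put("action", toAction(method, path));
        return rpcHandler.apply(payload);
    }

    /** Hypothetical routing table; a real controller would likely use path templates. */
    public static String toAction(String method, String path) {
        if (method.equals("GET") && path.equals("/traces")) return "listTraces";
        if (method.equals("POST") && path.equals("/traces")) return "createTrace";
        return "unknown";
    }

    public static void main(String[] args) {
        RestToRpcController controller = new RestToRpcController(payload -> {
            Map<String, Object> response = new LinkedHashMap<>();
            response.put("handled", payload.get("action"));
            return response;
        });
        // prints {handled=listTraces}
        System.out.println(controller.handle("GET", "/traces", new LinkedHashMap<>()));
    }
}
```

The point of the sketch is that the function itself never routes; everything REST-shaped is resolved before the single entry point is called.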

50 MB limit per function - Although 50 MB sounds like a lot, this needs to include all the libraries that your code depends on. Looking at a lot of the real-world code that I write (Java), the libraries that I pull in to write a function easily exceed 50 MB. This means that you need to be really careful about what code you use.
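A quick way to see how close a project is to that limit is to add up its dependency JARs before packaging. The sketch below is only a rough pre-check: PackageSizeCheck is a hypothetical helper, the "target/dependency" default directory is an assumption, and the final packaged artifact may differ in size from the sum of the JARs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical pre-check: do the dependency JARs fit under the 50 MB limit? */
public class PackageSizeCheck {
    /** The 50 MB limit quoted above, in bytes. */
    public static final long LIMIT_BYTES = 50L * 1024 * 1024;

    /** Sum the sizes of all regular files ending in .jar under the given directory. */
    public static long totalJarBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .mapToLong(p -> p.toFile().length())
                .sum();
        }
    }

    public static boolean fitsLimit(long bytes) {
        return bytes <= LIMIT_BYTES;
    }

    public static void main(String[] args) throws IOException {
        // "target/dependency" is an assumed default, not something from the article.
        Path dir = Paths.get(args.length > 0 ? args[0] : "target/dependency");
        if (!Files.isDirectory(dir)) {
            System.out.println("No such directory: " + dir);
            return;
        }
        long total = totalJarBytes(dir);
        System.out.printf("%d bytes of JARs, limit %d: %s%n",
            total, LIMIT_BYTES, fitsLimit(total) ? "OK" : "too large");
    }
}
```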

Poor Ass Tooling (Java) - Because Lambda doesn't support the concept of WAR files and the like, there isn't any intuitive way to "package" your function and upload it to AWS. There is an Eclipse plugin that automatically uploads a Java project to AWS Lambda, but the plugin doesn't automatically upload the JAR dependencies. The Eclipse plugin is only good for stupid hello-world examples of AWS Lambda. The Maven and Gradle support for AWS Lambda is "beta" at best.

 

Docker VS AWS Lambda

OK. So you told me the gotchas of AWS Lambda, so why use it over Docker?

It's easy to save money with AWS Lambda - Contrary to popular belief, it's really difficult to save money with Docker because you still need to scale your Docker pods. There are some capabilities in Docker to do so, but they are not out-of-the-box and they don't seem trivial. There are also 3 different implementations: Kubernetes, Docker Swarm, and Apache Mesos. AWS Lambda, like Elastic Beanstalk, kinda just takes care of it for you. It might not be perfect, but it should work 99% of the time.

Monitoring built in - Log monitoring and application monitoring via CloudWatch are built in. No need to add other packages or worry about instrumentation.

Ultra-Simple (AWS) event processing system - If you need to process an event generated by an AWS service (S3 / Kinesis / etc.), AWS Lambda is the place to go. Its simplicity is its downside too, because realistically it can only handle events generated by AWS services. Although Amazon API Gateway is positioned as a system that allows AWS Lambda to process any API request, the reality is that API Gateway is just a system that converts external events into "AWS Events". It is really complicated too - almost too complicated.

Recommended way to use AWS Lambda

There are really 2 scenarios for using AWS Lambda :

  1. Processing events generated by AWS services (S3 / CloudWatch / etc.)
  2. Using AWS Lambda as a general compute engine

For the first case, using AWS Lambda is a no-brainer. But for the second case, I only recommend it if you can follow these guidelines:

1. Forget REST, think RPC - The reality is that AWS Lambda's API sucks at REST, which is why AWS is pushing developers to put Amazon API Gateway in front of AWS Lambda. If there is functionality that you want to implement in your code, just call Lambda directly without using API Gateway - and treat the AWS Lambda function as a simple RPC call.

2. Use AWS Lambda for your "internal services" rather than "shared services" - AWS Lambda is changing very, very rapidly, so I don't think Lambda functions are ready for prime time as a shared service exposed directly (or even behind a gateway).

3. Use it for highly parallel & variable call scenarios - AWS Lambda EXCELS at this. For example, when processing an order for a cart with multiple items, it might be necessary to process each cart item separately. In an e-commerce scenario where order traffic can spike up and down, and the number of items in a cart can also spike up and down, AWS Lambda will excel at handling this type of load because it has autoscaling built in.

4. Use AWS Lambda for brand new services only - Forget trying to refactor your existing app into smaller Lambda functions. The combination of limits on how much code you can execute and the totally new paradigm means that you will likely spend more time refactoring than writing the code from the ground up. If you really need to break apart a monolithic system, think about refactoring your existing code into smaller services, and put them into Docker instead.

5. Watch out for frameworks - Frameworks like Serverless will make building on top of AWS Lambda much easier. So keep your eyes peeled.
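The fan-out pattern from guideline 3 can be sketched on the caller's side as follows. This is a minimal sketch under stated assumptions: CartFanOut is a hypothetical class, and the invoke function stands in for whatever call reaches the per-item function (for example, a direct RPC-style invocation); no AWS SDK is used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical caller that processes each cart item with its own parallel call. */
public class CartFanOut {

    /** Submit one call per item, then collect the results in the original item order. */
    public static <T, R> List<R> processInParallel(List<T> items, Function<T, R> invoke, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> invoke.apply(item); // one remote call per cart item
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that item's call has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> cart = Arrays.asList("book", "lamp", "mug");
        // The lambda below is a placeholder for the real per-item call.
        List<String> receipts = processInParallel(cart, item -> "processed:" + item, 3);
        // prints [processed:book, processed:lamp, processed:mug]
        System.out.println(receipts);
    }
}
```

The caller only decides how many calls to have in flight; scaling the workers that serve those calls is what the platform's built-in autoscaling is expected to handle.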

 

So did it work ?

Yep, I managed to get part of Zipkin running on AWS Lambda. The first API call took around 17 seconds, but the subsequent ones took about 1.7 seconds. That is approximately the same performance as running Zipkin's tracegen on my laptop. I might also be one of the "few people" who managed to get Scala to run on AWS Lambda ;-)

 

So who will win for Microservices - Docker or AWS Lambda ?

AWS Lambda will win - sort of... From a programming model and a cost model, AWS Lambda is the future - despite some of the tooling limitations. Docker, in my opinion, is an evolutionary step in the "virtualization" that we've been seeing for the last 10 years. AWS Lambda is a step change. In fact, I personally think it is innovations like Amazon Elastic Beanstalk and CloudFormation that have pushed the demand for solutions like Docker. In the near future, I predict that open source will catch up and provide an AWS Lambda experience on top of Docker containers. Iron.io is open source and appears to be going down this path.

I guess this is dated. AWS has matured since 2016 in this space in using Lambda efficiently:

No "System Monitoring" - Use X-Ray.

Library Cache warming - Use a pre-warmed Lambda function so you don't impact user experience.

Forget REST, think RPC - Rightly said. Any consumer interaction must go through a gateway to do some basics (custom authorizer, token validation, etc.). I don't think there is anything wrong with imposing a design pattern like this.

50 MB limit per function, Poor Ass Tooling (Java) - Split your Lambdas. Basics of modular programming. Deploying a WAR shouldn't even be a discussion point. You have better build and deploy tools through CI/CD, and front-end builds are separate from your backend code.

Again, try not to retrofit legacy into the new generation. Architectural thought must be given before attempting that. Containers have their own benefits, but that's a separate discussion!


Curious to see what an update to this article would yield, since it's a whisker shy of 2 years old.


This article is 2 years old. Knowing how short technology lifecycles are, an update would be interesting.


More articles by Alan Ho
