Integration with AWS Serverless Tools
With the industry-wide shift towards cloud, integration capabilities in public clouds are becoming more and more important. Current enterprise IT landscapes can include tens or even hundreds of systems that need to work together to deliver business value. This is where integration can be the differentiator in whether you are able to extract the full value of your IT investment.
Even though every public cloud vendor offers some kind of transformation/service-bus capability, integration has not been the biggest focus of leading public cloud vendors like AWS, Azure and Google Cloud. Integrators like Mulesoft, Dell Boomi and Oracle Fusion have led the integration space so far.
However, with (old-school) enterprises demanding more and more from public clouds, it is just a matter of time before the leading cloud vendors scale up their integration capabilities to also cover enterprise use cases involving legacy systems, on-premises integrations and so on.
For conventional middleware developers and Solution Architects, the move to public clouds can feel a bit daunting, since public clouds take more of a hands-on coding approach to integration than typical integration tools, where drag-and-drop can achieve most of the work if you know what you are doing. In this blog, I will try to give an idea of some serverless integration tools available within AWS, with a sample use case for each of them. There is a whole other discussion to be had on using EC2/Docker vs using Lambdas, but we will not go there in this blog.
Integration Tools in AWS:
Lambda:
Lambdas are serverless functions, which can be used to write and run code without worrying about the underlying infrastructure. They can be used to code just about anything, in just about any language. The cost of running Lambdas is primarily determined by two factors: how long a Lambda takes to run and how much memory is allocated to it. Lambdas are very cheap to run, and are typically used to build small microservices.
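To make the two cost factors concrete, here is a rough sketch of the arithmetic. The function name, the workload figures and the rate are all placeholders I made up for illustration; they are not actual AWS prices, and the sketch ignores the per-request charge and the free tier.

```python
def estimate_lambda_compute_cost(invocations, avg_duration_ms, memory_mb, price_per_gb_second):
    """Estimate compute cost from run time and allocated memory (GB-seconds)."""
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    return gb_seconds * price_per_gb_second


# Hypothetical workload and a placeholder rate, NOT an actual AWS price.
PLACEHOLDER_RATE_PER_GB_SECOND = 0.00002

cost = estimate_lambda_compute_cost(
    invocations=1_000_000,
    avg_duration_ms=200,
    memory_mb=512,
    price_per_gb_second=PLACEHOLDER_RATE_PER_GB_SECOND,
)
print(f"Estimated compute cost: {cost:.2f}")
```

The point is only that cost scales linearly with both duration and allocated memory, so halving either one halves the compute part of the bill.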
Two problems used to be common with Lambdas: cold starts (the infrastructure to run a Lambda is created on the fly when a request arrives, which can add latency) and a short timeout of 5 minutes (a Lambda that ran longer than that failed with a timeout error). Recent changes have made Lambdas much easier to use: cold starts have improved, and the maximum execution time has been raised to 15 minutes.
Lambda Sample Use Case:
An API endpoint is exposed for clients to store a few fields in an AWS-hosted DB. The transformation from the client data format to the DB data format is not complex, involves only a few fields, and is not expected to take more than a few seconds. When a request is received by the API Gateway, a Lambda is invoked to transform the data and store it in the DB.
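A minimal sketch of what such a Lambda could look like is below. It assumes an API Gateway proxy integration and DynamoDB as the AWS-hosted DB; the field names, the `customers` table and the helper names are hypothetical and only there for illustration.

```python
import json


def to_db_item(client_payload):
    """Map the client's field names to the (hypothetical) table's attribute names."""
    return {
        "customer_id": str(client_payload["customerId"]),
        "full_name": f'{client_payload["firstName"]} {client_payload["lastName"]}'.strip(),
        "email": client_payload["email"].lower(),
    }


def store_item(item, table_name="customers"):
    """Write the item to DynamoDB (assumed target DB; table name is hypothetical)."""
    # boto3 is imported lazily so the transformation can be tested without AWS access.
    import boto3

    boto3.resource("dynamodb").Table(table_name).put_item(Item=item)


def handler(event, context, store=store_item):
    """Entry point invoked by API Gateway; `store` can be swapped out in tests."""
    try:
        item = to_db_item(json.loads(event.get("body") or "{}"))
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        # Missing fields or malformed JSON are reported back to the client.
        return {"statusCode": 400, "body": json.dumps({"error": f"bad request: {exc}"})}
    store(item)
    return {"statusCode": 201, "body": json.dumps({"id": item["customer_id"]})}
```

Keeping the transformation in its own function means the mapping logic can be unit tested locally, while the handler stays a thin wrapper around it.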
Lambdas are best suited for short-lived workloads which perform a single task and normally deal with data up to a few MBs. With a million requests free per month, it is pretty hard to beat Lambda as the tool of choice for small workloads.
AWS Glue:
Glue is the default AWS offering for ETL. What is not clear from the AWS documentation is that Glue is mostly focused on extracting and reading large amounts of data from data sources like S3. If the requirement is to call a couple of APIs, ingest and transform data, and send the data to a target, and the data volumes are not high, Glue is probably not the right choice.
Glue lets you create “jobs”, which can be written in Python or Scala and offer a lot of options for reading and analyzing large batches of data. Glue also lets you stitch together “workflows”, which are chains of jobs intended to achieve a use case. If your use case involves large data volumes, Glue should probably be your first consideration.
Glue jobs previously had a large minimum billed duration, but this has since been reduced significantly, which makes short Glue jobs much more cost effective.
Glue Sample Use Case:
Collecting huge amounts of data from one or many sources, then transforming and loading it, is what Glue is good at. An example is enterprise data spread across relational and non-relational data sources, plus streaming data from Kinesis or Kafka, all of which needs to be transformed and stored in a warehouse that subsequently powers BI platforms.
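To give a feel for what a Glue job looks like, here is a rough sketch of a Python job that reads one table from the Glue Data Catalog, renames and casts a few columns, and writes Parquet files to S3. The database `sales_db`, the table `orders`, the column names and the bucket path are all hypothetical; a real warehouse load would have more sources and more transformation steps.

```python
import sys

# Column mapping: (source name, source type, target name, target type).
# All names here are hypothetical.
ORDER_MAPPINGS = [
    ("order_id", "string", "order_id", "string"),
    ("amt", "string", "amount", "double"),
    ("cust", "string", "customer_id", "string"),
]


def run_job(argv):
    # These modules are only available inside the Glue job runtime,
    # so they are imported here rather than at the top of the file.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Extract: read the source table registered in the Data Catalog.
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="orders"
    )
    # Transform: rename and cast columns to the warehouse schema.
    mapped = ApplyMapping.apply(frame=orders, mappings=ORDER_MAPPINGS)
    # Load: write the result as Parquet to the (hypothetical) warehouse bucket.
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://example-warehouse-bucket/orders/"},
        format="parquet",
    )
    job.commit()


# Run only when started as a job with the JOB_NAME argument present.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    run_job(sys.argv)
```

Each job like this covers one extract-transform-load step; a Glue workflow would then chain several such jobs together.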
Step Functions:
For a conventional middleware developer, Step Functions is what will feel the most familiar and closest to tools like Mulesoft, Dell Boomi or Oracle Fusion. Step Functions gives you the capability to stitch together workflows based on Lambdas or Glue jobs to achieve a complex business use case. It lets you distribute your business logic across separate components (as opposed to coding one huge Lambda with all the business functionality, which quickly becomes difficult to maintain).
Step Functions Sample Use Case:
Step Functions can be used to create workflows using a variety of components, including Lambdas, Glue jobs and many others. A complex enterprise flow, like extracting data from a DB or an API, transforming it, getting a security token from another API, and finally pushing the data to another API or DB, would be a good use case for Step Functions.
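As a rough sketch, the flow above could be described as a state machine in Amazon States Language, built here as a Python dictionary and printed as JSON. The state names, Lambda function names, account ID and region are placeholders, and the retry settings are just one reasonable choice.

```python
import json

# Placeholder ARN prefix: the region and account ID are made up.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function"


def task(function_name, next_state=None):
    """Build a Task state that invokes one Lambda and retries on failure."""
    state = {
        "Type": "Task",
        "Resource": f"{LAMBDA_ARN_PREFIX}:{function_name}",
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True  # last step of the workflow
    return state


definition = {
    "Comment": "Extract, transform, fetch a token, then update the target",
    "StartAt": "ExtractData",
    "States": {
        "ExtractData": task("extract-data", "TransformData"),
        "TransformData": task("transform-data", "GetSecurityToken"),
        "GetSecurityToken": task("get-security-token", "UpdateTarget"),
        "UpdateTarget": task("update-target"),
    },
}

print(json.dumps(definition, indent=2))
```

Each step stays a small, single-purpose Lambda, and the ordering and retry behaviour live in the workflow definition rather than in the code.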
A huge advantage of Step Functions is that they are stateful: they maintain their state across the various steps. With standard workflows, you are charged only when the state of a step function changes, not for the time the step function is running.
*All images from AWS Documentation