Key things to consider for a serverless data-driven solution or data pipeline

Binaya Kumar Behera

Published Mar 15, 2022

During my discussions with customers on data-driven solutions, I often try to squeeze a serverless component into the solution. This is partly because there is always a gap that a serverless implementation can bridge swiftly, and partly because it is simply awesome when the use case is right. Combined with a microservices-style implementation, it is next level.

Below are my suggestions on a few things (in no specific order) to consider when planning a serverless implementation of any data-driven solution:

- Data/payload size limit at each layer of the data pipeline

You must define a payload size constraint while designing a serverless platform; otherwise it is just a matter of time before your data pipeline starts to break or leak.
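
As a rough illustration of enforcing such a constraint at the entry of each layer, here is a minimal sketch. The limit value and the names `MAX_PAYLOAD_BYTES`, `PayloadTooLargeError`, and `check_payload_size` are assumptions made up for this example; the real limit should come from the documented quota of whichever service sits at that layer.

```python
import json

# Hypothetical limit for illustration only; use the documented limit of each
# service in your own pipeline.
MAX_PAYLOAD_BYTES = 256 * 1024


class PayloadTooLargeError(ValueError):
    """Raised when a serialized payload exceeds the agreed limit."""


def check_payload_size(payload: dict, limit: int = MAX_PAYLOAD_BYTES) -> bytes:
    """Serialize the payload and fail fast if it is larger than the limit."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > limit:
        raise PayloadTooLargeError(
            f"payload is {len(body)} bytes, limit is {limit} bytes"
        )
    return body
```

Rejecting an oversized payload at the boundary keeps the failure visible and close to its cause, rather than letting it surface as a truncated or dropped message further down the pipeline.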

- Stateful or stateless

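Stateful and serverless just don’t gel well, because function instances come and go on demand and anything held in memory can disappear between invocations. REST, stateless, event-driven is the way to go.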
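
One way to picture a stateless function is a handler that keeps nothing between calls and reads or writes anything long-lived through an external store. The sketch below assumes a hypothetical `StateStore` interface and an event with `device_id` and `reading` fields; `InMemoryStore` is only a local stand-in for a real database or cache.

```python
from typing import Protocol


class StateStore(Protocol):
    """Hypothetical interface for an external store (database, cache, object store)."""

    def get(self, key: str, default: int = 0) -> int: ...
    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for a real external store, used only so the sketch runs locally."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def handle(event: dict, store: StateStore) -> dict:
    """Process one event using only the event itself and the external store."""
    key = event["device_id"]
    total = store.get(key) + event["reading"]
    store.put(key, total)
    return {"device_id": key, "running_total": total}
```

Because `handle` holds no module-level state, any instance of the function can serve any event, which is what lets the platform scale instances up and down freely.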

- Code/executable size

Cloud vendors already put size constraints on serverless services (such as AWS Lambda, Azure Functions, etc.). If the deployment units are Docker images/containers, then too, the smaller the size, the shorter the warm-up time. And it is always a good idea to break the code into multiple microservices if it is huge or not optimized.

- What is the data pipeline length, or how many clocked stages does your solution have?

This I learnt from experience. In simple terms, if the solution you are planning for serverless has more components along the pipeline than is optimal (typically 3-5, including the database layer), then it is time to restructure it or, preferably, split it into parallel pipelines.
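
To make the idea concrete, here is a small sketch that refuses an overly long sequential pipeline and instead runs shorter, independent branches side by side. The names `MAX_STAGES`, `run_sequential`, and `run_parallel` are hypothetical, the cap of 5 is taken from the 3-5 range above, and the thread pool is only a local stand-in for whatever fan-out mechanism your platform provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Stage = Callable[[dict], dict]

# Upper bound taken from the 3-5 range mentioned above; tune it for your pipeline.
MAX_STAGES = 5


def run_sequential(record: dict, stages: list[Stage]) -> dict:
    """Run stages one after another, refusing pipelines that have grown too long."""
    if len(stages) > MAX_STAGES:
        raise ValueError(f"{len(stages)} stages exceeds the limit of {MAX_STAGES}")
    for stage in stages:
        record = stage(record)
    return record


def run_parallel(record: dict, branches: list[list[Stage]]) -> list[dict]:
    """Run independent branches side by side, each on its own copy of the record."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_sequential, dict(record), b) for b in branches]
        # Results are collected in the same order as the branches were given.
        return [f.result() for f in futures]
```

Splitting only works when the branches do not depend on each other's output; stages that do must stay in the same branch.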

- Is it event driven?

Your solution must be event-, trigger-, or schedule-driven. Serverless is on demand and stateless, and hence it must be triggered.
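
A minimal sketch of what "must be triggered" looks like in code: a single entry point that does nothing until an event arrives and then dispatches on the kind of trigger. The event shape (`source` and `detail` keys) and the trigger names are assumptions for this example; real queues, object stores, and schedulers each define their own event format.

```python
from typing import Callable

# Hypothetical event shape: {"source": "...", "detail": {...}}.
Handler = Callable[[dict], str]
HANDLERS: dict[str, Handler] = {}


def on(source: str) -> Callable[[Handler], Handler]:
    """Register a handler for one kind of trigger."""
    def register(func: Handler) -> Handler:
        HANDLERS[source] = func
        return func
    return register


@on("file_uploaded")
def process_file(detail: dict) -> str:
    return f"processing {detail['path']}"


@on("schedule")
def nightly_job(detail: dict) -> str:
    return f"running scheduled job {detail['name']}"


def entrypoint(event: dict) -> str:
    """Single entry point: runs only when an event arrives, then returns."""
    handler = HANDLERS.get(event["source"])
    if handler is None:
        raise ValueError(f"no handler for source {event['source']!r}")
    return handler(event["detail"])
```

For instance, `entrypoint({"source": "schedule", "detail": {"name": "cleanup"}})` would route to `nightly_job`, while an unknown source fails loudly instead of being silently ignored.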

- SQL/NoSQL

Though both work well, the solutions I design are typically NoSQL-driven when designed from scratch. The advantage of NoSQL is on-demand scalability, which goes well with serverless. You can have a hybrid model if your data source or destination is SQL based.

- Data sources and data delivery/extraction: continuous stream, batch, or poll

Don’t miss out on data sources while designing serverless; they play a crucial role. The data extraction/delivery strategy is equally important.
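
As one example of a delivery strategy, a continuous stream can be grouped into small batches before each batch is handed to a function, which also helps keep every invocation under a payload limit. This is only a sketch; the helper name `batches` and the idea of dict records are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group a continuous stream of records into batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    # islice pulls up to `size` items; an empty list means the stream is exhausted.
    while chunk := list(islice(it, size)):
        yield chunk
```

The last batch may be smaller than `size`, so downstream functions should not assume a fixed batch length.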

- Compliance/security or cloud/on-prem

Last but not least, compliance and security. Serverless is tricky when it comes to security.

Don't hesitate to add anything I missed in the comments. Thanks for reading.
