Well Architected Data Pipelines
The modern data stack is in constant flux. Business requirements keep changing, and technical complexity grows as data volumes increase exponentially across many different types of data sources. This introduces the following challenges for data engineers:
· Data quality issues that go undetected
· Pipeline failures due to schema changes in data sources
· Sub-optimal performance in data processing
· Lack of visibility of pipeline health
· Underutilization of resources leading to cost overruns
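To make the schema-change challenge concrete, here is a minimal sketch of a schema drift check that a pipeline could run before processing a batch. The `EXPECTED_SCHEMA` contents, the column names and the `detect_schema_drift` helper are all hypothetical, chosen only for illustration; a real pipeline would more likely rely on a schema registry or a data validation library.

```python
from typing import Dict, List

# Hypothetical expected schema for an illustrative "orders" source.
EXPECTED_SCHEMA: Dict[str, type] = {
    "order_id": int,
    "amount": float,
    "currency": str,
}


def detect_schema_drift(
    record: dict, expected: Dict[str, type] = EXPECTED_SCHEMA
) -> List[str]:
    """Return a list of human-readable schema problems for one record."""
    problems: List[str] = []

    # Columns the pipeline expects: check presence and type.
    for column, expected_type in expected.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            problems.append(
                f"type mismatch in {column}: "
                f"expected {expected_type.__name__}, got {actual}"
            )

    # Columns the source added without notice.
    for column in record:
        if column not in expected:
            problems.append(f"unexpected column: {column}")

    return problems


if __name__ == "__main__":
    drifted = {"order_id": "1", "amount": 9.5, "region": "EU"}
    for problem in detect_schema_drift(drifted):
        print(problem)
```

Surfacing these problems as alerts or metrics, rather than letting the job fail later in the run, also helps with the visibility challenge listed above.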
The above challenges must be addressed upfront, in the architecture and design phase. Otherwise they can accumulate into a large amount of technical debt that derails the data product strategy. Big data architects and consultants should focus on the five pillars of the Well-Architected Framework, shown below, when architecting and designing data pipelines.
Relevance of each pillar
Considerations
There are several considerations and best practices that the big data architect needs to build into the design, and then ensure are actually implemented. Since documenting the considerations and design principles for all five pillars would make this article verbose, I have compiled the Well-Architected Data Pipeline as a mind map, shown below. A mind map is an effective mechanism for creative problem solving, and it improves fact gathering during collaborative brainstorming sessions.
Conclusion
This article provides an overview of the five Well-Architected pillars in data engineering and summarizes the design considerations in a simple mind map. It can be used as a starting point to architect, design, implement and review data pipelines for the modern data stack. I hope it helps anyone looking for a quick reference when designing secure, scalable, reliable and cost-effective data pipelines.