Well Architected Data Pipelines

The modern data stack is in constant flux, evolving in response to changing business requirements and growing technical complexity as data volumes explode across an increasing variety of data sources. This introduces the following challenges for data engineers:

·        Undetected data quality issues

·        Pipeline failures due to schema changes in data sources

·        Sub-optimal data processing performance

·        Lack of visibility into pipeline health

·        Underutilized resources leading to cost overruns
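The first two challenges, undetected data quality issues and schema changes in sources, can be caught by a validation step at the entry of the pipeline. The following is a minimal Python sketch under stated assumptions: records arrive as a list of dictionaries, and `EXPECTED_SCHEMA`, `check_batch` and the column names are hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical expected schema for one source: column name -> Python type.
EXPECTED_SCHEMA: dict[str, type] = {"order_id": int, "amount": float, "country": str}


def check_batch(records: list[dict[str, Any]], schema: dict[str, type]) -> dict[str, list[str]]:
    """Return schema-drift and data-quality findings for a batch of records."""
    findings: dict[str, list[str]] = {"missing_columns": [], "unexpected_columns": [], "quality": []}

    # Schema drift: columns the source dropped or newly added.
    seen_columns: set[str] = set()
    for rec in records:
        seen_columns.update(rec.keys())
    findings["missing_columns"] = sorted(set(schema) - seen_columns)
    findings["unexpected_columns"] = sorted(seen_columns - set(schema))

    # Data quality: nulls, per-row gaps and type mismatches in expected columns.
    for i, rec in enumerate(records):
        for col, expected_type in schema.items():
            if col in findings["missing_columns"]:
                continue  # already reported once as schema drift
            value = rec.get(col)
            if value is None:
                findings["quality"].append(f"row {i}: {col} is null or missing")
            elif not isinstance(value, expected_type):
                findings["quality"].append(
                    f"row {i}: {col} expected {expected_type.__name__}, got {type(value).__name__}"
                )
    return findings


if __name__ == "__main__":
    batch = [
        {"order_id": 1, "amount": 12.5, "country": "IN"},
        {"order_id": 2, "amount": None, "country": "US", "coupon": "X1"},  # null + new column
    ]
    print(check_batch(batch, EXPECTED_SCHEMA))
```

A pipeline could fail fast or quarantine the batch whenever any findings list is non-empty, turning silent schema drift into a visible, actionable event rather than a downstream failure.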

These challenges must be addressed upfront, in the architecture and design phase; otherwise they can accumulate into a large amount of technical debt that derails the data product strategy. Big data architects and consultants should focus on the five pillars of the Well Architected Framework, shown below, while architecting and designing data pipelines.

[Figure 1: The five pillars of the Well Architected Framework]

Relevance of each pillar

  1. Security: Security should be the topmost consideration in a Well Architected pipeline. The Security pillar covers the design principles for securing data as it flows through the pipeline.
  2. Performance: The Performance pillar deals with making optimal use of storage and compute to meet non-functional requirements (NFRs) such as latency, throughput and scalability.
  3. Reliability: The Reliability pillar ensures that the data pipeline runs consistently across environments, anticipates downtime, handles failures and self-heals, so that it meets operational SLAs and improves availability.
  4. Operational Excellence: Operational Excellence focuses on monitoring and continuously improving data pipelines to deliver business outcomes.
  5. Cost Optimization: This pillar ensures that the data pipeline achieves the business outcome at the lowest possible cost by using resources efficiently.
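To make the Reliability pillar's "handle failures and self-heal" more concrete, one common pattern is to retry transient failures with exponential backoff and to surface the error once retries are exhausted. This is a minimal sketch, not a prescribed implementation: `run_with_retries`, its parameters, and the choice of `ConnectionError` and `TimeoutError` as "transient" are assumptions for illustration. The log lines it emits would also feed the monitoring side of Operational Excellence.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a pipeline step, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except retry_on as exc:
            if attempt == max_attempts:
                log.error("step failed after %d attempts: %s", attempt, exc)
                raise  # surface the failure so orchestration/alerting can react
            delay = base_delay_s * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... with the defaults
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")  # loop always returns or raises


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    state = {"calls": 0}

    def flaky_extract() -> str:
        # Hypothetical source step that is unavailable for its first two calls.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("source unavailable")
        return "rows loaded"

    print(run_with_retries(flaky_extract, base_delay_s=0.1))
```

The `sleep` parameter is injectable so the backoff can be tested without waiting. Non-transient errors (for example a `ValueError` from bad data) are not retried, and re-raising after the last attempt lets the orchestrator or alerting take over instead of hiding the failure.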

Considerations

There are several considerations and best practices that a big data architect needs to build into the design, and then ensure they are actually implemented. Since documenting the considerations and design principles for all five pillars would make this article verbose, I have compiled the Well Architected Data Pipeline as a mind map, shown below. A mind map is an effective tool for creative problem solving, and it improves fact gathering during collaborative brainstorming sessions.

[Figure 2: Well Architected Data Pipeline mind map]
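Structurally, a mind map is a tree: a central topic with branches and sub-branches. As a sketch, the tree below is populated only from the pillar descriptions in this article (the complete set of considerations is in Figure 2) and is flattened into a checklist that could be used when reviewing a pipeline design. `Node`, `to_checklist` and `pipeline_map` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a mind map: a topic with optional child topics."""
    label: str
    children: list["Node"] = field(default_factory=list)


def to_checklist(node: Node, path: tuple[str, ...] = ()) -> list[str]:
    """Flatten a branch into 'Topic > Subtopic' items, one per leaf."""
    current = path + (node.label,)
    if not node.children:
        return [" > ".join(current)]
    items: list[str] = []
    for child in node.children:
        items.extend(to_checklist(child, current))
    return items


# Leaves come only from the pillar descriptions in this article, not from Figure 2.
pipeline_map = Node("Well Architected Data Pipeline", [
    Node("Security", [Node("Secure data as it flows through the pipeline")]),
    Node("Performance", [Node("Latency"), Node("Throughput"), Node("Scalability")]),
    Node("Reliability", [Node("Consistent runs across environments"),
                         Node("Failure handling and self-healing")]),
    Node("Operational Excellence", [Node("Monitoring"), Node("Continuous improvement")]),
    Node("Cost Optimization", [Node("Efficient resource usage")]),
])

if __name__ == "__main__":
    # One review item per leaf, grouped by pillar (the root label is skipped).
    for pillar in pipeline_map.children:
        for item in to_checklist(pillar):
            print("[ ]", item)
```

Keeping the map as data rather than only as an image means the same structure can be extended with the considerations from Figure 2 and reused for design reviews.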

Conclusion

This article provides an overview of the five Well Architected pillars in data engineering and summarizes the design considerations in a simple mind map. It can be used as a starting point to architect, design, implement and review data pipelines for the modern data stack. I hope it helps anyone looking for a quick reference when designing secure, scalable, reliable and cost-effective data pipelines.

More articles by Ajeeth Kumar A