Design Patterns for Next Generation Data Platform on Cloud
This article focuses on the next-generation data platform on cloud and the design patterns that should be taken into account while building it. Enterprise architects, CIOs and CTOs planning to build their enterprise data platform can refer to it.
Data is the key driver of innovation and market competitiveness in today’s world, and its volume and variety are expanding exponentially every day. Dealing with large, diverse data sets using traditional methods is becoming challenging and poses a threat to the growth of the enterprise.
Adoption of on-premises Big Data helped reduce this threat, but at a high platform cost, so it never qualified as an option for small and medium enterprises. With recent cloud offerings, from IaaS to PaaS, data handling has become affordable and adoptable, making the transformation an obvious choice for enterprises. A modern data platform not only keeps the enterprise up to speed on data capture and processing but also opens up new avenues for business process innovation using AI and ML, unlocking digital transformation.
Before we jump to the transformation, let’s look back for a moment to recall how processes around data evolved. Due to the limitations of technology and platforms, ETL for batch processing and the data warehouse for storage were adopted across industry. Real-time processing later gained popularity with web services and SOA, but could not break that traditional approach. Now, with new technology offerings, processing of data streams has become practical with acceptable latency and quality. As a result, the Lambda architecture, which supports both stream (real-time) and batch processing, is getting huge traction. Moving from batch to stream is key in today’s business to stay closely connected with customers and devices.
Recently, interest in building an enterprise data platform as the base of digital transformation has emerged as a key trend in sectors with a large customer base or many IoT devices, such as retail, airlines, financial services, healthcare and manufacturing. Many of these organizations have started, or are about to start, their journey. However, selection of the right design patterns is crucial for realizing the investment. The primary objective of this document is to share five design patterns that should be considered while developing the architecture of a data platform. A reference architecture on Azure that applies these design patterns is also included.
1. Ingestion Layer Modernization: An individual microservice should be created per ingestion channel, replacing legacy batch feeds. This brings all the benefits of microservices, such as independent scaling and the ability to implement changes without impacting other channels. Filtering of the data can also be skipped at this stage, since effective data processing engines take over in the data processing layer. Various plug-and-play options are available today to push or pull data from sources in both stream (real-time) and batch mode.
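The per-channel idea can be sketched as follows. This is a minimal illustration, not a prescribed implementation: in production each channel would be its own deployable microservice (container or function) writing to an event hub or raw store, and all names (`ChannelIngestionService`, the channel labels, the sink) are assumptions made for the example.

```python
import json
from datetime import datetime, timezone

class ChannelIngestionService:
    """One independently deployable ingestion service per source channel.

    A plain class stands in for the microservice boundary here, so that
    channels can scale and change independently of one another.
    """

    def __init__(self, channel: str, sink: list):
        self.channel = channel
        self.sink = sink  # stand-in for an event hub / raw-zone writer

    def ingest(self, payload: dict) -> dict:
        # Wrap the payload with minimal envelope metadata only --
        # no filtering or validation happens at this layer.
        envelope = {
            "channel": self.channel,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "body": payload,
        }
        self.sink.append(json.dumps(envelope))
        return envelope

# Each channel gets its own service instance (its own deployment in practice).
raw_sink: list = []
pos_service = ChannelIngestionService("point-of-sale", raw_sink)
web_service = ChannelIngestionService("web-clickstream", raw_sink)

pos_service.ingest({"order_id": 42, "amount": 19.99})
web_service.ingest({"page": "/home", "session": "abc"})
print(len(raw_sink))  # two raw events captured
```

Because each channel owns its own service, a schema change in the clickstream feed never forces a redeployment of the point-of-sale path.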
2. Bring the Data AS-IS: Storage in the cloud is not only very cheap today but has also matured in read/write performance and security for data at rest. Recent offerings such as the Data Lake enable the enterprise to capture raw data with minimal filtering and validation at the ingestion layer. Having the raw data lets each group in the organization process it according to its own requirements. Capturing raw data for both stream and batch sources is a must.
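A common way to land data as-is is a date-partitioned raw zone. The sketch below assumes a `raw/<source>/yyyy/MM/dd/` layout, which is a widespread convention rather than any product's API; a dict stands in for blob or Data Lake storage, and the function names are illustrative.

```python
import json
from datetime import datetime, timezone

def raw_zone_path(source: str, event_time: datetime) -> str:
    """Date-partitioned path in the raw ('as-is') zone of a data lake."""
    return (f"raw/{source}/{event_time:%Y/%m/%d}/"
            f"{event_time:%H%M%S%f}.json")

def land_as_is(store: dict, source: str, payload: dict,
               event_time: datetime) -> str:
    # Persist the payload unmodified: no filtering, no schema
    # enforcement -- all of that is deferred to the processing layer.
    path = raw_zone_path(source, event_time)
    store[path] = json.dumps(payload)
    return path

lake: dict = {}  # stands in for cheap cloud object storage
t = datetime(2020, 5, 1, 12, 30, 0, tzinfo=timezone.utc)
p = land_as_is(lake, "pos", {"sku": "A1", "qty": 3}, t)
print(p)  # raw/pos/2020/05/01/123000000000.json
```

Keeping the raw copy partitioned by source and date means any team can replay history and derive its own view without depending on someone else's transformations.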
3. Data Processing Layer Modernization: Stream analysis has become convenient with smart tools for processing large volumes of raw data. Real-time raw data should be processed for any immediate actions and, in parallel, stored in the raw data store for reconciliation. Spark can be leveraged to transform the raw data into harmonized data within a short period. Harmonized data is a more meaningful representation of the data and can be used for AI and ML.
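The raw-to-harmonized step is essentially a conforming transformation: normalizing names, types and units into one schema. In production this mapping would typically run as a Spark job over the raw zone; plain Python is used below to keep the example self-contained, and the field names are illustrative assumptions.

```python
# Illustrative raw events as they might land from two feeds.
raw_events = [
    {"Cust_ID": "007", "amt": "19.99", "ts": "2020-05-01T12:30:00Z"},
    {"Cust_ID": "008", "amt": "5.00",  "ts": "2020-05-01T12:31:00Z"},
]

def harmonize(event: dict) -> dict:
    """Map one raw record into the conformed ('harmonized') schema
    that analytics and ML workloads consume."""
    return {
        "customer_id": event["Cust_ID"].lstrip("0") or "0",
        "amount": float(event["amt"]),   # string -> numeric
        "event_time": event["ts"],       # already ISO-8601, kept as-is
    }

harmonized = [harmonize(e) for e in raw_events]
print(harmonized[0])
```

In Spark the same logic would be expressed as a DataFrame transformation applied across the whole raw partition, so it scales to far larger volumes than this in-memory loop.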
4. Data Preparation for Downstream: The materialized view pattern offers painless querying for downstream applications. It involves creating focused data sets from the harmonized data for the corresponding downstream applications. Again, a microservice-based design for building the targeted data sets provides freedom, resulting in faster time to market and greater flexibility.
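The materialized view pattern can be sketched as precomputing an aggregate at write time so the downstream app performs a simple lookup instead of scanning harmonized data. The "spend per customer" view below is an invented example, not a prescribed schema.

```python
# Harmonized rows as produced by the processing layer (illustrative).
harmonized = [
    {"customer_id": "7", "amount": 19.99},
    {"customer_id": "7", "amount": 5.00},
    {"customer_id": "8", "amount": 12.50},
]

def build_spend_view(rows: list) -> dict:
    """Materialize total spend per customer: aggregate once here so
    the consuming application reads a key lookup, never a scan."""
    view: dict = {}
    for r in rows:
        view[r["customer_id"]] = round(
            view.get(r["customer_id"], 0.0) + r["amount"], 2)
    return view

spend_view = build_spend_view(harmonized)
print(spend_view)  # {'7': 24.99, '8': 12.5}
```

Each downstream application would own a view like this, rebuilt by its own small service whenever the harmonized layer changes, which is what keeps the consumers decoupled and fast.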
5. Event-Driven Processing: Numerous cloud services for event processing, including serverless offerings, enable us to respond to any data change event with low latency, in near real time. Raw event data is processed to identify any action to be taken, either through an API call or by publishing a notification for subscribers.
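The decision described above, act via an API call or fan out a notification, can be sketched as a single handler. In the cloud this would run as a serverless function bound to an event source; here the trigger wiring is simulated, and the event types and action names are assumptions for illustration.

```python
def handle_event(event: dict, api_calls: list, notifications: list) -> None:
    """Route one data-change event to the appropriate reaction."""
    if event.get("type") == "points_earned":
        # Immediate action: credit the account through an API call.
        api_calls.append(
            ("credit_points", event["customer_id"], event["points"]))
    else:
        # No direct action required: publish for any subscribers.
        notifications.append(event)

# Simulated event stream feeding the handler.
api_calls, notifications = [], []
handle_event({"type": "points_earned", "customer_id": "7", "points": 120},
             api_calls, notifications)
handle_event({"type": "profile_viewed", "customer_id": "8"},
             api_calls, notifications)
print(len(api_calls), len(notifications))  # 1 1
```

Because the handler is stateless, a serverless runtime can scale it out per event, which is what delivers the near-real-time latency the pattern promises.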
Let us go through a simple business transformation case before wrapping up. A company started its loyalty program 15 years ago. Earned loyalty points used to be processed at the end of the day on a batch platform, so customers had to wait until the next day to redeem them. This worked well when it was introduced but is unacceptable today: the customer account must be updated in real time when a purchase is made, so the customer can use the points for the next purchase immediately. The moral of the story is that companies need to value their customers to remain competitive, and a platform with these capabilities can enable such a transformation.