Flexible Data Warehouse Design
Change is inevitable.
Your data warehouse should be able to rapidly adapt to changes.
In this blog post I will outline an architecture that lets you respond rapidly to changes in both the source systems and the data warehouse that your reports and cubes use as their data source.
Overview
The diagram below illustrates the architecture of a highly adaptable BI system built using SQL Server 2012 or later. It uses an incremental load process, which ensures minimal load times and optimal performance.
Data flows through this system from left to right and follows these steps: replicate, process changes, transform changes, and finally process/query.
Replicate
A real-time replication process maintains an exact copy of each source system in an operational data store (ODS). The ODS database has change data capture (CDC) enabled so that modifications made by the replication process are tracked and can later be picked up by the process that loads staging.
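Enabling CDC is done once per database and once per table. A minimal sketch, assuming an ODS database named ODS and a replicated table dbo.Customer (both names are placeholders):

```sql
-- Enable CDC on the ODS database, then on one replicated table.
USE ODS;
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Customer',
    @role_name     = NULL;  -- NULL = no gating role required to read changes
```

Each table enabled this way gets a change table and table-valued functions that the SSIS CDC components query in the next step.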
Process Changes
An ETL process built with SQL Server Integration Services (SSIS) picks up all of the changes that have occurred in the ODS since the last time it ran successfully. This is done by leveraging the change data capture (CDC) transforms in SSIS. The changes are processed into similarly structured tables in different schemas in the staging database. The only difference between the tables in staging and those in the ODS is a handful of additional audit fields and a CDC field indicating what kind of change occurred on the record (0 = no change, 1 = delete, 2 = insert, 3 = pre-update, 4 = post-update).
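A sketch of what one such staging table might look like; the schema, table, and audit column names here are illustrative, not prescribed:

```sql
-- Staging table mirroring an ODS table, plus audit and CDC columns.
CREATE TABLE stg.Customer
(
    CustomerID    int            NOT NULL,
    CustomerName  nvarchar(100)  NULL,
    -- audit fields
    LoadDateTime  datetime2      NOT NULL DEFAULT SYSDATETIME(),
    LoadBatchID   int            NULL,
    -- 0 = no change, 1 = delete, 2 = insert,
    -- 3 = pre-update, 4 = post-update
    CdcOperation  tinyint        NOT NULL
);
```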
On top of each table is a view. These views are the source for the downstream SSIS processes that further transform the data before loading it into the enterprise data warehouse (EDW). The views act as a layer of abstraction protecting those more complex SSIS processes. They allow you to adapt rapidly to source system changes: you modify the logical layer so that downstream processes do not break. Modifying a view is typically easier and faster than modifying a complex ETL process.
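For example, if the source system renames or splits a column, only the view needs to change; the SSIS packages reading from it are untouched. A sketch against the hypothetical staging table above:

```sql
-- Abstraction view over the staging table. Downstream SSIS packages
-- select from stg.vwCustomer, never from stg.Customer directly.
CREATE VIEW stg.vwCustomer
AS
SELECT
    CustomerID,
    CustomerName,
    CdcOperation
FROM stg.Customer;
```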
Transform Changes
Next, another ETL process grabs all of the changed records from the views in staging and loads them into the dimension and fact tables in the data warehouse. Because we are using CDC and SSIS, we can identify every record that has changed as well as what type of change occurred: insert, update, or even delete.
This is very powerful because it gives you the ability to process data in the appropriate manner, such as expiring records in the data warehouse that have been deleted in the source system, or tracking changes to transaction records that should not change. This would be very difficult with a traditional load control process.
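One way to act on each change type, sketched in T-SQL rather than SSIS; all object and column names (stg.vwCustomer, dw.DimCustomer, IsCurrent, ExpiredDate) are assumptions for illustration:

```sql
-- Inserts: new records in the source become new dimension rows.
INSERT INTO dw.DimCustomer (CustomerID, CustomerName, IsCurrent)
SELECT CustomerID, CustomerName, 1
FROM stg.vwCustomer
WHERE CdcOperation = 2;

-- Updates: apply the post-update image to the current row.
UPDATE d
SET    d.CustomerName = s.CustomerName
FROM   dw.DimCustomer d
JOIN   stg.vwCustomer s ON s.CustomerID = d.CustomerID
WHERE  s.CdcOperation = 4 AND d.IsCurrent = 1;

-- Deletes: expire the row instead of removing history.
UPDATE d
SET    d.IsCurrent = 0, d.ExpiredDate = SYSDATETIME()
FROM   dw.DimCustomer d
JOIN   stg.vwCustomer s ON s.CustomerID = d.CustomerID
WHERE  s.CdcOperation = 1 AND d.IsCurrent = 1;
```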
On top of each fact and dimension table in the data warehouse are views that let you adapt rapidly to changes requested by the users of the reports and cubes. For example, a business user believes the code value should be incorporated with the description for every record in a dimension table, but is not sure of the format: <code> : <description>, (<code>) <description>, etc. This logical layer lets you iterate rapidly until the best option is found, and then incorporate that logic into the ETL at a later point in time.
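Trying a candidate format is then a one-line change to the view. A sketch, with hypothetical column names:

```sql
-- Iterate on the display format in the view layer; swap the expression
-- until the business user is happy, then push the logic into the ETL.
CREATE VIEW dw.vwDimCustomer
AS
SELECT
    CustomerKey,
    CustomerCode + N' : ' + CustomerDescription AS CustomerDisplay
    -- alternative: N'(' + CustomerCode + N') ' + CustomerDescription
FROM dw.DimCustomer;
```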
Process/Query
All of the cubes and reports should be sourced from the views in the data warehouse. To maximize query performance, the views should be created with the SCHEMABINDING option so that indexes can be added to them.
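Indexing a view requires SCHEMABINDING and two-part object names, and the first index must be a unique clustered index. A sketch over a hypothetical fact table:

```sql
-- Schema-bound view over a fact table, then a unique clustered index
-- so SQL Server materializes and maintains the view's result set.
CREATE VIEW dw.vwFactSales
WITH SCHEMABINDING
AS
SELECT SalesKey, CustomerKey, SalesAmount
FROM dw.FactSales;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vwFactSales
ON dw.vwFactSales (SalesKey);
```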
By leveraging two layers of abstraction (views on top of tables), one in staging and one in the data warehouse, you get a robust system that can easily be modified to accommodate changes in the source systems as well as in the reporting requirements.
In my next post I will talk about a great way to monitor the replication process to ensure changes are flowing into your ODS.
Until next time.
Anthony