Data-warehousing using the MS BI Stack for the Tableau Sample_Super_Store data set
Sample_Super_Store: Excel-to-warehouse system architecture

Creating a warehouse from scratch can be challenging if we are unaware of the source system's architecture.

I have tried to simplify the process with the following use case. I had the well-known sample_superstore data set in Excel format, and I thought it would be a good example to start with for building a DW on my local system.

1. Identifying the source system is important; here we have only one Excel source. In a development environment you will find a variety of sources, e.g. Excel, DBMS, SharePoint, document storage, flat files, streaming data from IoT devices, etc.

Source-to-target mapping

2. Once the sources are identified, we can define our facts and dimensions to normalize the data. A source-to-target mapping documents the data types of the columns and their respective journeys through our data pipeline.

The mapping captures our ETL logic and proves to be a truly helpful document when you have numerous sources and columns in your data pipeline. It also helps in debugging the pipeline when failures occur. A few illustrative rows of such a mapping are sketched below.
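
For illustration, a sample mapping for this data set might look like the following; the staging and warehouse column names and SQL data types here are assumptions, not taken from the actual package:

Source column (Excel)   Staging column (stg.SuperStore)   Target (warehouse)
Customer ID             CustomerID nvarchar(20)           DimCustomer.CustomerID
Customer Name           CustomerName nvarchar(100)        DimCustomer.CustomerName
Product ID              ProductID nvarchar(30)            DimProduct.ProductID
Order Date              OrderDate date                    FactSales.OrderDateKey (via DimDate)
Sales                   Sales decimal(18,2)               FactSales.Sales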

3. ETL tools are required to create the data pipeline; one such tool, SSIS, is shown in the screenshots below.

SSIS package
Staging tables

Here SSIS is used to load data into the staging tables of the database system (MS SQL Server). A staging database is helpful for archiving, and it gives us the benefit of the SQL query language to process our data faster. Staging can be done on the same server as your warehouse if required.
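
As a minimal sketch, a single wide staging table mirroring the Excel sheet could be defined as follows (the stg schema, column names, and data types are assumptions based on the standard Sample_Super_Store columns):

-- Assumes the staging schema already exists: CREATE SCHEMA stg;
CREATE TABLE stg.SuperStore (
    RowID        int,
    OrderID      nvarchar(20),
    OrderDate    date,
    ShipDate     date,
    ShipMode     nvarchar(30),
    CustomerID   nvarchar(20),
    CustomerName nvarchar(100),
    Segment      nvarchar(30),
    Country      nvarchar(60),
    City         nvarchar(60),
    [State]      nvarchar(60),
    PostalCode   nvarchar(10),
    Region       nvarchar(30),
    ProductID    nvarchar(30),
    Category     nvarchar(30),
    SubCategory  nvarchar(30),
    ProductName  nvarchar(250),
    Sales        decimal(18,2),
    Quantity     int,
    Discount     decimal(5,2),
    Profit       decimal(18,2)
);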

4. Once the data is staged, we can use stored procedures to load our dimension tables. Since the records in the Excel file are repeated for each transaction, and we need to load only unique records into the dimension tables, a stored procedure can be created and called from the SSIS "Execute SQL Task".
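
A minimal sketch of such a procedure for a customer dimension (all table and column names are assumed for illustration):

CREATE PROCEDURE dbo.usp_Load_DimCustomer
AS
BEGIN
    SET NOCOUNT ON;
    -- Load only distinct customers that are not already in the dimension
    INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, Segment)
    SELECT DISTINCT s.CustomerID, s.CustomerName, s.Segment
    FROM stg.SuperStore AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.DimCustomer AS d
                      WHERE d.CustomerID = s.CustomerID);
END;

-- Called from the Execute SQL Task as: EXEC dbo.usp_Load_DimCustomer;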

Fact Data flow

5. The final step is to load the fact table. To insert the surrogate keys from the dimension tables, the SSIS "Lookup" transformation can be used; we need to map each dimension key to its fact table column while populating it.
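
The same key resolution that the Lookup transformation performs can be expressed in T-SQL, joining the staged rows to each dimension to fetch its key (table and column names again assumed for illustration):

INSERT INTO dbo.FactSales
    (OrderDateKey, CustomerKey, ProductKey, Sales, Quantity, Discount, Profit)
SELECT dd.DateKey,
       dc.CustomerKey,
       dp.ProductKey,
       s.Sales, s.Quantity, s.Discount, s.Profit
FROM stg.SuperStore AS s
JOIN dbo.DimDate     AS dd ON dd.FullDate   = s.OrderDate
JOIN dbo.DimCustomer AS dc ON dc.CustomerID = s.CustomerID
JOIN dbo.DimProduct  AS dp ON dp.ProductID  = s.ProductID;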

6. There are several other ways to manage the data load. Here we have applied a truncate-and-load approach, since the source is a static Excel file and the data is not going to be updated; all the tables are truncated at the very start of the pipeline. For changing sources we would need to create incremental-load pipelines, and slowly changing dimensions (SCD) would need to be handled on the changing dimension tables to maintain history.
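
For the truncate-and-load pattern, the first step of the pipeline can simply be an Execute SQL Task that empties the target tables. A sketch with the assumed names from above (if foreign keys are enforced between fact and dimensions, DELETE would be needed instead of TRUNCATE on the referenced tables):

-- Executed once at the very start of the pipeline
TRUNCATE TABLE dbo.FactSales;    -- fact first
TRUNCATE TABLE dbo.DimCustomer;
TRUNCATE TABLE dbo.DimProduct;
TRUNCATE TABLE stg.SuperStore;   -- staging is reloaded from Excel each run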

After a successful run of the pipeline, this is how the data looks in the SQL warehouse database.

Final output

Scheduling of the SSIS package can be done in the SSMS client (as a SQL Server Agent job), or the package can be invoked using Azure Data Factory in some cases.
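
A sketch of such an Agent job, shelling out to dtexec; the job name, schedule, and package path are all hypothetical:

USE msdb;
EXEC dbo.sp_add_job @job_name = N'Load_SuperStore_DW';
EXEC dbo.sp_add_jobstep
     @job_name  = N'Load_SuperStore_DW',
     @step_name = N'Run SSIS package',
     @subsystem = N'CmdExec',
     @command   = N'dtexec /F "C:\ETL\SuperStore_DW.dtsx"';  -- hypothetical path
EXEC dbo.sp_add_jobschedule
     @job_name  = N'Load_SuperStore_DW',
     @name      = N'Nightly 2 AM',
     @freq_type = 4,                 -- daily
     @freq_interval = 1,
     @active_start_time = 020000;    -- 02:00:00
EXEC dbo.sp_add_jobserver @job_name = N'Load_SuperStore_DW';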

Thanks for taking the time to read. Do post your feedback or questions, if any, in the comments section. Have a great time!

