Data Warehousing Using the MS BI Stack for the Tableau Sample Superstore Data Set
Creating a data warehouse from scratch can be challenging if we are unaware of the architecture of the source system.
I have tried to simplify the process with the following use case. I had the well-known sample_superstore data set in Excel format, and I thought it would be a good example to start with for building a DW on my local system.
2. Post identification, we can define our facts and dimensions to normalize the data. A source-to-target mapping documents the data types of the columns and their respective journey through our data pipeline.
The mapping captures our ETL logic and proves to be a truly helpful document when you have numerous sources and columns in your data pipeline. It also helps in debugging the pipeline when failures occur.
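As a sketch, such a source-to-target mapping can even be kept as a simple machine-readable structure. The source column names below follow the Sample Superstore sheet; the target table and column names are illustrative assumptions, not the actual warehouse design.

```python
# A minimal source-to-target mapping sketch for a few Sample Superstore
# columns. Target table/column names are illustrative assumptions.
SOURCE_TO_TARGET = [
    # (source column, source type, target table, target column, target type)
    ("Order ID",      "text",    "FactSales",   "OrderID",      "varchar(20)"),
    ("Order Date",    "date",    "FactSales",   "OrderDate",    "date"),
    ("Customer ID",   "text",    "DimCustomer", "CustomerID",   "varchar(20)"),
    ("Customer Name", "text",    "DimCustomer", "CustomerName", "varchar(100)"),
    ("Product ID",    "text",    "DimProduct",  "ProductID",    "varchar(20)"),
    ("Sales",         "decimal", "FactSales",   "SalesAmount",  "decimal(18,2)"),
]

def columns_for_table(mapping, table):
    """Return the target columns mapped to a given warehouse table."""
    return [row[3] for row in mapping if row[2] == table]
```

Keeping the mapping in a structure like this makes it easy to generate documentation or validate the pipeline against it during debugging.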
3. To create the data pipeline, an ETL tool is required. One such tool, SSIS, is shown in the screenshots below.
Here SSIS is used to load data into the staging tables of the database system (MS SQL Server). A staging database is helpful for archiving, and it gives us the benefit of the SQL query language to process our data faster. Staging can be done on the same server as your warehouse if required.
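The staging step can be sketched in Python, with SQLite standing in for MS SQL Server. The table name, columns, and sample rows below are assumptions that mimic a few Sample Superstore records exported from Excel.

```python
import sqlite3

# Stand-in for the MS SQL staging database. In the real pipeline,
# SSIS reads the Excel sheet and bulk-loads it into a staging table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stg_superstore (
        order_id TEXT, order_date TEXT,
        customer_id TEXT, customer_name TEXT,
        product_id TEXT, sales REAL
    )
""")

# Rows mimicking Sample Superstore records; note the repetition of
# customer attributes on every transaction line.
rows = [
    ("CA-1001", "2017-01-05", "C-01", "Aaron", "P-10", 25.50),
    ("CA-1001", "2017-01-05", "C-01", "Aaron", "P-11", 10.00),
    ("CA-1002", "2017-01-06", "C-02", "Beth",  "P-10", 99.90),
]
conn.executemany("INSERT INTO stg_superstore VALUES (?,?,?,?,?,?)", rows)
conn.commit()
```

Once the raw rows are staged like this, all further processing can happen in set-based SQL rather than row by row in the ETL tool.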
4. Once the data is staged, we can use stored procedures to load our dimension tables. Since the Excel records are repeated for each transaction and we need to load only unique records into the dimension tables, a stored procedure can be created and called inside SSIS via the "Execute SQL Task" step.
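The core of such a dimension-load stored procedure is an `INSERT ... SELECT DISTINCT`. A minimal sketch, again using SQLite as a stand-in and assumed table names:

```python
import sqlite3

# Sketch of the dimension-load stored procedure logic: the staging
# table repeats customer attributes on every transaction, so we
# insert only the distinct records into the dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_superstore (customer_id TEXT, customer_name TEXT, sales REAL);
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id TEXT, customer_name TEXT
    );
    INSERT INTO stg_superstore VALUES
        ('C-01', 'Aaron', 25.50),
        ('C-01', 'Aaron', 10.00),   -- same customer, second transaction
        ('C-02', 'Beth',  99.90);
    -- Equivalent of the stored procedure body called from SSIS:
    INSERT INTO dim_customer (customer_id, customer_name)
    SELECT DISTINCT customer_id, customer_name FROM stg_superstore;
""")
```

The surrogate `customer_key` generated here is what the fact table will reference instead of the business key from the source file.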
5. The final step is to load the fact table. Here, to insert the keys from the dimension tables, the "Lookup" transformation can be used. We need to map each dimension key to its fact table column while populating it.
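In SQL terms, the Lookup step amounts to joining the staging rows to each dimension on the business key to fetch the surrogate key. A minimal sketch with assumed table names:

```python
import sqlite3

# Sketch of the fact load: like the SSIS Lookup transformation, we
# join staging rows to the dimension on the business key to resolve
# the surrogate key stored in the fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_superstore (order_id TEXT, customer_id TEXT, sales REAL);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT);
    CREATE TABLE fact_sales (order_id TEXT, customer_key INTEGER, sales REAL);
    INSERT INTO stg_superstore VALUES ('CA-1001', 'C-01', 25.50),
                                      ('CA-1002', 'C-02', 99.90);
    INSERT INTO dim_customer VALUES (1, 'C-01'), (2, 'C-02');
    -- Lookup equivalent: resolve each business key to its surrogate key.
    INSERT INTO fact_sales (order_id, customer_key, sales)
    SELECT s.order_id, d.customer_key, s.sales
    FROM stg_superstore s
    JOIN dim_customer d ON d.customer_id = s.customer_id;
""")
```

With one such join per dimension, the fact table ends up holding only surrogate keys and measures, which is exactly what the Lookup mapping in SSIS produces.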
6. There are several other ways to manage the data. Here we have applied a truncate-and-load approach, since the data is not going to change; the source is a static Excel file. All the tables are truncated at the very start of the pipeline. For changing sources, we would need to create incremental-load pipelines, and slowly changing dimensions (SCD) would need to be handled for the changing dimension tables to maintain history.
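The truncate-and-load pattern itself is simple: every run empties the warehouse tables and then reloads everything from staging. A sketch (SQLite has no `TRUNCATE`, so `DELETE` stands in for SQL Server's `TRUNCATE TABLE`; names are assumptions):

```python
import sqlite3

# Sketch of truncate-and-load: each run starts by emptying the
# warehouse table, then reloads it in full from staging.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (order_id TEXT, sales REAL);
    INSERT INTO fact_sales VALUES ('OLD-1', 1.0);  -- leftover from a prior run
""")

def run_pipeline(conn, staged_rows):
    """Full reload: truncate first, then insert everything from staging."""
    conn.execute("DELETE FROM fact_sales")  # TRUNCATE TABLE fact_sales in T-SQL
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", staged_rows)
    conn.commit()

run_pipeline(conn, [("CA-1001", 25.50), ("CA-1002", 99.90)])
```

This is only viable because the source is static; with a growing or changing source, an incremental load keyed on a watermark column, plus SCD handling on the dimensions, would replace this full reload.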
After a successful run of the pipeline, this is how the data looks in the SQL warehouse database.
Scheduling of the SSIS package can be done through the SSMS client, or the package can be invoked from Azure Data Factory in some cases.
Thanks for taking the time to read. Do post your feedback or questions, if any, in the comments section. Have a great time!