Ecommerce Data Warehouse on Amazon Redshift Serverless: A Production-Grade Star Schema with Historical Accuracy
Terraform, AWS Glue, dbt Core, and a Medallion Architecture for scalable ecommerce analytics
TL;DR
I designed and implemented a production-ready Ecommerce Data Warehouse on Amazon Redshift Serverless using:
The data model is a star schema with:
This prevents double-counting, ensures historical accuracy, and scales with ecommerce growth.
The Problem: Ecommerce Data Breaks Easily
Revenue numbers drift. Customer metrics change retroactively. Product prices update without history.
The moment you lack historical tracking, dashboards stop reflecting reality.
The solution is not “more queries.” The solution is architecture.
Architecture Overview
This implementation follows a layered, production-ready design built for reliability, scalability, and cost control.
Storage Layer – Amazon S3
Raw ecommerce events land in S3.
S3 acts as the immutable source of truth before transformation.
ETL Layer – AWS Glue
AWS Glue performs serverless data processing:
No cluster management. Compute runs only when needed.
Warehouse Layer – Amazon Redshift Serverless
Redshift Serverless provides:
This aligns infrastructure cost directly with analytical demand.
Transformation Layer – dbt Core
All warehouse modeling is managed in dbt Core.
Key practices implemented:
Transformations are version-controlled and reproducible.
Orchestration – AWS Step Functions
Step Functions coordinate:
Workflows are explicit, observable, and fault-tolerant.
Monitoring – CloudWatch
CloudWatch provides:
Recommended by LinkedIn
Failures are visible. Latency is measurable. Cost spikes are traceable.
Data Model: Star Schema Optimized for Analytics
The warehouse uses a dimensional model specifically optimized for analytical queries and BI performance.
Fact Table
fact_order_items Grain: one row per order item
Why item-level grain?
All metrics roll up cleanly from this base grain.
Dimension Tables
dim_customers – SCD Type 2 Tracks historical changes such as customer attributes. Enables accurate point-in-time reporting and customer lifetime value analysis.
dim_products – SCD Type 2 Captures price changes and product attribute updates. Ensures margin and revenue are calculated using the correct historical context.
dim_date Standard calendar dimension supporting time-series analysis.
dim_payment_methods Supports revenue breakdown by payment channel.
dim_shipment_status Enables operational analysis and delivery performance tracking.
Why SCD Type 2 Matters
Without SCD Type 2:
With SCD Type 2:
This is critical for executive-level reporting and forecasting.
Business Value
For C-Level
For Technical Teams
This is not a demo project. It reflects production architecture decisions.
Cost and Performance Strategy
Several deliberate decisions reduce cost and improve performance:
Compute is applied where it produces business value.
What This Demonstrates as a Data Engineer
End-to-end ownership:
This is the difference between writing queries and designing systems.
I design, therefore I exist.
https://github.com/tuni56/ecommerce-data-warehouse-redshift