Ecommerce Data Warehouse on Amazon Redshift Serverless: A Production-Grade Star Schema with Historical Accuracy

Terraform, AWS Glue, dbt Core, and a Medallion Architecture for scalable ecommerce analytics

TL;DR

I designed and implemented a production-ready Ecommerce Data Warehouse on Amazon Redshift Serverless using:

  • Amazon S3 as raw/staging data lake
  • AWS Glue for serverless ETL
  • Redshift Serverless as analytics engine
  • dbt Core for transformation and testing
  • Terraform for Infrastructure as Code
  • AWS Step Functions for orchestration
  • CloudWatch for monitoring

The data model is a star schema with:

  • fact_order_items at item-level grain
  • SCD Type 2 dimensions for customers and products
  • Full support for point-in-time reporting

This prevents double-counting, ensures historical accuracy, and scales with ecommerce growth.


The Problem: Ecommerce Data Breaks Easily

Revenue numbers drift. Customer metrics change retroactively. Product prices update without history.

The moment you lack historical tracking, dashboards stop reflecting reality.

The solution is not “more queries.” The solution is architecture.


Architecture Overview

This implementation follows a layered, production-ready design built for reliability, scalability, and cost control.

Storage Layer – Amazon S3

Raw ecommerce events land in S3.

  • Durable storage
  • Cheap at scale
  • Clear separation between raw and processed data

S3 acts as the immutable source of truth before transformation.
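
As an illustrative sketch only (bucket and prefix names are hypothetical, not taken from the project), raw events can be landed under a date-partitioned raw/ prefix, physically separate from processed data:

```python
import json
import uuid
from datetime import datetime, timezone

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

def land_raw_event(event: dict, bucket: str = "ecommerce-dl") -> str:
    """Write one raw ecommerce event to a date-partitioned raw/ prefix."""
    now = datetime.now(timezone.utc)
    key = (
        f"raw/orders/ingest_date={now:%Y-%m-%d}/"
        f"{now:%H%M%S}-{uuid.uuid4()}.json"
    )
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(event).encode("utf-8"))
    return key

# Example usage; processed outputs would live under a separate staging/ prefix.
land_raw_event({"order_id": 1001, "customer_id": 42, "items": [{"sku": "A-1", "qty": 2}]})
```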


ETL Layer – AWS Glue

AWS Glue performs serverless data processing:

  • Schema normalization
  • Data cleaning
  • Format standardization
  • Loading into staging tables

No cluster management. Compute runs only when needed.
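
A minimal Glue job sketch of the normalize-clean-standardize-load pattern (paths, column names, and casts are illustrative assumptions, not the actual project code):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON events from the data lake's raw prefix.
raw = spark.read.json("s3://ecommerce-dl/raw/orders/")

# Schema normalization, cleaning, and format standardization.
staged = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("unit_price", F.col("unit_price").cast("decimal(12,2)"))
       .dropDuplicates(["order_id", "order_item_id"])
       .filter(F.col("order_id").isNotNull())
)

# Write columnar staging output, partitioned by order date.
(staged
    .withColumn("order_date", F.to_date("order_ts"))
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://ecommerce-dl/staging/order_items/"))

job.commit()
```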


Warehouse Layer – Amazon Redshift Serverless

Redshift Serverless provides:

  • Auto-scaling compute
  • Pay-per-query pricing
  • Zero cluster maintenance

This aligns infrastructure cost directly with analytical demand.
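
Because compute is billed per query, downstream jobs can reach the warehouse through the Redshift Data API without holding connections open. A hedged sketch (workgroup, database, and table names are placeholders):

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Submit a query to the serverless workgroup; compute runs only for this work.
resp = redshift_data.execute_statement(
    WorkgroupName="ecommerce-analytics",   # placeholder workgroup name
    Database="analytics",                  # placeholder database name
    Sql="SELECT order_date, SUM(net_revenue) FROM marts.fact_order_items GROUP BY 1",
)

# A real job would poll in a loop; here we check once and fetch the rows.
desc = redshift_data.describe_statement(Id=resp["Id"])
if desc["Status"] == "FINISHED":
    rows = redshift_data.get_statement_result(Id=resp["Id"])["Records"]
```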


Transformation Layer – dbt Core

All warehouse modeling is managed in dbt Core.

Key practices implemented:

  • Incremental models
  • Data quality tests
  • Source freshness checks
  • Clear model lineage
  • Modular SQL design

Transformations are version-controlled and reproducible.
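
From the orchestration side, the dbt project can be invoked programmatically. A minimal sketch using dbt Core's documented programmatic runner (available since dbt 1.5; the selector below is illustrative, not the project's actual selection):

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult  # dbt Core >= 1.5

dbt = dbtRunner()

# Build models, tests, and snapshots for the marts layer in dependency order.
result: dbtRunnerResult = dbt.invoke(["build", "--select", "marts.*"])

# Inspect per-node outcomes; a failed test or model fails the pipeline step.
for node_result in result.result:
    print(node_result.node.name, node_result.status)
```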


Orchestration – AWS Step Functions

Step Functions coordinate:

  1. Raw ingestion
  2. Glue processing
  3. dbt transformations
  4. Validation

Workflows are explicit, observable, and fault-tolerant.
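
The state machine itself is provisioned through Terraform; a scheduled trigger or backfill script only needs to start an execution. A sketch with a placeholder ARN and input payload:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Kick off the ingest -> Glue -> dbt -> validation workflow for one batch date.
execution = sfn.start_execution(
    stateMachineArn=(
        "arn:aws:states:us-east-1:123456789012:"
        "stateMachine:ecommerce-dwh-pipeline"   # placeholder ARN
    ),
    name="backfill-2024-01-15",                 # execution names must be unique
    input=json.dumps({"batch_date": "2024-01-15"}),
)

print(execution["executionArn"])
```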


Monitoring – CloudWatch

CloudWatch provides:

  • Logging
  • Alerts
  • Execution monitoring

Failures are visible. Latency is measurable. Cost spikes are traceable.
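
As one example of making failures visible (alarm name, thresholds, and the SNS topic are placeholders, not the project's actual configuration), an alarm on failed Step Functions executions can notify the team:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm whenever the pipeline state machine records a failed execution.
cloudwatch.put_metric_alarm(
    AlarmName="ecommerce-dwh-pipeline-failed",
    Namespace="AWS/States",
    MetricName="ExecutionsFailed",
    Dimensions=[{
        "Name": "StateMachineArn",
        "Value": "arn:aws:states:us-east-1:123456789012:stateMachine:ecommerce-dwh-pipeline",
    }],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-alerts"],  # placeholder topic
)
```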


Data Model: Star Schema Optimized for Analytics

The warehouse uses a dimensional model specifically optimized for analytical queries and BI performance.

Fact Table

fact_order_items – grain: one row per order item.

Why item-level grain?

  • Prevents revenue double-counting
  • Enables product-level profitability
  • Supports basket analysis
  • Preserves analytical flexibility

All metrics roll up cleanly from this base grain.
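
A tiny illustration of why the item-level grain matters (toy data, not project data): order-level and product-level totals both roll up from the same rows, so revenue is never counted twice:

```python
import pandas as pd

# Toy fact_order_items rows: one row per order item.
fact_order_items = pd.DataFrame([
    {"order_id": 1, "product_id": "A", "quantity": 2, "unit_price": 10.0},
    {"order_id": 1, "product_id": "B", "quantity": 1, "unit_price": 25.0},
    {"order_id": 2, "product_id": "A", "quantity": 3, "unit_price": 10.0},
])
fact_order_items["net_revenue"] = (
    fact_order_items["quantity"] * fact_order_items["unit_price"]
)

# Both roll-ups come from the same base grain, so totals always reconcile.
by_order = fact_order_items.groupby("order_id")["net_revenue"].sum()
by_product = fact_order_items.groupby("product_id")["net_revenue"].sum()

assert by_order.sum() == by_product.sum() == fact_order_items["net_revenue"].sum()
```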


Dimension Tables

dim_customers – SCD Type 2. Tracks historical changes to customer attributes, enabling accurate point-in-time reporting and customer lifetime value analysis.

dim_products – SCD Type 2. Captures price changes and product attribute updates, so margin and revenue are calculated using the correct historical context.

dim_date – Standard calendar dimension supporting time-series analysis.

dim_payment_methods – Supports revenue breakdown by payment channel.

dim_shipment_status – Enables operational analysis and delivery performance tracking.
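
To make the SCD Type 2 mechanics concrete, here is a minimal, framework-free sketch of how a product price change is versioned rather than overwritten. In the actual stack this lives in dbt; the column names (valid_from, valid_to, is_current) are common conventions assumed for the example:

```python
from datetime import date

# Current dimension row for a product, SCD Type 2 style.
dim_products = [
    {"product_id": "A", "unit_price": 10.0,
     "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True},
]

def apply_price_change(rows, product_id, new_price, change_date):
    """Close the open version and append a new one instead of overwriting."""
    for row in rows:
        if row["product_id"] == product_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({
        "product_id": product_id, "unit_price": new_price,
        "valid_from": change_date, "valid_to": None, "is_current": True,
    })

apply_price_change(dim_products, "A", 12.0, date(2024, 6, 1))
# Two versions now exist: history is preserved, not overwritten.
```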


Why SCD Type 2 Matters

Without SCD Type 2:

  • Historical reports shift retroactively
  • Customer segmentation becomes unreliable
  • Margin analysis loses accuracy

With SCD Type 2:

  • You can reconstruct the business at any past point in time
  • Revenue reflects the actual product price at purchase
  • Customer attributes remain historically consistent

This is critical for executive-level reporting and forecasting.
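
Point-in-time reporting then reduces to joining each fact row to the dimension version that was valid at the order date. A toy sketch continuing the price-change example above, again with assumed column names:

```python
from datetime import date

# Two SCD Type 2 versions of the same product.
dim_products = [
    {"product_id": "A", "unit_price": 10.0,
     "valid_from": date(2024, 1, 1), "valid_to": date(2024, 6, 1), "is_current": False},
    {"product_id": "A", "unit_price": 12.0,
     "valid_from": date(2024, 6, 1), "valid_to": None, "is_current": True},
]

def price_as_of(dim_rows, product_id, as_of):
    """Return the unit price that was valid for this product on a given date."""
    for row in dim_rows:
        end = row["valid_to"] or date.max
        if row["product_id"] == product_id and row["valid_from"] <= as_of < end:
            return row["unit_price"]
    raise LookupError(f"No dimension version for {product_id} on {as_of}")

# A March order is valued at the old price, a July order at the new one.
assert price_as_of(dim_products, "A", date(2024, 3, 15)) == 10.0
assert price_as_of(dim_products, "A", date(2024, 7, 1)) == 12.0
```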


Business Value

For C-Level

  • Single source of truth for revenue and customer metrics
  • Accurate historical tracking
  • Trend analysis and forecasting capability
  • Serverless cost model aligned with usage
  • Reduced operational overhead


For Technical Teams

  • Infrastructure as Code with Terraform
  • Serverless ETL via AWS Glue
  • Redshift Serverless optimization
  • dbt-based analytics engineering
  • Incremental processing patterns
  • Star schema optimized for BI tools
  • Full orchestration and monitoring stack

This is not a demo project. It reflects production architecture decisions.


Cost and Performance Strategy

Several deliberate decisions reduce cost and improve performance:

  • Incremental dbt models to avoid full recomputation
  • Item-level fact table to eliminate metric distortion
  • Serverless compute to avoid idle cluster cost
  • Separation of raw, staging, and marts
  • Query-optimized star schema design

Compute is applied where it produces business value.


What This Demonstrates as a Data Engineer

End-to-end ownership:

  • Designing storage and ingestion
  • Building ETL pipelines
  • Modeling dimensional schemas
  • Implementing SCD Type 2
  • Managing infrastructure with Terraform
  • Orchestrating workflows
  • Monitoring production systems

This is the difference between writing queries and designing systems.

I design, therefore I exist.
