The Hidden Architecture Behind Cube Multi-Datasource Setups (And the S3 Credential Trap)

Ouma Winnie

Published Feb 24, 2026

When working with Cube, configuring multiple databases looks straightforward.

Until it isn’t.

Recently, while integrating DuckDB with Amazon S3 inside a multi-datasource Cube setup, I ran into a subtle configuration issue that perfectly illustrates how Cube handles datasource modes internally.

What looked like an S3 permissions problem turned out to be something much deeper: Cube has two mutually exclusive datasource configuration modes.

And understanding that distinction changes everything.

🧠 Cube Has Two Configuration Modes

1️⃣ Single (Default) Datasource Mode

If you configure Cube like this:

CUBEJS_DB_TYPE=duckdb

You are in single-datasource mode.

In this setup:

Cube assumes one database.
All cubes use it.
Driver variables (CUBEJS_DB_*) are read directly, with no datasource prefix.
Environment variables map cleanly to the driver.
Documentation examples work exactly as written.

Architecturally, it looks like this:

Cube
 └── Database (only one)

Simple. Clean. Predictable.

But there’s a constraint:

You cannot add another database unless you switch modes.

2️⃣ Multi-Datasource Mode

The moment you define:

CUBEJS_DATASOURCES=default,Redshift,DuckDB

Cube switches into multi-datasource mode.

Now every database must be declared explicitly:

CUBEJS_DS_DUCKDB_DB_TYPE=duckdb
CUBEJS_DS_REDSHIFT_DB_TYPE=redshift

Architecturally:

Cube
 ├── default
 ├── Redshift
 └── DuckDB

Important implications:

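default is no longer implicit: it is a named entry in the list, alongside Redshift and DuckDB.
Variables for named datasources must be scoped with CUBEJS_DS_<NAME>_...
Unscoped variables such as CUBEJS_DB_TYPE do NOT configure Redshift or DuckDB. Cube's multiple data sources documentation applies them to the default datasource only.
Config mapping behaves differently: a driver option set under its single-mode name is not seen by a named datasource.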
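As a rough sketch, here is what scoped configuration for the two named datasources above might look like in an env file. The hosts, database names, user, and password are placeholders I made up for illustration, and the settings for the default datasource are deliberately left out.

```shell
# Sketch of an env file for multi-datasource mode.
# Hosts, database names, users, and passwords are placeholders, not real values.
CUBEJS_DATASOURCES=default,Redshift,DuckDB

# Redshift: every setting carries the CUBEJS_DS_REDSHIFT_ prefix.
CUBEJS_DS_REDSHIFT_DB_TYPE=redshift
CUBEJS_DS_REDSHIFT_DB_HOST=redshift.example.internal
CUBEJS_DS_REDSHIFT_DB_PORT=5439
CUBEJS_DS_REDSHIFT_DB_NAME=analytics
CUBEJS_DS_REDSHIFT_DB_USER=cube_reader
CUBEJS_DS_REDSHIFT_DB_PASS=change-me

# DuckDB: same pattern, with the datasource name upper-cased in the prefix.
CUBEJS_DS_DUCKDB_DB_TYPE=duckdb

# Settings for the default datasource are omitted from this sketch.
```

The point of the layout is that each datasource gets its own block, so a missing or mis-prefixed line is easy to spot.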

And this is where things can get tricky.

🧩 The S3 Credential Trap

In single mode, DuckDB S3 credentials work like this:

CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=...
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=...

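But in multi-datasource mode, they must be scoped. The DS_<NAME>_ segment goes right after the CUBEJS_ prefix, and the rest of the single-mode name, including DB_DUCKDB_, stays as it was:

CUBEJS_DS_DUCKDB_DB_DUCKDB_S3_ACCESS_KEY_ID=...
CUBEJS_DS_DUCKDB_DB_DUCKDB_S3_SECRET_ACCESS_KEY=...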
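One way to double-check a scoped name is to derive it mechanically instead of typing it by hand. The sketch below assumes the pattern shown in Cube's multiple data sources documentation, where DS_<NAME>_ is inserted right after the CUBEJS_ prefix and the rest of the single-mode name is kept. scoped_name is a hypothetical helper of mine, not something Cube ships.

```shell
# scoped_name SINGLE_MODE_VAR DATASOURCE
# Hypothetical helper: prints the multi-datasource variable name, assuming
# Cube inserts DS_<NAME>_ after the CUBEJS_ prefix and keeps the remainder.
scoped_name() {
  case "$1" in
    CUBEJS_*) ;;
    *) echo "not a CUBEJS_ variable: $1" >&2; return 1 ;;
  esac
  # The datasource name is upper-cased inside the variable name.
  printf 'CUBEJS_DS_%s_%s\n' \
    "$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')" \
    "${1#CUBEJS_}"
}

scoped_name CUBEJS_DB_TYPE DuckDB
# prints CUBEJS_DS_DUCKDB_DB_TYPE
scoped_name CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID DuckDB
# prints CUBEJS_DS_DUCKDB_DB_DUCKDB_S3_ACCESS_KEY_ID
```

Note that under this pattern the DB_DUCKDB_ segment survives, so the scoped S3 names contain DUCKDB twice. That is worth comparing against whatever names you have in your own environment.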

Miss that subtle difference — or rely on documentation that assumes single mode — and you’ll see S3 failures that look like:

“Access Denied”
“Missing credentials”
“Unable to load httpfs extension”

But the real issue isn’t S3.

It’s configuration scoping.

🔍 The Breakthrough Insight

What finally resolved the issue?

Setting:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

Instead of relying solely on Cube-scoped variables.

Why did this work?

Because DuckDB's S3 support can pick up the standard AWS environment variables by itself. With those set globally, the credentials reached DuckDB without passing through Cube's datasource mapping at all.

(P.S. Remember to add an environment section to the Cube service in your Docker Compose file, so these variables actually reach the container.)
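To confirm the variables really are visible inside the container, a small check like the one below can help. It is a sketch: report_var is a hypothetical helper, and it only reports whether a variable is present, so no secret is printed to the terminal or the logs.

```shell
# report_var NAME
# Hypothetical helper: prints "NAME=set" or "NAME=missing" for an exported
# environment variable, without revealing its value.
report_var() {
  if [ -n "$(printenv "$1")" ]; then
    echo "$1=set"
  else
    echo "$1=missing"
  fi
}

for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
  report_var "$name"
done
```

Run it in a shell inside the Cube container, for example one opened with docker compose exec and your own service name. Running it on the host only tells you what the host sees.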

That confirmed the real lesson:

The problem wasn’t S3 permissions. It was how Cube injects driver configuration in multi-datasource mode.

⚠️ The Rule That Isn’t Obvious in the Docs

These two variables select mutually exclusive modes:

CUBEJS_DB_TYPE=...

and

CUBEJS_DATASOURCES=...

You are always in exactly one mode.

CUBEJS_DB_TYPE on its own → Single-datasource mode
CUBEJS_DATASOURCES → Multi-datasource mode

Single-mode variable names do not carry over to named datasources.
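A quick way to see which mode an environment selects is to look at those two variables directly. The helper below is a sketch with a made-up name, cube_mode. It treats a non-empty CUBEJS_DATASOURCES as the switch into multi-datasource mode, which is my reading of the behavior described above and not a call into Cube itself.

```shell
# cube_mode
# Hypothetical helper: prints the mode the current environment appears to select.
cube_mode() {
  if [ -n "${CUBEJS_DATASOURCES:-}" ]; then
    # Any non-empty datasource list is assumed to switch Cube into multi mode.
    echo "multi"
  elif [ -n "${CUBEJS_DB_TYPE:-}" ]; then
    echo "single"
  else
    echo "unconfigured"
  fi
}

cube_mode
```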

🏗️ Architectural Takeaways

If you’re running:

MySQL for application data
Redshift for warehouse analytics
DuckDB for S3-based computation

Then multi-datasource mode is the correct architectural decision.

But you must:

Scope every datasource variable properly.
Avoid using single-mode variable names for named datasources.
Understand how driver-level credential resolution works.
Debug from the architecture layer — not just the error message.
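The first item on that checklist can be partly automated. The sketch below walks the CUBEJS_DATASOURCES list and reports any named datasource that has no scoped DB_TYPE variable. missing_db_types is a hypothetical helper, and it skips default on the assumption that default is configured separately, so adjust that if your setup differs.

```shell
# missing_db_types
# Hypothetical helper: prints each named datasource in CUBEJS_DATASOURCES that
# has no exported CUBEJS_DS_<NAME>_DB_TYPE variable.
missing_db_types() {
  printf '%s\n' "${CUBEJS_DATASOURCES:-}" | tr ',' '\n' | while read -r name; do
    # Skip blank entries; "default" is skipped by assumption in this sketch.
    [ -z "$name" ] && continue
    [ "$name" = "default" ] && continue
    key="CUBEJS_DS_$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')_DB_TYPE"
    if [ -z "$(printenv "$key")" ]; then
      echo "$name: $key is not set"
    fi
  done
}

missing_db_types
```

No output means every named datasource has a type declared. It does not prove the other scoped settings are right, only that none of the datasources was forgotten entirely.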

💡 What This Really Taught Me

Many production debugging issues turn out not to be about:

SQL
Permissions
IAM
The database

They’re about configuration mode mismatches.

Understanding how your framework internally switches operational modes is often the difference between:

5 minutes of clarity
5 hours of confusion

Final Thought

Modern analytics stacks are increasingly hybrid:

Embedded engines like DuckDB
Cloud storage like Amazon S3
Warehouses like Redshift
Orchestrators and modeling layers

The more composable your architecture becomes, the more important it is to understand how configuration boundaries work.

Because sometimes, the bug isn’t in your query.

It’s in your mode.

If you're working with Cube in a multi-datasource environment, I’d love to compare notes — especially around embedded engines + object storage patterns.
