The Hidden Architecture Behind Cube Multi-Datasource Setups (And the S3 Credential Trap)

In Cube, configuring multiple databases looks straightforward.

Until it isn’t.

Recently, while integrating DuckDB with Amazon S3 inside a multi-datasource Cube setup, I ran into a subtle configuration issue that perfectly illustrates how Cube handles datasource modes internally.

What looked like an S3 permissions problem turned out to be something much deeper: Cube has two mutually exclusive datasource configuration modes.

And understanding that distinction changes everything.


🧠 Cube Has Two Configuration Modes

1️⃣ Single (Default) Datasource Mode

If you configure Cube like this:

CUBEJS_DB_TYPE=duckdb

You are in single-datasource mode.

In this setup:

  • Cube assumes one database.
  • All cubes use it.
  • Driver variables are read directly.
  • Environment variables map cleanly to the driver.
  • Documentation examples work exactly as written.

Architecturally, it looks like this:

Cube
 └── Database (only one)

Simple. Clean. Predictable.

But there’s a constraint:

You cannot add another database unless you switch modes.

2️⃣ Multi-Datasource Mode

The moment you define:

CUBEJS_DATASOURCES=default,Redshift,DuckDB

Cube switches into multi-datasource mode.

Now every database must be declared explicitly:

CUBEJS_DS_DUCKDB_DB_TYPE=duckdb
CUBEJS_DS_REDSHIFT_DB_TYPE=redshift

Architecturally:

Cube
 ├── default
 ├── Redshift
 └── DuckDB

Important implications:

  • default is no longer implicit — it’s just another named datasource.
  • All variables must be scoped with CUBEJS_DS_<NAME>_...
  • CUBEJS_DB_TYPE should NOT be used.
  • Driver settings are looked up per datasource, under the scoped names, rather than from the global CUBEJS_DB_* names.
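Because the scoping rule is purely mechanical, it can be checked before Cube ever starts. Here is a minimal Python sketch, assuming the naming pattern shown above (the datasource name upper-cased between `CUBEJS_DS_` and the setting, as in `CUBEJS_DS_DUCKDB_DB_TYPE`). `scoped_key` and `missing_db_type` are hypothetical helpers for illustration, not part of Cube.

```python
from collections.abc import Mapping


def scoped_key(datasource: str, setting: str) -> str:
    """Build a datasource-scoped variable name.

    Assumed pattern, taken from this article's examples:
    scoped_key("DuckDB", "DB_TYPE") -> "CUBEJS_DS_DUCKDB_DB_TYPE"
    """
    return f"CUBEJS_DS_{datasource.upper()}_{setting}"


def missing_db_type(env: Mapping[str, str], datasources: list[str]) -> list[str]:
    """Return the datasources that have no non-empty scoped DB_TYPE variable."""
    return [ds for ds in datasources if not env.get(scoped_key(ds, "DB_TYPE"))]


if __name__ == "__main__":
    env = {
        "CUBEJS_DATASOURCES": "default,Redshift,DuckDB",
        "CUBEJS_DS_DUCKDB_DB_TYPE": "duckdb",
        # Redshift's DB_TYPE is deliberately left out to show a failure.
    }
    print(missing_db_type(env, ["Redshift", "DuckDB"]))  # ['Redshift']
```

The datasource list is passed in explicitly, so you can validate only the sources you have declared so far.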

And this is where things can get tricky.


🧩 The S3 Credential Trap

In single-datasource mode, DuckDB S3 credentials are set like this:

CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=...
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=...

But in multi-datasource mode, they must be scoped:

CUBEJS_DS_DUCKDB_DB_S3_ACCESS_KEY_ID=...
CUBEJS_DS_DUCKDB_DB_S3_SECRET_ACCESS_KEY=...

Miss that subtle difference — or rely on documentation that assumes single mode — and you’ll see S3 failures that look like:

  • “Access Denied”
  • “Missing credentials”
  • “Unable to load httpfs extension”

But the real issue isn’t S3.

It’s configuration scoping.
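One way to catch this trap early is to check, whenever `CUBEJS_DATASOURCES` is set, whether the scoped credential names actually exist. The minimal Python sketch below assumes the scoped DuckDB names exactly as written in this section; `missing_scoped_s3_keys` is a hypothetical helper, and it reports variable names only, never secret values.

```python
from collections.abc import Mapping

# Scoped DuckDB S3 settings as written in this article (assumed names).
SCOPED_S3_SETTINGS = ("DB_S3_ACCESS_KEY_ID", "DB_S3_SECRET_ACCESS_KEY")


def missing_scoped_s3_keys(env: Mapping[str, str], datasource: str = "DuckDB") -> list[str]:
    """List scoped S3 credential variables that are absent or empty.

    Only applies in multi-datasource mode (CUBEJS_DATASOURCES is set);
    in single-datasource mode there is nothing to report.
    """
    if not env.get("CUBEJS_DATASOURCES"):
        return []
    prefix = f"CUBEJS_DS_{datasource.upper()}_"
    return [prefix + s for s in SCOPED_S3_SETTINGS if not env.get(prefix + s)]


if __name__ == "__main__":
    # Multi-datasource mode, but the credentials still use the single-mode names.
    env = {
        "CUBEJS_DATASOURCES": "default,Redshift,DuckDB",
        "CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID": "example-key-id",
        "CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY": "example-secret",
    }
    # Prints both scoped names, because only the unscoped names are set.
    for name in missing_scoped_s3_keys(env):
        print("not set:", name)
```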


🔍 The Breakthrough Insight

What finally resolved the issue?

Setting:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

Instead of relying solely on Cube-scoped variables.

Why did this work?

Because DuckDB ultimately follows the standard AWS credential resolution chain. Setting the global AWS environment variables let the driver pick up credentials while bypassing Cube’s datasource mapping entirely.

(PS: Remember to list these variables in the environment section of your docker-compose.yml; otherwise they never reach the Cube container.)

[Screenshot: docker-compose.yml]

That confirmed the real lesson:

The problem wasn’t S3 permissions. It was how Cube injects driver configuration in multi-datasource mode.
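Since the fix only helps if those variables actually reach the Cube process, it is worth verifying from inside the container. A minimal sketch, assuming a Python interpreter is available there (for example, in a shell opened with `docker compose exec`); `missing_env_vars` is a hypothetical helper that reports names only, never values.

```python
import os
from collections.abc import Iterable, Mapping

# The standard AWS variables used in the fix described above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_AWS_VARS)
    if missing:
        # One possible cause: the variables are not passed in via docker-compose.yml.
        print("Missing in this environment:", ", ".join(missing))
    else:
        print("AWS credential variables are present (values not shown).")
```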

⚠️ The Rule That Isn’t Obvious in the Docs

These two configurations are mutually exclusive:

CUBEJS_DB_TYPE=...

and

CUBEJS_DATASOURCES=...

You must choose one mode.

  • CUBEJS_DB_TYPE → Single-datasource mode
  • CUBEJS_DATASOURCES → Multi-datasource mode

They do not mix.
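The rule can be turned into a small lint. This minimal Python sketch encodes the rule exactly as stated in this article; it is not Cube's own mode-selection logic. In the hypothetical `config_mode` helper, `CUBEJS_DATASOURCES` means multi-datasource mode, `CUBEJS_DB_TYPE` means single-datasource mode, and having both set is treated as an error.

```python
from collections.abc import Mapping


def config_mode(env: Mapping[str, str]) -> str:
    """Classify an environment as "single" or "multi" datasource mode.

    Raises ValueError when both mode selectors are set, following the
    "choose one mode" rule described in this article.
    """
    has_single = bool(env.get("CUBEJS_DB_TYPE"))
    has_multi = bool(env.get("CUBEJS_DATASOURCES"))
    if has_single and has_multi:
        raise ValueError("Both CUBEJS_DB_TYPE and CUBEJS_DATASOURCES are set; choose one mode.")
    if has_multi:
        return "multi"
    if has_single:
        return "single"
    raise ValueError("Neither CUBEJS_DB_TYPE nor CUBEJS_DATASOURCES is set.")


if __name__ == "__main__":
    print(config_mode({"CUBEJS_DB_TYPE": "duckdb"}))  # single
    print(config_mode({"CUBEJS_DATASOURCES": "default,Redshift,DuckDB"}))  # multi
```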


🏗️ Architectural Takeaways

If you’re running:

  • MySQL for application data
  • Redshift for warehouse analytics
  • DuckDB for S3-based computation

Then multi-datasource mode is the correct architectural decision.

But you must:

  1. Scope every datasource variable properly.
  2. Avoid mixing single-mode variables.
  3. Understand how driver-level credential resolution works.
  4. Debug from the architecture layer — not just the error message.
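Steps 1 and 2 can be partly automated. Under the same assumption as the rest of this article (in multi-datasource mode, driver settings belong under `CUBEJS_DS_<NAME>_...`), the minimal Python sketch below lists unscoped `CUBEJS_DB_*` variables left over from a single-mode setup. `stray_single_mode_vars` is a hypothetical helper, not a Cube feature.

```python
from collections.abc import Mapping


def stray_single_mode_vars(env: Mapping[str, str]) -> list[str]:
    """Return unscoped CUBEJS_DB_* names present while multi-datasource mode is active."""
    if not env.get("CUBEJS_DATASOURCES"):
        return []  # Single-datasource mode: unscoped names are expected here.
    # Scoped names start with CUBEJS_DS_, so they are never matched below.
    return sorted(name for name in env if name.startswith("CUBEJS_DB_"))


if __name__ == "__main__":
    env = {
        "CUBEJS_DATASOURCES": "default,Redshift,DuckDB",
        "CUBEJS_DS_DUCKDB_DB_TYPE": "duckdb",
        "CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID": "example-key-id",  # leftover single-mode name
    }
    print(stray_single_mode_vars(env))  # ['CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID']
```

Running it against `os.environ` or a parsed `.env` file before starting Cube gives you an explicit list of names to rename or remove.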


💡 What This Really Taught Me

Most production debugging issues are not about:

  • SQL
  • Permissions
  • IAM
  • The database

They’re about configuration mode mismatches.

Understanding how your framework internally switches operational modes is often the difference between:

  • 5 minutes of clarity
  • 5 hours of confusion


Final Thought

Modern analytics stacks are increasingly hybrid:

  • Embedded engines like DuckDB
  • Cloud storage like Amazon S3
  • Warehouses like Redshift
  • Orchestrators and modeling layers

The more composable your architecture becomes, the more important it is to understand how configuration boundaries work.

Because sometimes, the bug isn’t in your query.

It’s in your mode.


If you're working with Cube in a multi-datasource environment, I’d love to compare notes — especially around embedded engines + object storage patterns.

Written by Ouma Winnie
