Transient Database in Snowflake

Transient tables in #snowflake are a curious specialty: they are normal persisting tables (unlike temporary tables), but without Fail-safe and with limited #timetravel capabilities.

🔗 docs.snowflake.com/en/user-guide/tables-temp-transient

However, because Snowflake doesn't apply Fail-safe (and only limited Time Travel) to them, transient tables can be cheaper to store than permanent tables. You probably won't notice the difference for small tables, but for a table larger than a TB, it can be significant.
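For illustration, creating a transient table only takes the transient keyword; the table and column names below are made up:

```sql
-- transient: no Fail-safe, Time Travel limited to 0 or 1 days
create transient table stg_orders (
    order_id  number,
    loaded_at timestamp_ntz
)
data_retention_time_in_days = 0;  -- optionally disable Time Travel entirely
```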

When to use transient tables?

Let me quote Serge Gershkovich here:

Transient tables are a good choice when backup and recovery are not a high priority - as in the case of staging tables. The same is true for development and test environments [...]

(Gershkovich, Serge: Data Modeling with Snowflake, 1st edition, Packt Publishing Ltd., 2023, p.51)

Transient database and schema

You probably don't want to have to remember adding the transient keyword to every CREATE TABLE statement when working in a staging or dev database/schema. And you don't have to: you can make new tables in a database/schema default to transient and create them exactly the same way you would create permanent tables:

🔗 community.snowflake.com/s/article/Making-Transient-table-by-Default
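As a sketch of the approach from the linked article (schema and table names are made up): create the schema itself as transient, and tables created in it become transient by default:

```sql
create transient schema staging;

-- no transient keyword needed: the table inherits the schema's default
create table staging.stg_customers (
    customer_id number,
    name        varchar
);

-- the "kind" column should show TRANSIENT
show tables like 'STG_CUSTOMERS' in schema staging;
```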

But my staging/dev database/schema already exists!

Unfortunately, you can't alter an existing database/schema/table to become transient. But there is a neat little workaround: zero-copy cloning.

🔗 docs.snowflake.com/en/sql-reference/sql/create-clone

  1. Create a clone of your database/schema/table - the clone being transient of course (note: depending on the number of objects in the original database/schema, this might take a few seconds)
  2. Swap the clone with its original database/schema/table
  3. Drop the original database/schema/table

Cloning effectively re-creates all the objects in a database/schema, not just tables but also views, procedures, etc.; the latter are not affected by the change at all. The tables, however, are re-created under the database's/schema's transient default, and hence all permanent tables effectively become transient:

create transient database sandbox_tmp
clone sandbox;

alter database sandbox_tmp
swap with sandbox;

drop database sandbox_tmp;

If you then check the DDL of any table in the database, you'll see that they are all transient now 🥳
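You can verify this with GET_DDL (the table name here is assumed for illustration):

```sql
select get_ddl('table', 'sandbox.public.some_table');
-- the returned DDL should now start with: create or replace TRANSIENT TABLE ...
```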

Caution!

  1. This approach will not make hybrid tables transient, as a table can only be of one type at a time; they will simply be re-created as hybrid tables
  2. External tables and internal stages are not cloned. If you have any of those in the original database/schema, you'll have to re-create them manually

Snowflake has several small features like this that solve use cases I had expected Oracle to address in this area for decades. Snowflake users should make use of all of its features. Developers will also enjoy implementing them instead of doing a plain lift and shift.

Bravo Martin. Very well explained and practical!

I can recommend using transient tables/schemas/databases where possible. In our MDS setup we actually only need Time Travel/recovery at the ingestion level. If you have scalable full-refresh models, you can save many TB of unnecessary cloud storage.
