How to use Snowflake's cloning feature with Trifacta for zero-downtime EDW data refreshes

Snowflake calls it “Magic with Metadata”, and rightly so. We have found this feature extremely useful and have incorporated it into our transformation phase, where we cleanse, unify, and wrangle data from Staging and send it to our EDW with Trifacta.


Want to know how to achieve zero downtime for your Snowflake EDW data refresh, even though your data pipelines bring in new data all day long and your transformation flows run multiple times per day? Use Snowflake's cloning feature. Here is how we implemented it at InhabitIQ:

  • Create a clone of the table you want to insert data into
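The clone step might look like the following sketch; the table names (EDW.DIM_PROPERTY and its _CLONE counterpart) are hypothetical placeholders, not our actual schema:

```sql
-- Zero-copy clone: only metadata is duplicated; no data is physically copied.
-- Table names are hypothetical placeholders.
CREATE OR REPLACE TABLE EDW.DIM_PROPERTY_CLONE CLONE EDW.DIM_PROPERTY;
```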

This way you end up with two identical tables that share the same underlying data through metadata, so the clone is created instantly and consumes no extra storage until the two tables diverge.

  • Have your Transformation process (Trifacta flows in our case) APPEND into your clone table:
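In Trifacta this is the flow's publishing action set to append; the equivalent SQL is roughly the following, with hypothetical table and column names:

```sql
-- New staged rows land in the clone; the live table is untouched.
-- Tables and columns are hypothetical placeholders.
INSERT INTO EDW.DIM_PROPERTY_CLONE (PROPERTY_KEY, PROPERTY_NAME, IS_CURRENT, EFFECTIVE_DATE)
SELECT PROPERTY_KEY, PROPERTY_NAME, 'TRUE', CURRENT_TIMESTAMP()
FROM STAGING.PROPERTY_UPDATES;
```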
  • Run post-load scripts on the clone table (retiring old rows for SCD Type 2, setting metadata, ...). We use Snowflake's scheduled tasks to execute our scripts. Our query looks for rows that have multiple current versions (IS_CURRENT = 'TRUE') and sets the flag on the older rows to FALSE.
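A sketch of such an SCD Type 2 retirement script, assuming hypothetical PROPERTY_KEY and EFFECTIVE_DATE columns, wrapped in a scheduled task (warehouse name and schedule are also placeholders):

```sql
-- Retire superseded rows: where a business key has more than one
-- "current" version, keep only the newest row as IS_CURRENT = 'TRUE'.
-- All object names are hypothetical placeholders.
CREATE OR REPLACE TASK EDW.RETIRE_OLD_ROWS
  WAREHOUSE = TRANSFORM_WH
  SCHEDULE = '60 MINUTE'
AS
  UPDATE EDW.DIM_PROPERTY_CLONE t
  SET IS_CURRENT = 'FALSE'
  WHERE t.IS_CURRENT = 'TRUE'
    AND t.EFFECTIVE_DATE < (
        SELECT MAX(s.EFFECTIVE_DATE)
        FROM EDW.DIM_PROPERTY_CLONE s
        WHERE s.PROPERTY_KEY = t.PROPERTY_KEY
          AND s.IS_CURRENT = 'TRUE'
    );
```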
  • Write back the clone table into your main table
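The write-back can itself be a metadata-only operation, which is what keeps the refresh downtime at zero. A sketch, again with placeholder names:

```sql
-- Replace the live table with the refreshed clone in one metadata operation;
-- readers see either the old or the new version, never a partial load.
-- Table names are hypothetical placeholders.
CREATE OR REPLACE TABLE EDW.DIM_PROPERTY CLONE EDW.DIM_PROPERTY_CLONE;

-- Alternative: atomically exchange the two tables instead of re-cloning.
-- ALTER TABLE EDW.DIM_PROPERTY SWAP WITH EDW.DIM_PROPERTY_CLONE;
```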

This way we are able to wrangle data all day long with no downtime for our EDW consumers.

--

Misagh Jebeli is a BI Architect at InhabitIQ. InhabitIQ's suite of products covers many of the needs in the affordable, single-family, and multi-family residential spaces, student housing, and commercial as well as short-term rental markets in the United States, Canada, and the UK.
