How to use Snowflake's cloning feature with Trifacta for zero-downtime EDW data refreshes
Snowflake calls it “Magic with MetaData,” and rightly so. We have found this feature extremely useful and have incorporated it into our transformation phase, where we cleanse, unify, and wrangle data from Staging and send it to our EDW with Trifacta.
Wanna know how to have zero downtime for your Snowflake EDW data refresh, even though your data pipelines bring in new data all day long and your transformation flows run multiple times per day? Use Snowflake's CLONING feature. Here is how we at InhabitIQ implemented it:
- Create a clone of the table you want to insert data into
This way you end up with two identical tables pointing to the same underlying metadata (a zero-copy clone, so no data is physically duplicated)
- Have your Transformation process (Trifacta flows in our case) APPEND into your clone table
- Run post-load scripts on the clone table (retiring old rows for SCD Type 2, setting metadata, ...). We use Snowflake's scheduled tasks to execute our scripts. In the example query below, we look for rows that have multiple current versions (IS_CURRENT = 'TRUE') and then set the flag on the old row to FALSE.
- Write back the clone table into your main table
This way, we are able to wrangle data all day long with no downtime.
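The steps above can be sketched in Snowflake SQL roughly as follows. This is a minimal illustration, not our production scripts: the table name DIM_CUSTOMER, the business key CUSTOMER_KEY, and the LOAD_TS column are hypothetical placeholders, and `ALTER TABLE ... SWAP WITH` is one way to implement the final write-back atomically.

```sql
-- Step 1: zero-copy clone of the main table (metadata-only, no data copied)
CREATE OR REPLACE TABLE EDW.DIM_CUSTOMER_CLONE CLONE EDW.DIM_CUSTOMER;

-- Step 2: the transformation process (Trifacta flows) APPENDs new rows
-- into EDW.DIM_CUSTOMER_CLONE here.

-- Step 3: post-load script, e.g. run by a scheduled task. Retire old
-- SCD Type 2 rows: where a key now has multiple current versions,
-- flip IS_CURRENT to 'FALSE' on the superseded row.
-- CUSTOMER_KEY and LOAD_TS are hypothetical column names.
UPDATE EDW.DIM_CUSTOMER_CLONE t
SET t.IS_CURRENT = 'FALSE'
WHERE t.IS_CURRENT = 'TRUE'
  AND EXISTS (
    SELECT 1
    FROM EDW.DIM_CUSTOMER_CLONE n
    WHERE n.CUSTOMER_KEY = t.CUSTOMER_KEY
      AND n.IS_CURRENT = 'TRUE'
      AND n.LOAD_TS > t.LOAD_TS
  );

-- Step 4: write back. SWAP WITH exchanges the two tables in a single
-- atomic operation, so readers of EDW.DIM_CUSTOMER see either the old
-- or the new version, never a partial load.
ALTER TABLE EDW.DIM_CUSTOMER_CLONE SWAP WITH EDW.DIM_CUSTOMER;
```

Because the clone is metadata-only, creating it is nearly instant regardless of table size, and queries against the main table keep running untouched until the final atomic swap.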
--
Misagh Jebeli is a BI Architect at InhabitIQ. InhabitIQ’s suite of products covers many of the needs in the affordable, single-family, and multi-family residential spaces, student housing, and the commercial and short-term rental markets in the United States, Canada, and the UK.