Tutorial Snowflake: Query JSON files direct, using Stage and with Flatt JSON!

Tutorial Snowflake: Query JSON files direct, using Stage and with Flatt JSON!

In this tutorial, we’ll see how to query your JSON file direct, without “copy”, only with simples queries. This approach allows you to identify adjusts in real-time and also explore your dataset

Summary:

  • Create a container in Azure
  • Create a token in the Blob Storage
  • Upload the files in your Blob Storage
  • Create a Stage in the Snowflake
  • Create the file format JSON in the Snowflake
  • Query your JSON file like a table
  • Flatten your JSON to production

Azure Blob Storage Set up

  • Go to storages account
  • Create a container

No alt text provided for this image

  • Click in the container name

No alt text provided for this image

  • Click in the Shared Access Signature link
  • Give all the permissions for the token

No alt text provided for this image
No alt text provided for this image


Upload the files in your Blob Storage container

link to download of our sample


No alt text provided for this image


Create your free account in the snowflake

Snowflake Trial

No alt text provided for this image


Create the database for the tests

-- drop database raw_stage; 

create database if not exists raw_stage;        


Create the Stage in the Snowflake

  • Use the token generated in the previous step

-- drop stage if exists raw_stage.PUBLIC.stg;
CREATE STAGE raw_stage.PUBLIC.stg
url = 'azure://your-account.blob.core.windows.net/your_container'
CREDENTIALS = (AZURE_SAS_TOKEN='sp=racwdl&st=token')        
No alt text provided for this image

List the content of the stage to check if your connection it’s ok

LIST @raw_stage.PUBLIC.stg        
No alt text provided for this image

Create the file format JSON

-- drop file FORMAT if exists RAW_DB.PUBLIC.JSON
CREATE FILE FORMAT if not exists RAW_DB.PUBLIC.JSON 
TYPE = 'JSON' 
COMPRESSION = 'AUTO' 
ENABLE_OCTAL = FALSE 
ALLOW_DUPLICATE = FALSE 
STRIP_OUTER_ARRAY = FALSE 
STRIP_NULL_VALUES = FALSE 
IGNORE_UTF8_ERRORS = FALSE;        

Query your files JSON

SELECT
    *
FROM @raw_stage.PUBLIC.stg
(
  file_format => RAW_DB.PUBLIC.JSON ,
  pattern => 'sample.json'
)        
No alt text provided for this image
SELECT
     $1[0]:completed
    ,$1[0]:title
FROM @raw_stage.PUBLIC.stg
(
  file_format => RAW_DB.PUBLIC.JSON ,
  pattern => 'sample.json'
)        
No alt text provided for this image

The logic works but it’s not ready for production, let’s flat the file so we’ll have a row by JSON document. To flat the file you only need to add the flat command as below

Flatten JSON

,lateral flatten( input => parse_json($1))        

The code will be like this

SELECT
      value:completed::string as completed
     ,value:id::bigint as id
     ,value:title::string as title
     ,value:userId::bigint as user_id
FROM @raw_stage.PUBLIC.stg
(
  file_format => RAW_DB.PUBLIC.JSON ,
  pattern => 'sample.json'
),
lateral flatten( input => parse_json($1))        

The “::” parse the result to the correct data type“:” select the element inside the array

No alt text provided for this image


Create a CTE to recursivity and easily use

with cte_result as (
SELECT
      value:completed::string as completed
     ,value:id::bigint as id
     ,value:title::string as title
     ,value:userId::bigint as user_id
FROM @raw_stage.PUBLIC.stg
(
  file_format => RAW_DB.PUBLIC.JSON ,
  pattern => 'sample.json'
),
lateral flatten( input => parse_json($1))
)
select * from cte_result        

Clean everything

remove @raw_stage.PUBLIC.stg pattern='sample.*'
drop stage if exists raw_stage.PUBLIC.stg;
drop file FORMAT if exists RAW_DB.PUBLIC.JSON
drop database raw_stage;        

Repository

https://github.com/silllvio/snowflake-direct-query-files


References:


Thank you so much, this is amazing

To view or add a comment, sign in

More articles by Silvio L.

Others also viewed

Explore content categories