Tutorial Snowflake: Query JSON files direct, using Stage and with Flatt JSON!
In this tutorial, we’ll see how to query your JSON file direct, without “copy”, only with simples queries. This approach allows you to identify adjusts in real-time and also explore your dataset
Summary:
Azure Blob Storage Set up
Upload the files in your Blob Storage container
link to download of our sample
Create your free account in the snowflake
Create the database for the tests
-- drop database raw_stage;
create database if not exists raw_stage;
Create the Stage in the Snowflake
-- drop stage if exists raw_stage.PUBLIC.stg;
CREATE STAGE raw_stage.PUBLIC.stg
url = 'azure://your-account.blob.core.windows.net/your_container'
CREDENTIALS = (AZURE_SAS_TOKEN='sp=racwdl&st=token')
Recommended by LinkedIn
List the content of the stage to check if your connection it’s ok
LIST @raw_stage.PUBLIC.stg
Create the file format JSON
-- drop file FORMAT if exists RAW_DB.PUBLIC.JSON
CREATE FILE FORMAT if not exists RAW_DB.PUBLIC.JSON
TYPE = 'JSON'
COMPRESSION = 'AUTO'
ENABLE_OCTAL = FALSE
ALLOW_DUPLICATE = FALSE
STRIP_OUTER_ARRAY = FALSE
STRIP_NULL_VALUES = FALSE
IGNORE_UTF8_ERRORS = FALSE;
Query your files JSON
SELECT
*
FROM @raw_stage.PUBLIC.stg
(
file_format => RAW_DB.PUBLIC.JSON ,
pattern => 'sample.json'
)
SELECT
$1[0]:completed
,$1[0]:title
FROM @raw_stage.PUBLIC.stg
(
file_format => RAW_DB.PUBLIC.JSON ,
pattern => 'sample.json'
)
The logic works but it’s not ready for production, let’s flat the file so we’ll have a row by JSON document. To flat the file you only need to add the flat command as below
Flatten JSON
,lateral flatten( input => parse_json($1))
The code will be like this
SELECT
value:completed::string as completed
,value:id::bigint as id
,value:title::string as title
,value:userId::bigint as user_id
FROM @raw_stage.PUBLIC.stg
(
file_format => RAW_DB.PUBLIC.JSON ,
pattern => 'sample.json'
),
lateral flatten( input => parse_json($1))
The “::” parse the result to the correct data type“:” select the element inside the array
Create a CTE to recursivity and easily use
with cte_result as (
SELECT
value:completed::string as completed
,value:id::bigint as id
,value:title::string as title
,value:userId::bigint as user_id
FROM @raw_stage.PUBLIC.stg
(
file_format => RAW_DB.PUBLIC.JSON ,
pattern => 'sample.json'
),
lateral flatten( input => parse_json($1))
)
select * from cte_result
Clean everything
remove @raw_stage.PUBLIC.stg pattern='sample.*'
drop stage if exists raw_stage.PUBLIC.stg;
drop file FORMAT if exists RAW_DB.PUBLIC.JSON
drop database raw_stage;
Repository
References:
Thank you so much, this is amazing