Architecting Data Migration: Efficiently Loading Data from Oracle (On-Premises) to Snowflake

Loading Data from Oracle (On-Premises) to Snowflake: A Step-by-Step Guide

As a Snowflake Architect, the process of loading data from an on-premises Oracle database into Snowflake typically involves several stages: data extraction, file staging, and data loading. This guide outlines the key steps and provides a technical framework for moving data from an Oracle database (for example, a customer table with five columns) into a Snowflake environment.

1. Preliminary Setup and Assumptions

Prerequisites:

  • Oracle Database: Ensure that the Oracle database is accessible, with valid credentials and network permissions.
  • Snowflake Account: Valid Snowflake credentials (user, password, account URL, role, warehouse, and database).
  • ODBC/JDBC Drivers: Ensure the correct Oracle ODBC/JDBC drivers are installed if using any intermediate tools or scripts for data extraction.


2. Create a Stage in Snowflake

Before loading the data into Snowflake, the data needs to be staged in a temporary location such as S3 (for external stages) or an internal Snowflake stage. The stage provides a location where data is stored temporarily before loading it into the destination table.

Example: Create an S3 External Stage

sql

CREATE STAGE my_stage
  URL = 's3://my-bucket-name/my-path/'
  CREDENTIALS = (AWS_KEY_ID = '<AWS_KEY>' AWS_SECRET_KEY = '<AWS_SECRET>');

Alternatively, for an internal stage, you can omit the URL and use Snowflake's internal storage.
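As a minimal sketch, a named internal stage with an attached CSV file format might look like this (the stage name my_internal_stage is illustrative; in production, an external stage is typically defined with a storage integration rather than embedded credentials):

sql

-- Named internal stage; files are held in Snowflake-managed storage
CREATE STAGE my_internal_stage
  FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"');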

3. Extract Data from Oracle

The data extraction step typically involves exporting data from the Oracle database into a file format that Snowflake can read, such as CSV or Parquet. This is often done using tools like Oracle SQL*Plus, SQLcl, or ETL tools.

Using SQL*Plus to Export Data to CSV (SQL*Plus 12.2 or later supports CSV markup):

sql

SET MARKUP CSV ON QUOTE ON
SET FEEDBACK OFF
SET ECHO OFF
SPOOL /path/to/your/output_file.csv
SELECT * FROM customer;
SPOOL OFF;
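If SQLcl is available instead of classic SQL*Plus, its SQLFORMAT setting gives a similar result; a minimal sketch (the output path is a placeholder):

sql

-- SQLcl: render query results as CSV and spool them to a file
SET SQLFORMAT csv
SPOOL /path/to/your/output_file.csv
SELECT * FROM customer;
SPOOL OFF;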

Alternatively, if using a more sophisticated ETL tool such as Oracle Data Integrator (ODI), Apache NiFi, or Informatica, configure the tool to extract data from the customer table into the desired file format and upload it directly to the staging area.

4. Upload Data to Snowflake Stage

After extracting the data to a file, the next step is to upload it to the staging area in S3 or Snowflake's internal stage. This can be done using AWS CLI or Snowflake's Web UI.

For instance, using AWS CLI to upload to S3:

aws s3 cp /path/to/your/output_file.csv s3://my-bucket-name/my-path/

If you're using an internal stage, you can upload the file directly into Snowflake's internal storage via the Snowflake UI or the PUT command, as shown below.
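A minimal PUT sketch, assuming the internal stage created earlier (PUT is run from SnowSQL or another client that supports file transfers; the local path is a placeholder):

sql

-- Upload the local CSV to the internal stage; AUTO_COMPRESS gzips it during upload
PUT file:///path/to/your/output_file.csv @my_internal_stage AUTO_COMPRESS = TRUE;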


5. Define the Target Table in Snowflake

The target table in Snowflake must mirror the structure of the Oracle source table to ensure the data loads correctly.

Example: Create the customer table in Snowflake:

sql

CREATE TABLE customer (
    customer_id  INT,
    first_name   STRING,
    last_name    STRING,
    email        STRING,
    phone_number STRING
);

Ensure that the data types in Snowflake are aligned with the source table in Oracle (for example, STRING or VARCHAR in Snowflake for Oracle VARCHAR2, and NUMBER in Snowflake for Oracle NUMBER).


6. Load Data from the Stage into Snowflake Table

After the data has been uploaded to the staging area, the next step is to load the data into the target table. Snowflake uses the COPY INTO command to efficiently load large volumes of data from external or internal stages into the Snowflake table.

Example: Loading CSV Data Using COPY INTO

sql

COPY INTO customer
  FROM @my_stage/output_file.csv
  FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
  ON_ERROR = 'CONTINUE';  -- Optional: skips records with errors

  • FILE_FORMAT: Specifies the format of the file being loaded. Here it is a CSV file with an optional field enclosure (") to handle embedded delimiters and special characters, and SKIP_HEADER = 1 to skip the header row written during the export.
  • ON_ERROR: Optional parameter to handle loading errors. This can be set to CONTINUE, SKIP_FILE, or ABORT_STATEMENT depending on your error tolerance.

If you are loading a Parquet or JSON file, you would adjust the FILE_FORMAT parameter accordingly.
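As a hedged sketch for Parquet, assuming the staged file is named output_file.parquet and its column names match the target table:

sql

-- Parquet load: map Parquet column names onto the target table's columns
COPY INTO customer
  FROM @my_stage/output_file.parquet
  FILE_FORMAT = (TYPE = 'PARQUET')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;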


7. Data Verification in Snowflake

Once the data is loaded into Snowflake, it’s essential to verify the load and perform basic checks:

sql

SELECT * FROM customer;

You can run additional validation or transformation queries to ensure that the data integrity has been maintained during the transfer.
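For instance, two simple checks: compare the row count against the Oracle source, and inspect the load result via the COPY_HISTORY table function (the one-day window below is illustrative):

sql

-- Row count to compare against SELECT COUNT(*) FROM customer in Oracle
SELECT COUNT(*) AS loaded_rows FROM customer;

-- Per-file load status, including parsed vs. loaded row counts and any errors
SELECT file_name, row_count, row_parsed, error_count, status
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'CUSTOMER',
       START_TIME => DATEADD('day', -1, CURRENT_TIMESTAMP())));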


8. Post-Processing (Optional)

Post-processing can include tasks such as:

  • Automating the Data Load: Use Snowflake Streams and Tasks to automate data ingestion or transformations on a schedule or in near real time (a scheduled-Task sketch follows this list).
  • Clean-Up: If the data was loaded from an external stage (e.g., S3), you may wish to clean up the files after the load to save storage costs.
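As a sketch, a scheduled Task that periodically re-runs the COPY (the task name, warehouse, and schedule are illustrative assumptions):

sql

-- Re-run the load every hour; COPY INTO skips files it has already loaded
CREATE TASK load_customer_task
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  COPY INTO customer
    FROM @my_stage
    FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    ON_ERROR = 'CONTINUE';

-- Tasks are created suspended; resume to start the schedule
ALTER TASK load_customer_task RESUME;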

Note that dropping a stage removes the stage object but not the files in the external S3 location; those can be deleted separately (for example, with the AWS CLI) or automatically by adding PURGE = TRUE to the COPY INTO command.

Example for dropping the stage once the load is complete:

sql

DROP STAGE my_stage;


Summary of Steps:

  1. Create a Stage: Set up an external or internal stage in Snowflake.
  2. Extract Data from Oracle: Use tools like SQL*Plus or an ETL tool to export the Oracle data to a flat file.
  3. Upload Data: Upload the extracted data file to Snowflake’s stage (either internal or external storage).
  4. Create Table in Snowflake: Define a schema that mirrors the Oracle table structure in Snowflake.
  5. Load Data: Use the COPY INTO command to load data from the stage into the Snowflake table.
  6. Verify Load: Run validation queries to ensure data integrity.
  7. Post-Processing: Optionally automate and clean up.


This process ensures the efficient transfer of data from Oracle (on-premises) to Snowflake with minimal risk of data loss or corruption, leveraging Snowflake’s powerful capabilities for scalable and cost-efficient data loading.

Let me know if you'd like further details on any of the steps!
