Sharing data across redshift clusters (Datashare):

Arvind Toorpu

Published Nov 18, 2023

Creating datashare in redshift clusters :

Introduction:

Every organization has multiple teams – departments which often have to share data in order to promote unified and accurate decisions about their product or services. Within a classic Data Warehouse, Data Sharing can be time consuming, stressful and often dependent on DBAs additional tooling. Additionally, when it comes to accurate testing, users often have to move data themselves or announce well ahead when they will need that data so that DBAs can prepare a testing environment for them.

Finally, AWS has come up with a solution. Amazon Data Sharing is now generally available and with this feature users can display data created in one cluster to multiple other clusters without any data movement or replication systems you need to buy and set up. Although this is a feature that AWS provides and markets as very simple to use, there are a few things you need to know to set it up and configure correctly, and this article aims to explain how to do so properly

Commands to Work with Amazon Data Share:

Now that you’ve explored the basics of Amazon Redshift Data Sharing, you can start learning how to use the feature. There are certain processes Redshift users must master to share data effectively. These processes are:

Create Datashare
Alter Datashare
Desc Datashare
Show Datashare
Drop Datashare

----- Step 1 : Create Datashare

The first step to sharing data is to create a datashare. You can create a datashare by entering the following syntax within an Amazon Redshift database:

CREATE DATASHARE datashare_name;

[[SET] PUBLICACCESSIBLE [=] TRUE | FALSE ];

The parameter, [ [SET] PUBLICACCESSIBLE, states whether you can share the data with clusters that are publicly accessible. This should always be set to FALSE. dont see any reason for you to share publicly.

For example:

create datashare sales_datashare;

--- Step 2 : share schema level all objects in schema to consumer.

Set permissions on the database objects that you want to share. For example:

First add schema to datashare, so you can then add objects

alter datashare mysource add schema abcd ;

You have an option to add one object or all tables/Views in schema to datashare :

For example:

ALTER DATASHARE salesshare ADD ALL TABLES IN SCHEMA abcd;

----- Step 3 : To add/remove objects to datashare

Alter Datashare

This function allows you to add or remove objects from a datashare. The syntax for this process is:

ALTER DATASHARE datashare_name ADD TABLE schemaname.table_name;

Or

ALTER DATASHARE datashare_name REMOVE TABLE schemaname.table_name;

For example:

ALTER DATASHARE salesshare ADD TABLE abcd.table_name;

ALTER DATASHARE salesshare REMOVE TABLE abcd.table_name;

Note : If you want to ensure that your datashare should include all the future tables that are create under the schema. You should set paarmeter includenew = true;

[ SET INCLUDENEW [=] TRUE | FALSE FOR SCHEMA schema ]

A clause that specifies whether to add any future tables, views, or SQL user-defined functions (UDFs) created in the specified schema to the datashare. Current tables, views, or SQL UDFs in the specified schema aren't added to the datashare. Only superusers can change this property for each datashare-schema pair. By default, the INCLUDENEW clause is false.

Desc Datashare

This shows all the objects added to a datashare. The syntax for Desc Datashare is:

DESC DATASHARE datashare_name [ OF [ ACCOUNT account_id ] NAMESPACE namespace_guid ]

Account_id indicates the account where the datashare was created.

Namespace_guid is a code number for the datashare.

Show Datashare

Use this function to view the inbound and outbound datashares within a cluster. Here’s how to request Amazon Redshift to show datashares:

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

ON AWS Console We have to authorize the producer and consumer to allow data flow between the clusters

---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

There is an important step that needs to be done manually on the aws console datashares tab.

Authorize On Producer AWS Console.

we need to authorize the producer by clicking on

datashare >> datashare name >> select authorize.

Associate on Consumer AWS Console.

we need to associate the producer by clicking on datashare name and select associate in

datashares >> from other accounts. >> datasharename >> associate region/namespace.

---------- On Consumer(cluster) side ----------

On the consumer cluster, you create a database from the datashare. These steps describe how to share data between two clusters in the same account. For information on sharing data across AWS accounts, see Sharing data across AWS accounts in the Amazon Redshift Database Developer Guide.

You can use SQL commands or the query editor v2 tree-view panel to create the database.

---- step 1 : Create a database that will hold the producer database info

Create a database from the datashare for your account and the namespace of the producer cluster. For example:

For example use SQL :

create database sales_shared_db from datashare salesshare of account '123456789012' namespace 'p1234567-8765-4321-p10987654321';

Set permissions so that users can access the database and the schema.

---- step 2 : Create external schemas

Create external schemas under your local database based on your shared database.

CREATE EXTERNAL SCHEMA 'local_schema_name' FROM REDSHIFT DATABASE 'sales_shared_db' SCHEMA 'shared_schema_name';

In our case that would be:

CREATE EXTERNAL SCHEMA 'sales_ext_schema' FROM REDSHIFT DATABASE 'sales_shared_db' SCHEMA 'abcd';

Note: For this option you won't need to create any additional objects (tables, views) of this schema since they will be automatically created.

-- Step 3 : grant access on external schema to :

grant usage on database sales_shared_db to usernames/groups/roles;

grant usage on schema sales_ext_schema to usernames/groups/roles;

Set permissions so that users/groups can access the database and the schema.

Elangkumaran Baladhandapani 1y

Thanks for this document. It helped to complete the datashare between two clusters in same region in different account. My clarification is that whether is it possible to create the datashare between to different regions ??? for example (producer in singapore region and consumer in frankfrut)

Elangkumaran Baladhandapani 1y

Hi, it's great document. I have clarification regarding for authorize the producer. Whether it has to be done manually only or sql query to give authorize for the pending authorization in producer account (datashare).

See more comments

To view or add a comment, sign in

Sharing data across redshift clusters (Datashare):

Arvind Toorpu

Creating datashare in redshift clusters :

Introduction:

----- Step 1 : Create Datashare

--- Step 2 : share schema level all objects in schema to consumer.

----- Step 3 : To add/remove objects to datashare

Recommended by LinkedIn

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

ON AWS Console We have to authorize the producer and consumer to allow data flow between the clusters

---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Authorize On Producer AWS Console.

Associate on Consumer AWS Console.

---------- On Consumer(cluster) side ----------

---- step 2 : Create external schemas

-- Step 3 : grant access on external schema to :

More articles by Arvind Toorpu

Others also viewed

Connecting Azure Data Lake Storage with Databricks

Data Analytics with AWS Redshift and Redshift Spectrum: A Scenario-Based Approach

Amazon Redshift - A Doppler effect for Data

Getting Started With Big Data On GCP: A Beginner's Guide To Understanding The Basics Of Big Data And How To Leverage GCP Services In Big Data

Azure Data & Analytics Updates - June 2020

How to test Azure Data Pipeline?

Building a High-Performance Data Analytics Service with Apache Arrow Flight and DuckDB and S3 Tables

The Hitchhiker’s Guide to AWS Redshift — part 2: Let’s cut costs

The Big Data Technology Stack

A Simple, yet effective way for improved query performance in Redshift - Overcome the challenge of a single Distribution key

Explore content categories

Creating datashare in redshift clusters :

Introduction:

----- Step 1 : Create Datashare

--- Step 2 : share schema level all objects in schema to consumer.

----- Step 3 : To add/remove objects to datashare

Recommended by LinkedIn

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

ON AWS Console We have to authorize the producer and consumer to allow data flow between the clusters

---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Authorize On Producer AWS Console.

Associate on Consumer AWS Console.

---------- On Consumer(cluster) side ----------

---- step 2 : Create external schemas

-- Step 3 : grant access on external schema to :

More articles by Arvind Toorpu

Oracle 23c AI’s JSON Relational Duality

Amazon Redshift Workload Management (WLM): A Step-by-Step Guide

Refreshing a SQL Server database in AWS

AWS bulk tagging your resources

How to upgrade TimescaleDB from 1.6/1.7 to 2.x version on Postgresql 11.x

configure complex password (password verify function) in oracle database

Others also viewed

Connecting Azure Data Lake Storage with Databricks

Data Analytics with AWS Redshift and Redshift Spectrum: A Scenario-Based Approach

Amazon Redshift - A Doppler effect for Data

Getting Started With Big Data On GCP: A Beginner's Guide To Understanding The Basics Of Big Data And How To Leverage GCP Services In Big Data

Azure Data & Analytics Updates - June 2020

How to test Azure Data Pipeline?

Building a High-Performance Data Analytics Service with Apache Arrow Flight and DuckDB and S3 Tables

The Hitchhiker’s Guide to AWS Redshift — part 2: Let’s cut costs

The Big Data Technology Stack

A Simple, yet effective way for improved query performance in Redshift - Overcome the challenge of a single Distribution key

Explore content categories