Sharing data across redshift clusters (Datashare):


Creating datashare in redshift clusters :


Introduction:

Every organization has multiple teams – departments which often have to share data in order to promote unified and accurate decisions about their product or services. Within a classic Data Warehouse, Data Sharing can be time consuming, stressful and often dependent on DBAs additional tooling. Additionally, when it comes to accurate testing, users often have to move data themselves or announce well ahead when they will need that data so that DBAs can prepare a testing environment for them.

Finally, AWS has come up with a solution. Amazon Data Sharing is now generally available and with this feature users can display data created in one cluster to multiple other clusters without any data movement or replication systems you need to buy and set up. Although this is a feature that AWS provides and markets as very simple to use, there are a few things you need to know to set it up and configure correctly, and this article aims to explain how to do so properly


Commands to Work with Amazon Data Share:

Now that you’ve explored the basics of Amazon Redshift Data Sharing, you can start learning how to use the feature. There are certain processes Redshift users must master to share data effectively. These processes are:

  • Create Datashare
  • Alter Datashare
  • Desc Datashare
  • Show Datashare
  • Drop Datashare

----- Step 1 : Create Datashare

The first step to sharing data is to create a datashare. You can create a datashare by entering the following syntax within an Amazon Redshift database:

CREATE DATASHARE datashare_name;        
[[SET] PUBLICACCESSIBLE [=] TRUE | FALSE ];        

The parameter, [ [SET] PUBLICACCESSIBLE, states whether you can share the data with clusters that are publicly accessible. This should always be set to FALSE. dont see any reason for you to share publicly.

For example:

create datashare sales_datashare;        

--- Step 2 : share schema level all objects in schema to consumer.

Set permissions on the database objects that you want to share. For example:

First add schema to datashare, so you can then add objects

alter datashare mysource add schema abcd ;

You have an option to add one object or all tables/Views in schema to datashare :

For example:

ALTER DATASHARE salesshare ADD ALL TABLES IN SCHEMA abcd;        

----- Step 3 : To add/remove objects to datashare

  • Alter Datashare

This function allows you to add or remove objects from a datashare. The syntax for this process is:

ALTER DATASHARE datashare_name ADD TABLE schemaname.table_name;        
Or        
ALTER DATASHARE datashare_name REMOVE TABLE schemaname.table_name;        

For example:

ALTER DATASHARE salesshare ADD TABLE abcd.table_name;        

Or

ALTER DATASHARE salesshare REMOVE TABLE abcd.table_name;        

Note : If you want to ensure that your datashare should include all the future tables that are create under the schema. You should set paarmeter includenew = true;

[ SET INCLUDENEW [=] TRUE | FALSE FOR SCHEMA schema ]        

A clause that specifies whether to add any future tables, views, or SQL user-defined functions (UDFs) created in the specified schema to the datashare. Current tables, views, or SQL UDFs in the specified schema aren't added to the datashare. Only superusers can change this property for each datashare-schema pair. By default, the INCLUDENEW clause is false.

  • Desc Datashare

This shows all the objects added to a datashare. The syntax for Desc Datashare is:

DESC DATASHARE datashare_name [ OF [ ACCOUNT account_id ] NAMESPACE namespace_guid ]        

Account_id indicates the account where the datashare was created.

Namespace_guid is a code number for the datashare.

  • Show Datashare

Use this function to view the inbound and outbound datashares within a cluster. Here’s how to request Amazon Redshift to show datashares:

SHOW DATASHARES [ LIKE 'namepattern' ]        

Namepattern refers to the similar characters that all the requested datashares have.

LIKE is an optional clause that matches the name pattern with the description of the datashares within an account.

  • Drop Datashare

This deletes a datashare object from a cluster. The syntax for Drop Datashare is:

DROP DATASHARE datashare_name;        

Amazon Redshift Data Sharing Use Cases

Now that you have gained a basic understanding of Amazon Redshift Data Sharing capability, below are some of the use cases listed where this feature is commonly used.

---- step 4 : grant usage to consumer cluster

Now to allow a consumer to access data, we need to grant usage on dtashare to that AWS account's Namespace. Allow permissions on the consumer cluster namespace to access the datashare.

For example:

grant usage on datashare salesshare to namespace '2b12345-1234-5678-9012-bb1234567890';        


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

ON AWS Console We have to authorize the producer and consumer to allow data flow between the clusters

---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

There is an important step that needs to be done manually on the aws console datashares tab.

Authorize On Producer AWS Console.

we need to authorize the producer by clicking on

datashare >> datashare name >> select authorize.


Associate on Consumer AWS Console.

we need to associate the producer by clicking on datashare name and select associate in

datashares >> from other accounts. >> datasharename >> associate region/namespace.



---------- On Consumer(cluster) side ----------

On the consumer cluster, you create a database from the datashare. These steps describe how to share data between two clusters in the same account. For information on sharing data across AWS accounts, see Sharing data across AWS accounts in the Amazon Redshift Database Developer Guide.

You can use SQL commands or the query editor v2 tree-view panel to create the database.

---- step 1 : Create a database that will hold the producer database info

Create a database from the datashare for your account and the namespace of the producer cluster. For example:

For example use SQL :

create database sales_shared_db from datashare salesshare of account '123456789012' namespace 'p1234567-8765-4321-p10987654321';         

Set permissions so that users can access the database and the schema.

---- step 2 : Create external schemas

Create external schemas under your local database based on your shared database.

CREATE EXTERNAL SCHEMA 'local_schema_name' FROM REDSHIFT DATABASE 'sales_shared_db' SCHEMA 'shared_schema_name';        

In our case that would be:

CREATE EXTERNAL SCHEMA 'sales_ext_schema' FROM REDSHIFT DATABASE 'sales_shared_db' SCHEMA 'abcd';        

Note: For this option you won't need to create any additional objects (tables, views) of this schema since they will be automatically created.

-- Step 3 : grant access on external schema to :

grant usage on database sales_shared_db to usernames/groups/roles;        
grant usage on schema sales_ext_schema to usernames/groups/roles;        

Set permissions so that users/groups can access the database and the schema.


Thanks for this document. It helped to complete the datashare between two clusters in same region in different account. My clarification is that whether is it possible to create the datashare between to different regions ??? for example (producer in singapore region and consumer in frankfrut)

Like
Reply

Hi, it's great document. I have clarification regarding for authorize the producer. Whether it has to be done manually only or sql query to give authorize for the pending authorization in producer account (datashare).

Like
Reply

To view or add a comment, sign in

More articles by Arvind Toorpu

Others also viewed

Explore content categories