From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep
Amazon Redshift materialized views
From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep
Amazon Redshift materialized views
- [Instructor] It's important to take advantage of techniques to keep the costs low when running Redshift. One of these is to avoid having the cluster run the same query repeatedly, especially when the latest data isn't needed. In this lesson, we'll talk about Redshift materialized views. Some queries will use select statements that perform multiple table joins and aggregations on tables that may contain billions of rows. Processing these queries uses a lot of system resources and may take some time to compute the results. A materialized view lets you run a query in Redshift and then save the results in a table format. Then instead of running the query the next time, you just query the materialized view, this returns the results much faster and doesn't consume nearly as much of the resources. Now, you may be familiar with a standard view in SQL. With a standard view, the results aren't saved and the underlying query is run each time the view is queried. So this doesn't give the performance and efficiency gains of a materialized view, but regular views are used for simplifying complicated SQL queries for your users. A great use case for materialized view is a query that populates an analytics dashboard. Many users may view the dashboard, which could cause a query of the same data repeatedly. After you create a materialized view. Redshift can rewrite any queries on the fly to use the materialized view rather than the underlying tables. However, the materialized view is static, so if the underlying data in the tables change, the changes aren't reflected right away in the materialized view, it has to be refreshed to bring it up to date. You create a materialized view with the create materialized view command. The only thing required is the name you want to give the materialized view and the select query that adds the data to the materialized view. The other parameters are optional. For backup, you specify whether the materialized view is included in the clusters snapshots, and by default, this is set to yes. Table attributes are the distribution key and sort key that you want for the data in the materialized view. Now, this could be left up to Redshift if you don't specify it, and Redshift will use the even distribution style, which makes sense because you may not be doing any joins to materialized view because those joins were already made when creating the view. By default, Redshift won't automatically refresh your materialized view, but you can tell it to with auto refresh on. Whenever changes are made to the underlying data, Redshift will try to refresh the view, but this may not happen immediately depending upon how much a load is on the cluster. So you can force a refresh by just using the refresh materialized view command. Here's a straightforward example. Imagine that our order details and products table are frequently joined together and used in an analytics dashboard, but new order data is only loaded once per day. So we created materialized view and use that in a query for our dashboard. Since we created the view with auto refresh, this view should always be fresh. However, let's say our order load happens overnight and we want to be absolutely sure that the materialized view is current for our users the next day. In that case, we can just run the refresh command every morning on a schedule. Fortunately, Redshift lets you schedule your queries to run on a schedule very easily using the query editor.
Practice while you learn with exercise files
Download the files the instructor uses to teach the course. Follow along and learn by watching, listening and practicing.
Contents
-
-
-
-
-
-
-
-
-
(Locked)
Introduction45s
-
(Locked)
Analytics services2m 23s
-
(Locked)
Amazon Redshift5m 14s
-
(Locked)
Hands-on learning: Launch an Amazon Redshift cluster8m 22s
-
(Locked)
Amazon Redshift serverless2m 32s
-
(Locked)
Schema design for Amazon Redshift2m 36s
-
Loading data into Amazon Redshift6m 19s
-
(Locked)
Hands-on learning: Use the Amazon Redshift COPY command5m 19s
-
(Locked)
Unloading Amazon Redshift data1m 52s
-
(Locked)
Hands-on learning: Unload data to Amazon S33m 10s
-
(Locked)
Column compression2m 45s
-
(Locked)
Distribution styles5m 28s
-
(Locked)
Maintaining tables3m 41s
-
(Locked)
Amazon Redshift federated queries1m 55s
-
(Locked)
Amazon Redshift Spectrum2m 42s
-
Amazon Redshift materialized views3m 39s
-
(Locked)
Transform data with stored procedures4m 18s
-
(Locked)
Workload management1m 59s
-
(Locked)
Zero-ETL integrations3m 3s
-
(Locked)
Streaming ingestion2m 7s
-
(Locked)
Amazon Athena4m
-
(Locked)
Partitioning data3m 2s
-
(Locked)
Creating views2m 40s
-
(Locked)
Hands-on learning: Create and query tables using Athena4m 56s
-
(Locked)
AWS Lake Formation1m 58s
-
(Locked)
Hands-on learning: Create a data lake9m 9s
-
(Locked)
Amazon QuickSight4m 47s
-
(Locked)
Hands-on learning: Create a QuickSight dashboard5m 33s
-
Amazon OpenSearch7m 11s
-
(Locked)
-
-
-