Join now Sign in

From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep

Amazon Redshift materialized views

From the course: AWS Certified Data Engineer Associate (DEA-C01) Cert Prep

Start my 1-month free trial Buy for my team

Amazon Redshift materialized views

“

- [Instructor] It's important to take advantage of techniques to keep the costs low when running Redshift. One of these is to avoid having the cluster run the same query repeatedly, especially when the latest data isn't needed. In this lesson, we'll talk about Redshift materialized views. Some queries will use select statements that perform multiple table joins and aggregations on tables that may contain billions of rows. Processing these queries uses a lot of system resources and may take some time to compute the results. A materialized view lets you run a query in Redshift and then save the results in a table format. Then instead of running the query the next time, you just query the materialized view, this returns the results much faster and doesn't consume nearly as much of the resources. Now, you may be familiar with a standard view in SQL. With a standard view, the results aren't saved and the underlying query is run each time the view is queried. So this doesn't give the performance and efficiency gains of a materialized view, but regular views are used for simplifying complicated SQL queries for your users. A great use case for materialized view is a query that populates an analytics dashboard. Many users may view the dashboard, which could cause a query of the same data repeatedly. After you create a materialized view. Redshift can rewrite any queries on the fly to use the materialized view rather than the underlying tables. However, the materialized view is static, so if the underlying data in the tables change, the changes aren't reflected right away in the materialized view, it has to be refreshed to bring it up to date. You create a materialized view with the create materialized view command. The only thing required is the name you want to give the materialized view and the select query that adds the data to the materialized view. The other parameters are optional. For backup, you specify whether the materialized view is included in the clusters snapshots, and by default, this is set to yes. Table attributes are the distribution key and sort key that you want for the data in the materialized view. Now, this could be left up to Redshift if you don't specify it, and Redshift will use the even distribution style, which makes sense because you may not be doing any joins to materialized view because those joins were already made when creating the view. By default, Redshift won't automatically refresh your materialized view, but you can tell it to with auto refresh on. Whenever changes are made to the underlying data, Redshift will try to refresh the view, but this may not happen immediately depending upon how much a load is on the cluster. So you can force a refresh by just using the refresh materialized view command. Here's a straightforward example. Imagine that our order details and products table are frequently joined together and used in an analytics dashboard, but new order data is only loaded once per day. So we created materialized view and use that in a query for our dashboard. Since we created the view with auto refresh, this view should always be fresh. However, let's say our order load happens overnight and we want to be absolutely sure that the materialized view is current for our users the next day. In that case, we can just run the refresh command every morning on a schedule. Fortunately, Redshift lets you schedule your queries to run on a schedule very easily using the query editor.

Contents