Few Redshift SQL hacks
I have been working with Redshift nowadays and would like to present few hacks to optimize and run some out of the box queries in Redshift.
LEAD(<column name> IGNORE NULLS)
Use this analytical function if you would like to fetch the next NOT NULL value of the column. LEAD function without the IGNORE NULLS would fetch the next value irrespective of the value being NULL or not, so this is quite handy to fetch the next NOT NULL only value. Similarly to LEAD you can use the IGNORE NULLS in the LAG function, that would fetch the previous NOT NULL value in the column
CTE vs Temp Tables
Breaking down a complex query with multiple CTE’s to temp tables generally resulted in an increased performance. The temp tables can also be made into intermediate tables and then dropped later after the entire flow is run.
Correlated Subquery with CTE
In few instances Redshift will not allow you to write a correlated subquery that references a CTE when using some clauses like EXISTS or NOT EXISTS. In this instance you can write the CTE to a temp table or intermediate table and then use it in the correlated subquery.
Using Sort Key vs Not using it
Sort keys are useful only when the columns in the key are used later in filtering the queries. However, creating a table with Sort keys takes more time than creating one without sort keys. So make use of Sort keys only when necessary.
Typecasting NULL to a datatype
Though it is not confirmed in Redshift docs, NULL sometimes resolves to varchar datatype, but you can typecast it to a desired datatype such as integer using syntax like NULL::BIGINT. This hack can come handy particularly when you are doing a union all of two queries where there are columns that are NULL defined in one query and carry data in another query.
Concatenate DATE and TIME to DATETIME
We can just use the ‘+’ operator to concatenate DATE and TIME columns to make a DATETIME column. No extra typecasting or other DATETIME functions are required to make a DATETIME column from DATE and TIME.
Using short notation in DATETIME functions
We can just use short hand literals like ‘s’, ‘m’, ‘y’ etc for SECOND, MINUTE and YEAR when you want to use them in datetime functions. So a fucntion like DATEDIFF(‘s’, start_time, end_time) will be equal to DATEDIFF(SECOND, start_time, end_time)