#39 – FIRST_VALUE() and LAST_VALUE() window functions in Apache Spark SQL

#39 – FIRST_VALUE() and LAST_VALUE() window functions in Apache Spark SQL

Hi friends!

Say we want to get first or last value in a group according to a certain criteria, and then use it further in the analysis. This can be accomplished for example by means of window function ROW_NUMBER() with result condition "= 1" .

But it is more concise to use FIRST_VALUE() or LAST_VALUE() window functions.

Let's see how FIRST_VALUE() works. LAST_VALUE() does is the same except the order of records.

The FIRST_VALUE() window function is used in SQL to retrieve the first value within a specified partition or result set based on ordering criteria. It’s particularly useful for scenarios where you need to extract specific values from sorted rows without needing subqueries or joins.

For example our task is to calculate gaps in days between order dates and the first order's date in each month. So we want first to get dates of the first order in each month and then calculate gaps for each successive order.

Let's generate test orders data with two columns: order_id and order_date. The dates belong to two months.

FIRST_VALUE() and LAST_VALUE() window functions – test data
FIRST_VALUE() and LAST_VALUE() window functions – test data

Now add common table expression (CTE) with calculated "order_mon" and using "partition by order_mon" get date of first order in each month

FIRST_VALUE() and LAST_VALUE() window functions – get first value

You may have notice that in the code use first() but not first_value() – they are synonyms in this case as well as last_value() and last().

It's almost done! Now add calculated column "days_diff" with gap in days

FIRST_VALUE() and LAST_VALUE() window functions  – add calculated column with gap in days

We can summary the analysis like this

FIRST_VALUE() and LAST_VALUE() window functions – analysis summary

The notebook code is here.

Stay tuned!

To view or add a comment, sign in

More articles by Sergey Senigov

Explore content categories