#39 – FIRST_VALUE() and LAST_VALUE() window functions in Apache Spark SQL
Hi friends!
Say we want to get the first or last value in a group according to some criterion, and then use it further in the analysis. This can be accomplished, for example, with the ROW_NUMBER() window function and a filter condition "= 1".
But it is more concise to use the FIRST_VALUE() or LAST_VALUE() window functions.
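The ROW_NUMBER()-based alternative can be sketched like this. It is a standalone example using Python's built-in sqlite3 module, whose window-function syntax matches Spark SQL for this query; the table name, order IDs, and dates are invented for illustration:

```python
import sqlite3

# In-memory database standing in for the Spark session;
# hypothetical orders spread over two months.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-03"), (2, "2024-01-10"), (3, "2024-01-24"),
        (4, "2024-02-02"), (5, "2024-02-15"), (6, "2024-02-28"),
    ],
)

# Number the rows within each month by order date,
# then keep only the row numbered 1 — the first order of the month.
first_orders = con.execute("""
    WITH ranked AS (
        SELECT order_date,
               strftime('%Y-%m', order_date) AS order_mon,
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y-%m', order_date)
                   ORDER BY order_date
               ) AS rn
        FROM orders
    )
    SELECT order_mon, order_date
    FROM ranked
    WHERE rn = 1
    ORDER BY order_mon
""").fetchall()

print(first_orders)  # [('2024-01', '2024-01-03'), ('2024-02', '2024-02-02')]
```

This works, but it takes an extra CTE and a filter just to isolate the first row, which is the verbosity FIRST_VALUE() avoids.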
Let's see how FIRST_VALUE() works; LAST_VALUE() behaves the same way except for the order of records.
The FIRST_VALUE() window function is used in SQL to retrieve the first value within a specified partition or result set based on ordering criteria. It’s particularly useful for scenarios where you need to extract specific values from sorted rows without needing subqueries or joins.
For example, suppose our task is to calculate the gap in days between each order date and the date of the first order in that month. So we first get the date of the first order in each month, and then calculate the gap for each subsequent order.
Let's generate test orders data with two columns: order_id and order_date. The dates belong to two months.
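The notebook cell itself isn't reproduced here, so as a stand-in, here is a minimal sketch of such test data using Python's built-in sqlite3 module (SQLite is used only so the example runs anywhere; the sample order IDs and dates are made up):

```python
import sqlite3

# In-memory database standing in for the Spark session;
# the orders table has the two columns from the article.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")

# Hypothetical orders spread over two months (January and February 2024).
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-03"), (2, "2024-01-10"), (3, "2024-01-24"),
        (4, "2024-02-02"), (5, "2024-02-15"), (6, "2024-02-28"),
    ],
)

n_orders = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(n_orders)  # 6
```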
Now add a common table expression (CTE) with a calculated "order_mon" column and, using "partition by order_mon", get the date of the first order in each month.
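A sketch of that step, again as a standalone example on the invented data: the SQL inside the string is the same shape as in Spark SQL, except that SQLite's strftime('%Y-%m', ...) stands in for Spark's date_format(..., 'yyyy-MM'):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-03"), (2, "2024-01-10"), (3, "2024-01-24"),
        (4, "2024-02-02"), (5, "2024-02-15"), (6, "2024-02-28"),
    ],
)

rows = con.execute("""
    WITH orders_mon AS (
        SELECT order_id,
               order_date,
               -- Spark SQL equivalent: date_format(order_date, 'yyyy-MM')
               strftime('%Y-%m', order_date) AS order_mon
        FROM orders
    )
    SELECT order_id,
           order_date,
           order_mon,
           FIRST_VALUE(order_date) OVER (
               PARTITION BY order_mon
               ORDER BY order_date
           ) AS first_order_date
    FROM orders_mon
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# Every January row carries '2024-01-03' as first_order_date,
# every February row carries '2024-02-02'.
```

One caveat worth remembering: with an ORDER BY clause, the default window frame ends at the current row. That is fine for FIRST_VALUE(), but it makes LAST_VALUE() return the current row's value; to get the partition's true last value, extend the frame with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.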
You may have noticed that the code uses first() rather than first_value() – in this context they are synonyms, just as last() and last_value() are.
It's almost done! Now add a calculated column "days_diff" with the gap in days.
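Continuing the same standalone sketch on the invented data, the gap follows from subtracting the first-order date; SQLite's julianday() difference stands in for Spark's datediff(order_date, first_order_date):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-03"), (2, "2024-01-10"), (3, "2024-01-24"),
        (4, "2024-02-02"), (5, "2024-02-15"), (6, "2024-02-28"),
    ],
)

rows = con.execute("""
    WITH orders_mon AS (
        SELECT order_id,
               order_date,
               strftime('%Y-%m', order_date) AS order_mon
        FROM orders
    ),
    with_first AS (
        SELECT *,
               FIRST_VALUE(order_date) OVER (
                   PARTITION BY order_mon
                   ORDER BY order_date
               ) AS first_order_date
        FROM orders_mon
    )
    SELECT order_id,
           order_date,
           first_order_date,
           -- Spark SQL equivalent: datediff(order_date, first_order_date)
           CAST(julianday(order_date) - julianday(first_order_date) AS INTEGER)
               AS days_diff
    FROM with_first
    ORDER BY order_date
""").fetchall()

print([r[3] for r in rows])  # [0, 7, 21, 0, 13, 26]
```

The first order of each month gets days_diff = 0, and every later order gets its distance in days from that month's opening order.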
We can summarize the analysis like this.
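A summary of that kind could look like the following sketch, rolling the per-order gaps up to one row per month; the choice of aggregates (order count and maximum gap) is an assumption, since the original notebook output isn't shown here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-03"), (2, "2024-01-10"), (3, "2024-01-24"),
        (4, "2024-02-02"), (5, "2024-02-15"), (6, "2024-02-28"),
    ],
)

summary = con.execute("""
    WITH orders_mon AS (
        SELECT order_date,
               strftime('%Y-%m', order_date) AS order_mon
        FROM orders
    ),
    with_gaps AS (
        SELECT order_mon,
               CAST(julianday(order_date)
                    - julianday(FIRST_VALUE(order_date) OVER (
                          PARTITION BY order_mon
                          ORDER BY order_date
                      )) AS INTEGER) AS days_diff
        FROM orders_mon
    )
    -- Aggregate the per-order gaps into one row per month.
    SELECT order_mon,
           COUNT(*) AS orders_cnt,
           MAX(days_diff) AS max_gap_days
    FROM with_gaps
    GROUP BY order_mon
    ORDER BY order_mon
""").fetchall()

print(summary)  # [('2024-01', 3, 21), ('2024-02', 3, 26)]
```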
The notebook code is here.
Stay tuned!