Window Function Query Optimization
On a recent project, I needed to build features to contrast the average handling time of a given employee with the overall average handling time of cases within a queue.
After a bit of thought and research, I came up with the following query:
SELECT
t.*,
AVG(handle_time_secs) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(DATE(assigned_at)) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS queue_aht_30days,
AVG(handle_time_secs) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(DATE(assigned_at)) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS emp_queue_aht_30days
FROM project.dataset.table t
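To make the window semantics concrete, here is a small Python sketch (with made-up toy rows, not project data) of what `RANGE BETWEEN 29 PRECEDING AND CURRENT ROW` over `UNIX_DATE(DATE(assigned_at))` computes for the queue-level feature: the average over all cases in the same queue whose assignment date falls in the 30-day window ending at the current row's date.

```python
from datetime import date, timedelta

# Hypothetical toy rows: (queue_name, emp_id, assigned_at_date, handle_time_secs).
rows = [
    ("billing", 1, date(2023, 1, 1), 100),
    ("billing", 2, date(2023, 1, 15), 200),
    ("billing", 1, date(2023, 1, 31), 300),  # more than 29 days after Jan 1
]

def queue_aht_30days(rows, queue, as_of):
    """Average handle time for a queue over the 30-day window ending at as_of,
    mirroring RANGE BETWEEN 29 PRECEDING AND CURRENT ROW on UNIX_DATE."""
    window = [ht for (q, _, d, ht) in rows
              if q == queue and as_of - timedelta(days=29) <= d <= as_of]
    return sum(window) / len(window)

# The Jan 31 window spans Jan 2 through Jan 31, so the Jan 1 case drops out:
print(queue_aht_30days(rows, "billing", date(2023, 1, 31)))  # 250.0
```

The employee-level feature works the same way, with `emp_id` added to the partition key.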
However, when I tried to run it in BigQuery, I got the following exception:
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 158% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%
My manager suggested that I reduce the number of rows that the window function needs to iterate over by first aggregating the data by day.
The following query first extracts the date from the timestamp at which the case was assigned to the employee, then computes the total and average handle time of cases by day.
WITH features_daily_emp_queue AS (
SELECT
queue_name,
emp_id,
DATE(assigned_at) AS assigned_at_date,
AVG(handle_time_secs) AS emp_queue_1day_aht,
SUM(handle_time_secs) AS emp_queue_1day_total_ht,
COUNT(1) AS emp_queue_1day_cases
FROM project.dataset.table
GROUP BY
queue_name,
emp_id,
assigned_at_date
)
Then, rather than applying the AVG function to the raw rows, we can compute the 30-day average by summing the daily total handle times and dividing by the sum of the daily case counts.
SELECT
(SUM(emp_queue_1day_total_ht) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) / (SUM(emp_queue_1day_cases) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS queue_aht_30days
FROM features_daily_emp_queue
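A quick illustration (with toy numbers, not project data) of why we divide summed totals by summed counts rather than averaging the daily averages: averaging daily averages gives every day equal weight regardless of volume, while the ratio of sums reproduces the case-level average exactly.

```python
# Hypothetical daily pre-aggregates for one queue: (total_ht, cases).
# Day 1 has 10 cases averaging 100s; day 2 has a single 300s case.
daily = [(1000, 10), (300, 1)]

# Weighted average (what the query computes): identical to AVG over raw rows.
weighted = sum(t for t, _ in daily) / sum(c for _, c in daily)  # 1300 / 11

# Naive average of daily averages over-weights the low-volume day.
naive = sum(t / c for t, c in daily) / len(daily)  # (100 + 300) / 2

print(weighted)  # about 118.2
print(naive)     # 200.0
```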
We can repeat the process, adding the employee ID to the partition.
SELECT
(SUM(emp_queue_1day_total_ht) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) /
SUM(emp_queue_1day_cases) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS emp_queue_aht_30days
FROM features_daily_emp_queue
Finally, we put it all together and join the features back to the original table using the queue name, the employee ID, and the assigned date.
WITH features_daily_emp_queue AS (
SELECT
queue_name,
emp_id,
DATE(assigned_at) AS assigned_at_date,
AVG(handle_time_secs) AS emp_queue_1day_aht,
SUM(handle_time_secs) AS emp_queue_1day_total_ht,
COUNT(1) AS emp_queue_1day_cases
FROM project.dataset.table
GROUP BY
queue_name,
emp_id,
assigned_at_date
),
features_daily_emp_queue_window AS (
SELECT
queue_name,
emp_id,
assigned_at_date,
(SUM(emp_queue_1day_total_ht) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) /
SUM(emp_queue_1day_cases) OVER (PARTITION BY queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS queue_aht_30days,
(SUM(emp_queue_1day_total_ht) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) /
SUM(emp_queue_1day_cases) OVER (PARTITION BY emp_id, queue_name ORDER BY UNIX_DATE(assigned_at_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)) AS emp_queue_aht_30days
FROM features_daily_emp_queue
)
SELECT
t.*,
queue_aht_30days,
emp_queue_aht_30days
FROM project.dataset.table t
LEFT JOIN features_daily_emp_queue_window f
ON t.queue_name = f.queue_name
AND t.emp_id = f.emp_id
AND DATE(t.assigned_at) = f.assigned_at_date
Unlike the initial query, the preceding query ran to completion.