Does the SQL Language Need a Pipe (|) Operator?
SQL remains a steadfastly declarative language. The Apache Spark™ DataFrame API draws its inspiration from SQL's declarative nature: you declare what to do with the data, not how to do it. Yet SQL queries can get quite complex, and developers often struggle with deeply nested subqueries. As a result, readability suffers.
So the simple answer is YES, we do! Let's examine why.
People love the Spark DataFrame declarative API because it lets us cobble together a chain of logical, sequential expressions from start to finish. Moreover, most queries begin with a table name, followed by a string of operations on the data in that table.
Consider this simple example: imagine you want to write a Spark DataFrame query by chaining some simple operations on data in a table named customers:
from pyspark.sql.functions import col, desc

# Alias both tables so the qualified column references resolve in the join
result = spark.table("customers").alias("customers") \
    .filter(col("region") == "North") \
    .join(spark.table("orders").alias("orders"),
          col("customers.id") == col("orders.customer_id")) \
    .select("customer_name", "order_date", "amount") \
    .orderBy(desc("order_date")) \
    .limit(10)
Traditional SQL without the Pipe (|) operator looks as follows. Notice that the table name is buried after the projection and join, the opposite of the Python code above.
SELECT customer_name, order_date, amount
FROM customers
JOIN orders ON customers.id = orders.customer_id
WHERE region = 'North'
ORDER BY order_date DESC
LIMIT 10
But if you use the Pipe (|) operator, the code is simpler and more readable, and it matches the flow, format, and declarative style of the DataFrame code above: specify the table, then the filter, join, projection, ordering, and limit.
The general Pipe (|) operator syntax takes the form:
FROM T
| WHERE A < 100
| SELECT B + C AS D
| ....
We can translate our PySpark DataFrame code into SQL using the Pipe (|) operator-based syntax:
FROM customers
| WHERE region = 'North'
| JOIN orders ON customers.id = orders.customer_id
| SELECT customer_name, order_date, amount
| ORDER BY order_date DESC
| LIMIT 10
Take another, more complex example of a traditional SQL subquery:
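As an illustrative sketch (the query itself is hypothetical, reusing the customers and orders tables from the earlier example), a report of top-spending customers requires nesting the aggregation inside a subquery in traditional SQL, so the reader must work inside-out:

```sql
-- Hypothetical example: customers whose total spend exceeds 1000,
-- built with a nested subquery in traditional SQL
SELECT customer_name, total_spent
FROM (
  SELECT c.customer_name, SUM(o.amount) AS total_spent
  FROM customers c
  JOIN orders o ON c.id = o.customer_id
  GROUP BY c.customer_name
) AS totals
WHERE total_spent > 1000
ORDER BY total_spent DESC
LIMIT 5;
```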
Now consider the same query written with the Pipe (|) operator syntax:
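A sketch of that same hypothetical top-spenders query in pipe syntax: each step flows into the next, top to bottom, with no nesting. (This assumes the pipe syntax's AGGREGATE ... GROUP BY clause, which performs the grouping and aggregation in a single piped step.)

```sql
FROM customers AS c
| JOIN orders AS o ON c.id = o.customer_id
| AGGREGATE SUM(o.amount) AS total_spent GROUP BY c.customer_name
| WHERE total_spent > 1000
| ORDER BY total_spent DESC
| LIMIT 5
```

Note that the WHERE clause after the aggregation filters on the aggregated value directly, where traditional SQL would need HAVING or an outer query.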
The Pipe (|) operator affords a few worthy merits. In a Databricks blog, SQL Gets Easier: Announcing New Pipe Syntax, Daniel Tenedorio introduces this feature in the upcoming Apache Spark 4.0 release; I summarize its key benefits:
To read more about this new Pipe (|) operator syntax check the following resources: