Exists vs IN vs JOIN

Bogdan Feodot

Published Apr 22, 2021

I was wondering what is better EXISTS, IN or JOIN?

Code for the demo:

We will see which operator performs best and when. Let's do some testing!

First, we create a big table and a small table, insert some random data into it and create clustered index for both tables:

CREATE TABLE BigTable (ID INT NOT NULL, 
						FirstName VARCHAR(100), 
						LastName VARCHAR(100), 
						City VARCHAR(100))
GO
-- Insert One Hundred Thousand Records
INSERT INTO BigTable (ID,FirstName,LastName,City)
SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 
					'Bobby', 
		CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Howard' 
					ELSE 'Evans' END,
		CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' 
			 WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' 
			 WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' 
			 WHEN ROW_NUMBER() OVER (ORDER BY a.name)%427 = 1 THEN 'Houston'
					ELSE 'New Mexico' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO

-- Create Indexes 
-- Create Clustered Index
CREATE CLUSTERED INDEX IX_BigTable_ID
ON BigTable(ID)
GO

-- Create Table
CREATE TABLE SmallTable (ID INT NOT NULL, 
						FirstName VARCHAR(100), 
						LastName VARCHAR(100), 
						City VARCHAR(100))
GO
-- Insert One Hundred Thousand Records
INSERT INTO SmallTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 
					'Bobby', 
		CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Howard' 
					ELSE 'Evans' END,
		CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' 
			 WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' 
			 WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' 
			 WHEN ROW_NUMBER() OVER (ORDER BY a.name)%427 = 1 THEN 'Houston'
					ELSE 'New Mexico' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Create Indexes 
-- Create Clustered Index
CREATE CLUSTERED INDEX IX_SmallTable_ID
ON SmallTable(ID)

GO

Now let’s see what is better in terms of performance:

-- IN Clause
SELECT ID, City
FROM BigTable 
WHERE ID IN (SELECT ID FROM SmallTable)
GO
-- Exists Clause
SELECT ID, City
FROM BigTable 
WHERE EXISTS (SELECT ID FROM SmallTable WHERE SmallTable.ID = BigTable.ID)
GO
-- Using JOIN
SELECT bt.ID, bt.City
FROM BigTable bt
INNER JOIN SmallTable st ON bt.ID = st.ID

GO

Looking at the execution plan we can see there is no difference!

What if we check if the IDs from the big table and not in the IDs of the small table?!

Here we have some interesting results:

We can see that NOT IN and NOT EXISTS is better than left join.

Setting statistics IO on will give us the answer to why this is happening.

SET STATISTICS IO ON

When using left join, a worktable is created in order to satisfy the HASH MATCH operator used in the execution plan.

But that’s not all! The eagle-eyed of you might have noticed that I have created intentionally created the ID columns with the NOT NULL constraint.

What if we change the column definition and we make the column NULLABLE?

WOW, That was unexpected! Let’s also look at the IO!

200.000 logical reads when using NOT IN! That is 400 times more IO used when you use NOT IN on nullable columns!

I will certainly be paying attention to this in the future!

Hope you enjoyed this one, let me know if this was helpful for you, and were you right?

Dragos Alexandru Burciu

Always growing :)

Super Bogdan! Bravo, personal imi plac mult postarile tale de genul asta, very insightfull!

Exists vs IN vs JOIN

Bogdan Feodot

More articles by Bogdan Feodot

Others also viewed

R procs Package Release v1.0.8

MAX_BY(): The Function You Didn't Know You Needed (BigQuery)

How to handle connection Errors in Qlik by Omar BEN SALEM.

10 differences between a PowerPivot Pivot Table and a Standard one

R + Shiny + Fama French: Part 2

Excel LAMBDA Spotlight: Bisected Map with BMAPλ

Fixing Stacked Charts with Tiny Slices

Quick Qlik Trick 7 - Clean Up Useless Variables

Pivot Tables Can't Refresh on Their Own!

Get Random Number Of Unique Items From A Range Or An Array

Explore content categories