Exists vs IN vs JOIN
I was wondering what is better EXISTS, IN or JOIN?
Code for the demo:
We will see which operator performs best and when. Let's do some testing!
First, we create a big table and a small table, insert some random data into it and create clustered index for both tables:
CREATE TABLE BigTable (ID INT NOT NULL, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Insert One Hundred Thousand Records INSERT INTO BigTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bobby', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Howard' ELSE 'Evans' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%427 = 1 THEN 'Houston' ELSE 'New Mexico' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Create Indexes -- Create Clustered Index CREATE CLUSTERED INDEX IX_BigTable_ID ON BigTable(ID) GO -- Create Table CREATE TABLE SmallTable (ID INT NOT NULL, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Insert One Hundred Thousand Records INSERT INTO SmallTable (ID,FirstName,LastName,City) SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bobby', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Howard' ELSE 'Evans' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%427 = 1 THEN 'Houston' ELSE 'New Mexico' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Create Indexes -- Create Clustered Index CREATE CLUSTERED INDEX IX_SmallTable_ID ON SmallTable(ID)
GO
Now let’s see what is better in terms of performance:
-- IN Clause SELECT ID, City FROM BigTable WHERE ID IN (SELECT ID FROM SmallTable) GO -- Exists Clause SELECT ID, City FROM BigTable WHERE EXISTS (SELECT ID FROM SmallTable WHERE SmallTable.ID = BigTable.ID) GO -- Using JOIN SELECT bt.ID, bt.City FROM BigTable bt INNER JOIN SmallTable st ON bt.ID = st.ID
GO
Looking at the execution plan we can see there is no difference!
What if we check if the IDs from the big table and not in the IDs of the small table?!
Here we have some interesting results:
We can see that NOT IN and NOT EXISTS is better than left join.
Setting statistics IO on will give us the answer to why this is happening.
SET STATISTICS IO ON
When using left join, a worktable is created in order to satisfy the HASH MATCH operator used in the execution plan.
But that’s not all! The eagle-eyed of you might have noticed that I have created intentionally created the ID columns with the NOT NULL constraint.
What if we change the column definition and we make the column NULLABLE?
WOW, That was unexpected! Let’s also look at the IO!
200.000 logical reads when using NOT IN! That is 400 times more IO used when you use NOT IN on nullable columns!
I will certainly be paying attention to this in the future!
Hope you enjoyed this one, let me know if this was helpful for you, and were you right?
Always growing :)
5ySuper Bogdan! Bravo, personal imi plac mult postarile tale de genul asta, very insightfull!