Find elements in a database table that also exist in another table

Tushar Mehta

Published Nov 3, 2022

There are different ways to frame this question. In terms of set theory, we are interested in the intersection of 2 sets, i.e., find all members of a set that are also members of another set. In a relational database, this becomes "find all the keys in one table that are also present in another table." There are many business use cases for this kind of question. Two examples are:

Identify all of our customers that have placed at least one order
A company that provides online education may want to identify every website registrant who has signed up for at least one course

Below are four ways to do this in SQL with the associated commentary on why to choose one over the other.

For the company that provides online education, we'll start with 2 tables: a customers table and a customers_courses table. The customers_courses table contains 1 row for each customer who signs up for a particular course. So, if someone signs up for 3 courses, there will be three records in the table and if someone has registered on the website and not signed up for any course, there will be zero records in the customers_courses table.

The customers table schema:

CREATE TABLE `customers` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `fName` varchar(45) DEFAULT NULL,
  `lName` varchar(45) DEFAULT NULL,
  `UserID` int NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `UserID_UNIQUE` (`UserID`)
)

And, the customers_courses schema:

CREATE TABLE `customers_courses` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Customer_ID` int NOT NULL,
  `Course_ID` int NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `customers_Customer_ID` (`Customer_ID`),
  CONSTRAINT `Customer_ID` FOREIGN KEY (`Customer_ID`) REFERENCES `customers` (`ID`)
)

Next, we look at the four ways to find all customers who have signed up for at least one course.

1) Clearly, we need a JOIN in the SQL query since customer information is in one table and the association with courses is in another. Most SQL developers I've worked with instinctively use a LEFT JOIN. If we do that, we will have a query that has a left join and then a WHERE clause to filter out customers who haven't signed up for any course. Also, without the DISTINCT, we would get 1 record for each customer-course combination whereas we want only one reference to a customer. The use of DISTINCT gives us the desired result but is inefficient since the query gets all records and then removes duplicates!

Recommended by LinkedIn

Fix "The permission 'KILL DATABASE CONNECTION' is not…

Eduardo Silva Rossi 1 year ago

SQL Server - Multiple Granularities of Time In One…

Evelyn Livant 1 year ago

Boost SQL Efficiency: Avoid TRUNC in Date Filters and…

Naresh Reddy Regalla 7 months ago

select distinct customers.userid, customers.fName
from dbo.customers
left join dbo.customers_courses
on customers_courses.Customer_ID = customers.ID
where customers_courses.id is not null
;

For those interested in learning how the SQL engine will execute the query, run it after adding EXPLAIN ANALYZE before the SELECT. EXPLAIN works for all SQL implementations and ANALYZE is a sub-variant for MySQL; other implementations have their own version of EXPLAIN - it provides more details about the execution plan.

2) We can improve on the LEFT JOIN with the WHERE by using a JOIN - but we still need the inefficient DISTINCT.

select distinct customers.userid, customers.fName
from dbo.customers
join dbo.customers_courses
on customers_courses.Customer_ID = customers.ID
;

3) It's more effective to let the SQL engine remove the duplicates for us (rather than asking for all the result rows and then distilling with DISTINCT). The IN clause is one option.

select customers.userid, customers.fName
from dbo.customers
where customers.id in (
	select customer_id from dbo.customers_courses)
;

My concern with IN is that it limits us to an equality test on one field - and one field only.

4) The last option is the WHERE EXISTS clause. I like it the best because

it comes closest to the business requirement (find all customers with an existing course) and
it provides flexibility on how we define SELECT in the exists clause. We can go beyond an equality test on just one field. It's completely OK to use SELECT * - I rely on the SQL engine to optimize the execution and not actually fetch all the columns for an EXISTS test and also not fetch all records from the customer_courses table for each customer.

select
	customers.userid, customers.fName
from dbo.customers
where exists (
	select * from dbo.customers_courses
	where customers_courses.customer_id = customers.id
    )
;

Julie Basu, PhD 3y

Thanks for sharing your insights and tips!

To view or add a comment, sign in

Find elements in a database table that also exist in another table

Tushar Mehta

Recommended by LinkedIn

More articles by Tushar Mehta

Others also viewed

Brent Ozar Course, SQL Diagnostic Lab, and a Guide to Choosing the Right DBMS

How to list all tables in a database using a query

Why there are no Arrays in SQL?

SQL Friday newsletter #5

POSTGRESQL LEFT JOIN

SQL Cursors are now smarter, thanks to variables

🚀 Elevate Your SQL Queries: Avoid Functions on Indexed Columns 🚀

SQL's Basic Operations

Create a Job in SQL SERVER to purge your database

Is SELECT * bad for performance?

Explore content categories

Recommended by LinkedIn

More articles by Tushar Mehta

MS Excel: How to develop a recursive function

Phone Number Region Lookup

MS Excel - Graph a Spill Range

List and count unique values (MS Excel, Python, and SQL)

Excel Large Number Arithmetic with Python

MS Excel: Recursive functions

MS Excel: Functions are first-class objects

Generate unique random integers

MS Excel: Goal Seek or Algebra

Calendar for the New Year

Others also viewed

Brent Ozar Course, SQL Diagnostic Lab, and a Guide to Choosing the Right DBMS

How to list all tables in a database using a query

Why there are no Arrays in SQL?

SQL Friday newsletter #5

POSTGRESQL LEFT JOIN

SQL Cursors are now smarter, thanks to variables

🚀 Elevate Your SQL Queries: Avoid Functions on Indexed Columns 🚀

SQL's Basic Operations

Create a Job in SQL SERVER to purge your database

Is SELECT * bad for performance?

Explore content categories