Find elements in a database table that also exist in another table

Find elements in a database table that also exist in another table

There are different ways to frame this question. In terms of set theory, we are interested in the intersection of 2 sets, i.e., find all members of a set that are also members of another set. In a relational database, this becomes "find all the keys in one table that are also present in another table." There are many business use cases for this kind of question. Two examples are:

  • Identify all of our customers that have placed at least one order
  • A company that provides online education may want to identify every website registrant who has signed up for at least one course

Below are four ways to do this in SQL with the associated commentary on why to choose one over the other.

For the company that provides online education, we'll start with 2 tables: a customers table and a customers_courses table. The customers_courses table contains 1 row for each customer who signs up for a particular course. So, if someone signs up for 3 courses, there will be three records in the table and if someone has registered on the website and not signed up for any course, there will be zero records in the customers_courses table.

The customers table schema:

CREATE TABLE `customers` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `fName` varchar(45) DEFAULT NULL,
  `lName` varchar(45) DEFAULT NULL,
  `UserID` int NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `UserID_UNIQUE` (`UserID`)
)        

And, the customers_courses schema:

CREATE TABLE `customers_courses` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Customer_ID` int NOT NULL,
  `Course_ID` int NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `customers_Customer_ID` (`Customer_ID`),
  CONSTRAINT `Customer_ID` FOREIGN KEY (`Customer_ID`) REFERENCES `customers` (`ID`)
)        

Next, we look at the four ways to find all customers who have signed up for at least one course.

1) Clearly, we need a JOIN in the SQL query since customer information is in one table and the association with courses is in another. Most SQL developers I've worked with instinctively use a LEFT JOIN. If we do that, we will have a query that has a left join and then a WHERE clause to filter out customers who haven't signed up for any course. Also, without the DISTINCT, we would get 1 record for each customer-course combination whereas we want only one reference to a customer. The use of DISTINCT gives us the desired result but is inefficient since the query gets all records and then removes duplicates!

select distinct customers.userid, customers.fName
from dbo.customers
left join dbo.customers_courses
on customers_courses.Customer_ID = customers.ID
where customers_courses.id is not null
;        

For those interested in learning how the SQL engine will execute the query, run it after adding EXPLAIN ANALYZE before the SELECT. EXPLAIN works for all SQL implementations and ANALYZE is a sub-variant for MySQL; other implementations have their own version of EXPLAIN - it provides more details about the execution plan.

2) We can improve on the LEFT JOIN with the WHERE by using a JOIN - but we still need the inefficient DISTINCT.

select distinct customers.userid, customers.fName
from dbo.customers
join dbo.customers_courses
on customers_courses.Customer_ID = customers.ID
;        

3) It's more effective to let the SQL engine remove the duplicates for us (rather than asking for all the result rows and then distilling with DISTINCT). The IN clause is one option.

select customers.userid, customers.fName
from dbo.customers
where customers.id in (
	select customer_id from dbo.customers_courses)
;        

My concern with IN is that it limits us to an equality test on one field - and one field only.

4) The last option is the WHERE EXISTS clause. I like it the best because

  • it comes closest to the business requirement (find all customers with an existing course) and
  • it provides flexibility on how we define SELECT in the exists clause. We can go beyond an equality test on just one field. It's completely OK to use SELECT * - I rely on the SQL engine to optimize the execution and not actually fetch all the columns for an EXISTS test and also not fetch all records from the customer_courses table for each customer.

select
	customers.userid, customers.fName
from dbo.customers
where exists (
	select * from dbo.customers_courses
	where customers_courses.customer_id = customers.id
    )
;         

Thanks for sharing your insights and tips!

Like
Reply

To view or add a comment, sign in

More articles by Tushar Mehta

  • MS Excel: How to develop a recursive function

    Introduction A recursive function returns a value that depends on the value that the function returns for another set…

    2 Comments
  • Phone Number Region Lookup

    Introduction We call people using phone numbers without giving much thought to what the number represents. Each number…

    1 Comment
  • MS Excel - Graph a Spill Range

    Introduction to Dynamic Array Formulas and Spill Ranges Microsoft implemented a major improvement in Excel with the…

  • List and count unique values (MS Excel, Python, and SQL)

    This is a fairly common requirement that until recently had a convoluted solution in Excel. Given a list of values, how…

    1 Comment
  • Excel Large Number Arithmetic with Python

    Introduction When it comes to numbers, MS Excel follows the IEEE 754 standard (Floating-point arithmetic may give…

  • MS Excel: Recursive functions

    In MS Excel: Functions are first-class objects, we saw how a custom Excel function, using LAMBDA, can call another…

    3 Comments
  • MS Excel: Functions are first-class objects

    What makes a function a first-class object? This is almost certainly an advanced discussion. And, we will start with a…

  • Generate unique random integers

    Generating unique random integers is an important task in various domains. Examples include data sampling, simulations,…

    2 Comments
  • MS Excel: Goal Seek or Algebra

    Let's start with a question. I want to pay a business through a payment processor such as PayPal.

    1 Comment
  • Calendar for the New Year

    For the New Year..

    1 Comment

Others also viewed

Explore content categories