SQL, Chapter 3: the FROM Clause and JOIN Subclause
The FROM Clause
The FROM clause defines the set of data from which the resultset will be selected. If the FROM clause is used, at least one from_item is required. The FROM clause in its simplest form is a single table.
That table may be joined to other tables to create a composite of data from multiple tables, from which data can be selected based on the values in the expression, and filtered based on the criteria specified in the WHERE clause.
The FROM clause can also contain a SELECT statement of its own, called a subquery. The results of the subquery become the dataset from which the outside SELECT statement selects its data. This process can be infinitely repeated, although performance degrades pretty quickly with multiple nested subqueries. (We’ll cover subqueries in Chapter 5.)
The JOIN Subclause
Joins are defined in the JOIN clause, which is a subclause of the FROM clause. There are several types of joins, and multiple tables can be joined with multiple JOIN clauses in a single FROM clause.
It can be helpful to express the different types of joins in terms of set theory. In terms of set theory, and in terms of two sets A and B:
In SQL, joins have this basic syntax:
SELECT column_list
FROM table_a [ [ AS ] alias ]
[join_type table_b [ [ AS ] alias ]
ON join_condition] [, ...]
Where join_type is one of:
And join_condition is an expression with a result of type boolean that specifies how a row from one table matches a row from the other.
In JOIN clauses, the terms INNER and OUTER are usually omitted in practice: JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN. For the purpose of this explanation, I’ll leave them in.
To give an example of each, I’ll use these two tables:
INNER JOIN
INNER JOIN includes only rows that have matching values as specified in the ON clause (in this case fruits.color_id and colors.id).
This SQL statement:
SELECT name, color
FROM fruits f
INNER JOIN colors c ON f.color_id = c.id;
Will have this result:
LEFT OUTER JOIN
LEFT OUTER JOIN includes all the rows that INNER JOIN does, as well as all other rows in the table referenced in the FROM clause (the “FROM table,” fruits in this case). Rows in the FROM table that do not have matching values in the JOIN table (the colors table in this case) have null values for all columns in the JOIN table.
This SQL statement:
SELECT name, color
FROM fruits f
LEFT OUTER JOIN colors c ON f.color_id = c.id;
Will have this result:
RIGHT OUTER JOIN
RIGHT OUTER JOIN includes all the rows that INNER JOIN does, as well as any other rows in the table referenced in the JOIN clause. Rows in the JOIN table that do not have matching values in the FROM table have null values for all columns in the FROM table.
This SQL statement:
SELECT name, color
FROM fruits f
RIGHT OUTER JOIN colors c ON f.color_id = c.id;
Will have this result:
FULL OUTER JOIN
FULL OUTER JOIN returns a combination of the rows returned by LEFT OUTER JOIN and RIGHT OUTER JOIN.
This SQL statement:
SELECT name, color
FROM fruits f
FULL OUTER JOIN colors c ON f.color_id = c.id;
Will have this result:
CROSS JOIN
CROSS JOIN returns the specified columns from the Cartesian product of the two tables. A Cartesian product is the set of all possible combinations of each of the elements in two sets.
A good example of when to use a cross join is when creating a deck of cards from a table of ranks and a table of suits. A deck of cards is the Cartesian product of these two tables.
Using Cross Joins to Get Missing Combinations
Cross joins can also be used to get a list of all members of a many-to-many relationship that have no relation.
For example, suppose I have some customers, and I have some products. So, I have a customers table and a products table. I also keep a list of which customers have bought which products in a customers_products table. I want a list of customers and any products that they haven’t bought yet.
In other words, I want a list of all the combinations of customers and products that aren’t in the customers_products table.
Here are some representative tables:
Now, this query:
SELECT c.name AS cust, p.name AS prod
FROM customers c
CROSS JOIN products p
WHERE NOT EXISTS (
SELECT * FROM customers_products
WHERE cust_id = c.id
AND p.id = prod_id
);
Will have this result:
Which is what we want.
I have been writing these articles on my "blogfolio" first, and then transferring them over to LinkedIn. If I haven't put the next chapter up here yet, the link to the next chapter will take you to the copy on robertrodes.com.