A Use-Case for Self-referential Structures and Recursion in RDBMSs
The Elegance of Recursion
Recursion gets a bad rap, but it's perfect for certain problems. Let me show you one. It’s a technique that allows us to solve complex problems by breaking them into smaller, self-similar pieces. In functional programming, recursion isn’t just a tool -- it’s a way of thinking. At the heart of this paradigm lies the linked list, a recursive data structure that perfectly encapsulates the essence of functional design. But what makes linked lists so special, and how can their recursive nature be applied to real-world problems, like modeling an organizational hierarchy in a database? We’ll explore the beauty of linked lists, debunk myths about recursion’s performance, and demonstrate how to leverage recursive structures in SQL to represent something practical.
The Linked List: The Poster Child of Functional Programming
In functional programming, recursive structures are ubiquitous. If I were to crown one data structure as the quintessential representation of functional programming, it would undoubtedly be the linked list. Why? Because its recursive nature aligns seamlessly with the principles of immutability, composition, and simplicity that are fundamental to functional programming.
Defining a Linked List
To understand the linked list, let me try to visualize its definition. In pseudocode, a linked list can be defined as follows:
type LinkedList =
State1 => (Value, LinkedList)
or State2 => End
This definition describes a linked list as a type with two possible states:
In essence, a linked list is either a node holding a value and pointing to another linked list or a marker indicating the list’s end. This recursive definition allows the structure to grow indefinitely while maintaining a simple, elegant design.
For example, consider how we might instantiate a linked list with three values:
let x = State1(1, State1(2, State1(3, State2)))
This creates a linked list containing the values 1, 2, and 3. Visually, it can be represented as:
State1(1) -> State1(2) -> State1(3) -> State2
Or, in a more detailed breakdown:
First Element: State1(1, pointer)
|
v
Second Element: State1(2, pointer)
|
v
Third Element: State1(3, pointer)
|
v
End: State2
From Theory to Practice
Now that we’ve established what a linked list is, I'd like to address a common misconception about recursion before exploring its practical application. Many developers shy away from recursion, believing that it's inherently slow or resource-intensive.
Is Recursion Really Slow?
A common criticism of recursion is that it’s slower and more memory-intensive than iterative approaches. This perception isn’t entirely baseless, but it’s not inherently true either.
In functional programming, recursion is often implemented using tail recursion. This is a technique that can be employed when the recursive call is the last operation in the function. When a compiler or runtime supports tail-call optimization (TCO), it can transform a tail-recursive function into assembly code that’s equivalent to a loop. This means:
Unfortunately, not all programming languages or runtimes implement TCO. Popular languages like Python or JavaScript (in some environments) lack this optimization, leading to the perception that recursion is slow. This is an implementation detail, not a fundamental flaw of recursion itself. In languages like Haskell, Scheme, or certain JavaScript runtimes (e.g., Node.js with specific configurations), tail-recursive functions can perform just as well as their iterative counterparts.
Because of these tooling limitations, many developers avoid recursion, missing out on its elegance and clarity. This creates a blind spot in design pedagogy, especially for problems where recursion is not just a solution but the most natural solution. So, what’s a real-world problem where recursion shines?
Modeling an organizational hierarchy!
Recommended by LinkedIn
A Real-World Use Case: Modeling an Org Chart
Imagine you’re tasked with representing a company’s organizational structure in a database. Most organizations follow a hierarchical model: employees report to managers, who report to higher-level managers, and so on, until you reach the CEO or company head. This hierarchy resembles a chain:
Employee -> Manager -> Manager’s Manager -> CEO
Does this look familiar? It’s strikingly similar to our linked list:
State1(value) -> State1(value) -> State1(value) -> State2
This similarity suggests that a recursive data structure, like a linked list, is a natural fit for modeling an org chart. By representing each employee as a node and their manager as the next node in the chain, we can traverse the hierarchy recursively to query relationships, such as “Who is my boss’s boss?” or “Who reports to this manager?”
Bringing Recursion to the Database
Now that we’ve identified the org chart as a recursive problem, how do we implement it in a database? SQL, the language of relational databases, might not seem like an obvious candidate for recursion, but modern SQL standards support recursive queries through Common Table Expressions (CTEs).
Implementing the Org Chart in SQL
To represent an organizational hierarchy, we need a table that captures the recursive relationship between employees and their managers. Here’s a simplified schema in SQL-like pseudocode:
CREATE TABLE Employee (
id INT PRIMARY KEY,
name TEXT,
manager_id INT REFERENCES Employee(id)
);
This table has three columns:
Let’s populate the table with a small example:
INSERT INTO Employee (id, name, manager_id) VALUES
(1, 'John', NULL), -- CEO, no manager
(2, 'Jack', 1), -- Reports to John
(3, 'Jake', 2); -- Reports to Jack
This data represents a simple hierarchy: Jake reports to Jack, who reports to John, the CEO. Visually, it’s:
Jake -> Jack -> John (CEO)
To query the entire hierarchy, we can use a recursive CTE to traverse the chain from the top (the CEO, where manager_id is NULL) down to the lowest-level employee:
WITH RECURSIVE org AS (
-- Base case: Start with the employee who has no manager (CEO)
SELECT id, name, manager_id
FROM Employee
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Join employees with their managers
SELECT e.id, e.name, e.manager_id
FROM Employee e
INNER JOIN org o ON e.manager_id = o.id
)
SELECT id, name, manager_id
FROM org;
This query works as follows:
Running this query on our example data would produce:
id name manager_id
1 John NULL
2 Jack 1
3 Jake 2
This result mirrors the linked list structure, with each row representing a node and the manager_id acting as the pointer to the next node (or NULL for the end).
Our simple example assumes a single reporting chain, but real organizations are often more complex, with multiple levels and branches (e.g., teams, departments, locations). How does our recursive model scale to these scenarios? Unfortunately, that's a question for another article. You'll just have to tune in next time to find out!
OK. Shoot me. This is just inaccurate and wrong. I can't even start to list the issues. And from someone who has made an art form out of linked lists.
Self referential data structures are natural and powerful - and are a much more natural fit for things that devs use arrays for. Recursive programming and self-referential data structures do not belong in an RDBMS. Render to Caesar what is Caesars - not this.
How many RDBMS’s I’ve seen built out to normalization that end up needing a tree after having built static categorization? Pretty much all of them that became mature systems
I'd never allow it in production. It's far too easy to miss the base case. If you're writing functional code in some languages, then you have no choice, but that's another reason to not write functional code in a language that forces you to do this. *All* forms of recursion can be expressed with an explicit stack, unrolled in a loop. Which both disallows the "messed up base case" AND allows you to more easily tune the memory usage/stack depth limits.