How is data stored in SQL databases?
Today we are going to take a look at how a SQL database stores the data, this understanding can be really helpful in analyzing database queries.
At a logical level, the data is stored in the form of rows and columns but at the physical level, it is stored in the form of data pages which are often of defined size. In a SQL server by default, the pages are 8KB and stored in a tree-like structure called B-Tree or Indexed B-Tree or Clustered index structure.
The nodes at the bottom of the B-Tree are called the leaf nodes storing the actual table data. Let's assume we have a Student table indexed at the id column and for the sake of simplicity consider 100 students' data sum up for a data page of 8 KB. The Student table data indexed at the id column will be stored in the database as below.
Recommended by LinkedIn
The node at the top of the tree is called the root node and the nodes between the root node and the leaf nodes are called the intermediate levels. The number of intermediate levels depends upon the number of data rows in the underlying table. The root and the intermediate nodes contain the index nodes implying it will either contain the pointer to an intermediate-level page or a data row in the leaf node. This tree-like structure helps the database engine retrieve the data quickly.
The above diagram also shows the path followed by the database to retrieve data when queried upon the indexed column for a specific student id. But how about when we run a query on the non-indexed columns the engine will have to search all the data rows which can be very inefficient. Here non-clustered indexes come to play.
Unlike clustered indexes, in non-clustered indexes, we have key values at the root and intermediate levels and row locators at the bottom level. When queried on a non-clustered column both non-clustered and clustered index work together to output the result efficiently.