Relational Database Query Optimisation
Relational databases provide data access functions that are used in developing software systems. Advanced data models for software systems can be represented using these functions. During development based on relational databases, systems might have weak performance for certain read-only queries. These queries are mainly used in advanced reports.
Although it is better to avoid having slow queries, the complexity of business problems might favour focusing on having a working solution before improving performance. Instead of completely rewriting the queries to improve performance after that, it is possible to progressively apply changes to have a better result.
To identify the changes that are required to improve query performance, the causes of slowness should be first studied. The logical execution steps for the queries should be analysed first. Database tools such as Microsoft SQL Server provide functions for determining these steps. The query execution plan feature can be used for this purpose. I also reorganise the query into multiple steps using the standard 'WITH' clause and local scope table variables. After having an understanding of the query execution steps, it is important to determine the steps that are causing the slowness. The query execution plan feature provides information about this using cost percentages. I also test the performance while eliminating steps to identify bottlenecks.
After identifying bottleneck steps that are causing slowness, restructuring and indexing can be used to improve performance. I first start with adding missing indexes. Indexes are useful for improving the performance of queries that require searching for matches. These queries include filter conditions in 'WHERE' or 'JOIN' clauses. The following example shows a simple query with a filter condition in a 'WHERE' clause:
SELECT *
FROM Employee
WHERE FirstName = 'Alice'
AND
LastName = 'Smith';
The performance of the filter in this query can be improved by adding an index based on the first and last name fields. It is important to consider how databases hit these indexes when searching. All the filter fields should be included in the index to have a good result. The fields should also be in proper order from the leftmost index elements. When sharing the same index for multiple fields, it is important to make sure that the filter fields are in the leftmost elements of the index with no intruding elements in between. The following index, for example, will not be utilised properly for the query although it includes the required fields:
Recommended by LinkedIn
CREATE INDEX ix_Employee
ON Employee(FirstName,Age,LastName);
The index will not be properly utilised because of having an element between the first and last name fields. The filter fields should be sequentially in the leftmost part of the index for utilisation. The following index will be properly used for the filter related to first and last names:
CREATE INDEX ix_Employee
ON Employee(FirstName,LastName,Age);
These keys can be useful when having filter conditions with fields related to the same table. When having complex queries, filters might span fields for multiple tables. Previous versions of database systems did not directly provide functions for having these indexes. Having indexes for such cases required a lot of work based on preparing separate tables and synchronising tables. New versions of databases provide these functions. Materialised views, for example, can be used in Microsoft SQL Server to represent data from multiple tables. The database will automatically synchronise data in these views for changes in source tables. Indexes can also be added for fields in these views. These indexes can be used to improve the performance of queries for filters related to fields in multiple tables. The queries should be modified to use the materialised view instead of the original tables in this case.
The number of materialised views might have a negative impact on the required space in a system. I prefer to use a different method for indexing fields in multiple tables. Instead of having materialised views that are automatically synchronised, I use local scope table variables to represent data from multiple tables. I then add the required indexes for the local scope tables. This will require building the index each time the query will be used; however, it is better than having a large number of materialised views. This is useful when the time for building the index is acceptable. It is also possible to have complete results for query execution steps in a local scope table variable. Indexes can then be used for these local scope tables if joining with the results of other steps is slow. The following table variable can be used to represent information about employees and departments for example:
DECLARE @EmployeeDepartment TABLE
(
EmployeeID INTEGER,
FirstName NVARCHAR(100),
LastName NVARCHAR(100),
DepartmentName NVARCHAR(100),
UNIQUE CLUSTERED (FirstName,DepartmentName,EmployeeID)
);
The index on the local scope table variable '@EmployeeDepartment' can be useful for queries with filters related to first names and department names. The table variable can be populated with information about employees and related departments using a join query. Intermediate results using these table variables can improve the performance of slow steps. The performance might sometimes improve when having intermediate results in table variables without even adding indexes. This might occur when databases generate bad execution plans for queries. The execution plans databases generate for complex queries are not always perfect. Having a sequence of steps with intermediate results will generate better results in these cases.
Performance improvement based on execution steps and indexes is usually enough for slow queries. It is not always required to restructure the queries completely after having a working solution. It is enough to apply changes that make the sequence of steps explicit using the 'WITH' clause and local scope table variables for intermediate results. Performance improvements for identified slow steps will have a satisfactory result.