Optimizing MS SQL Server query performance

Motivation

One of the most common issues with SQL query performance comes from the lack of indexes or indexes defined in the wrong columns.

This is a cross platform issue, a developer issue one might say, and not really an issue with MySQL, MS SQL or any other vendor - although in one my examples below MySQL does prevent you doing a couple of mistakes and I'll explain shortly why. Still this post is about MS SQL so I'll focus on explaining the best way to track down and improve your query performance.

Indexes are not a panacea as there are other techniques for improving performance specially after all the indexes are in place, but still they are one of the first points of call and more often than not they are not defined properly or defined at all.

What are indexes and why are they needed.

I could indeed copy and paste a description from Wikipedia, here's the link, on what database indexes are, but I'll just get to the point - they are relevant to ensure that your data is accessed as efficiently and as quickly as possible. Usually many developers are "lucky" enough and get away without using them because their projects or web sites are quite small and never really grow to the point when indexes are a key to preventing your website from halting to a crawl.

Imagine that you have a table with 50 products and your query uses a column that is not indexed to look up data. As the column is not indexed, the query itself does a table scan and goes through all of the products. If you're luck you want the first one, if you're unlucky you want the last one. Now as it's only 50 products this is quite fast as the query goes through each one of them. Now imagine that your e-commerce website now has 75000 products. As you can imagine this will be much more costly and will take much longer. Remember that this is not linear so the later will not take 1500 times longer. This will suddenly become exponential and it will take longer.

Joins are also affected. If your query joins a few tables it will have to do table scans for all of them if no indexes are defined.

Why is this article focused on MS SQL Server.

When maintaining large scale projects for some of our customers we still come across missing indexes. This happens even more so on MS SQL server databases. Funnily enough because when compared to MySQL - remember when I mentioned that MySQL prevented this one certain cases?

MS SQL Server allows you to create relationships between tables without defining indexes on columns used for joins. This is one of the most common mistakes. MySQL on the other hand only allows you to add a foreign key relationship if the columns used for the relationship are both indexed.

As one is usually is a primary key this is half done, but always remember to index columns that used as part of the "where" statement in a query or part of joins. In fact any type of statement that looks up data should use columns that are indexed.

How to resolve and optimize query performance

As a rule of thumb when creating new tables, index the columns that are used for foreign relationships, joins or any type of look ups. This should be key. Yes the performance is just fine when you have 5 records in a table, but trust me it will crawl to a halt when you have hundreds if not thousands. Actually inserting or deleting records on tables with less indexes is faster as indexes do not have to be recalculated, but the offset is minimal and will soon pay off.

If you're optimizing an existing database you can use SQL Server Management Studio. Simply open a new query window, type in the query and instead of running the query click on the "Display execution plan" button.

The execution plan will come up as per the image on the left and you will see a lot of useful information. Place your mouse of the icons to bring up more details on each node. You can also see the estimated cost of each node of the execution tree. Hover over each one of the nodes and try to find descriptions that say "Scan", for instance "Clustered Index Scan". This is what you need to fix! With time you'll get used to visually find the icon that identifies a scan.

Read the information provided and identify the columns that are part of that node. Usually displayed in the output section of the further information panel. Example below:

After you're found your index or table scan's add indexes to the columns accordingly. Again the columns that are part of the scan are identified in the output section or you can easily determine the columns involved in the scan by analyzing your query. Look for any statement that does a look up. You can add the indexes in batches or one by one. Just make sure that you don't have your execution plan bringing up and table or index scans. Do bare in mind that when adding indexes the execution plan will change as will the cost of each node in the execution tree. It makes sense as if one node costs 55% of the execution plan and after you add the correct indexes it only costs 1% or 2%, the remaining percentages will change. On the safe side you can always view the execution plan every time you add an index or optimize your query,

Does this work?

Short answer: yes! Just a couple of days ago I was optimizing a large database for one of our clients and one of the queries was joining 5 tables. None of the foreign key columns had an index. After adding the indexes as necessary, the query time came down from over 4 seconds to a under a second. Not bad for an optimization that took 5 minutes to do which in turn made the page load 4 times faster than before.

So to sum up, don't forget to index your columns appropriately. Don't go overboard and index everything under the sun as that will also slow down the database, but make sure you index the relevant columns. From experience this one of most common mistakes.

Again this is database/platform agnostic and it's an issue of how relational databases work. Not an issue, it's just the way they work. And to be fair this is taught on the first day of database school - if there was one. Well not the first, but maybe the second. Each database will have their own tools with different names but I'm reasonably certain that you will be able to pin point these issues very quickly.

To view or add a comment, sign in

Others also viewed

Explore content categories