In this article, I discuss the basic methods I use to analyse and optimize SQL queries. A properly written SQL with appropriate indices can turn a query that runs in hours into one that runs in seconds. While the code in this article is rooted in SQL Server, most of the concepts will apply to other databases.
We will go over:
- Analyzing Performance of Queries
- Optimizing SQL Syntax
- Query Execution Plans
- Creating Proper Indices
Analyzing Performance of Queries
SQL Server comes with a few options to analyse the performance of a query. My basic barrage of tests that I use on a query is this:
Expand|Select|Wrap|Line Numbers
- CHECKPOINT
- DBCC DROPCLEANBUFFERS
- DBCC FREEPROCCACHE
- SET STATISTICS IO ON
- SET STATISTICS TIME ON
- SET STATISTICS PROFILE ON
- GO
- -- SQL GOES HERE
- GO
- SET STATISTICS IO OFF
- SET STATISTICS TIME OFF
- SET STATISTICS PROFILE OFF
- GO
CHECKPOINT - writes any unwritten buffers.
DBCC DROPCLEANBUFFERS - clears out the buffer cache.
DBCC FREEPROCCACHE - clears out the plan cache.
After that, I turn on some statistical options.
SET STATISTICS IO - shows you information on the disk activity that occurs with a query. The lower the numbers, the faster it is.
SET STATISTICS TIME ON - shows you how long, in milliseconds, it takes a query to run.
SET STATISTICS PROFILE ON - shows you a text version of the query execution plan. This plan reveals every step the database uses to run the query and how long it takes to run that step.
Optimizing SQL Syntax
There many syntactical choices you can make that will speed up a query. I will discuss some of those choices here. The first eight are relatively self explanatory but the last three merit further discussion.
- Avoid using views.
- Use an INNER JOIN as opposed to one of the OUTER JOINs whenever possible.
- Avoid the SELECT *, return only the fields that you need.
- Don't SORT or ORDER BY unless it's needed.
- Don't JOIN to tables that you don't need for the query.
- Avoid NOT, i.e. NOT IN and NOT EXISTS.
- EXISTS is quicker than IN.
- Use UNION ALL rather than UNION if you don't need DISTINCT records.
- Sometimes a UNION is quicker than OR. To know which one is quicker, try both and look at the statistics.
- When filtering records in the WHERE clause, try to avoid calculations on fields. Move them to the constant side of the equation. If it's not possible to avoid, use an index that includes the calculation.
- Ninety percent of the time, filtering using a subquery in the WHERE clause is slower than filtering by JOINing to a subquery. To know which one is quicker, try both and look at the statistics.
In the WHERE clause, using an OR condition can sometimes slow down a query by a lot. If this happens, try a UNION or UNION ALL instead.
Expand|Select|Wrap|Line Numbers
- -- OR syntax
- SELECT someField
- FROM someTable
- WHERE otherField = 5 OR otherField = 6;
- -- UNION syntax
- SELECT someField
- FROM someTable
- WHERE otherField = 5
- UNION ALL
- SELECT someField
- FROM someTable
- WHERE otherField = 6;
When you do calculations on a field in the WHERE clause, the query can no longer just scan an index because it now must perform the calculation on every record to see if it should be returned. For example, if I had a date field and I wanted to know if it was less than seven days old from the current date, I have a couple of syntactical choices:
Expand|Select|Wrap|Line Numbers
- DATEDIFF(D, dateField, GETDATE()) <= 5
- dateField BETWEEN DATEADD(D, -5, GETDATE()) AND GETDATE()
Expand|Select|Wrap|Line Numbers
- Left(LastName, 1) = 'T'
- LastName LIKE 'T%'
Expand|Select|Wrap|Line Numbers
- SELECT *
- FROM someTable AS t
- WHERE dateField = (
- SELECT MAX(dateField)
- FROM someTable
- WHERE categoryField = t.categoryField
- );
- SELECT *
- FROM someTable AS t1
- INNER JOIN (
- SELECT categoryField, MAX(dateField) AS dateField
- FROM someTable
- GROUP BY categoryField
- ) AS t2
- ON t1.categoryField = t2.categoryField
- AND t1.dateField = t2.dateField;
Query Execution Plans
The query execution plan tells you the steps the database is going to take to run a query along with the estimated/actual time it took. If a query is running slowly, this will show which step it's getting hung up on. There are a lot of different operators that the database can use to run a query but the main culprits of a slow query are Table Scans and RID Lookups. Table scans read every row of a table to find the correct records to return. RID lookups are used when a non-clustered index is being utilized to find a matching row. It merely points to a location on a heap of data and has to go to the heap to look up the value rather than the data being stored in the index. If you see either of these, you should create the appropriate indices to get rid of them. However, there's a caveat regarding table scans, for tables with a small amount of rows, table scans are quicker than index scans.
Creating Proper Indices
Using a clustered index on a primary key field is preferable. Clustered indices significantly speed up searching on a field. A clustered index determines the physical order of data in a table. Therefore, you can only have one. However, you can have as many non-clustered indices as you want.
When creating an index, order matters. Put the fields in the following order: WHERE clause, JOIN clause, ORDER BY clause, SELECT clause. So if I wanted to optimize the following query:
Expand|Select|Wrap|Line Numbers
- SELECT t1.someField
- FROM someTable t1
- INNER JOIN otherTable t2
- ON t1.PK = t2.FK
- WHERE t1.otherField = 15
- ORDER BY t1.dateField
Incidentally, the previous example was also a fully covered query. What that means is the index contains all fields required for the query. This means that the database does not have to refer back to the data in the table to find additional data for the query. When feasible, you should create an index to fully cover a query you want to run often.
However, it is important to note that an index can be "too large". Both in terms of technical limitations and usability. The more fields and the larger the fields included in an index, the longer it takes to run a query using the index.
Conclusion
A quick note about hardware. No matter how much optimization you do, sometimes you just need pure power. If you've optimized a query as much as you can but it's still too slow, you may have to upgrade your hardware.
I hope this article has armed you with the basic tools you need to take a query that runs in hours and make it run in seconds. Remember, optimization is a very deep subject and these are just the basic tools that you can use to speed up your query. The next time you have a long running query, analyse the statistics and execution plans, create the proper indices with help from the execution plan, and test different versions of the SQL syntax.