How To: Optimize SQL Queries

Rabbit
12,516 Expert Mod 8TB
Introduction
In this article, I discuss the basic methods I use to analyse and optimize SQL queries. Properly written SQL with appropriate indices can turn a query that runs in hours into one that runs in seconds. While the code in this article is rooted in SQL Server, most of the concepts apply to other databases.

We will go over:
  1. Analyzing Performance of Queries
  2. Optimizing SQL Syntax
  3. Query Execution Plans
  4. Creating Proper Indices

Analyzing Performance of Queries
SQL Server comes with a few options to analyse the performance of a query. My basic barrage of tests that I use on a query is this:
CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SET STATISTICS IO ON
SET STATISTICS TIME ON
SET STATISTICS PROFILE ON
GO

-- SQL GOES HERE
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
The first thing I do is clear all the caches so that the SQL can run fresh without help from previous runs.

CHECKPOINT - writes any unwritten buffers.
DBCC DROPCLEANBUFFERS - clears out the buffer cache.
DBCC FREEPROCCACHE - clears out the plan cache.

After that, I turn on some statistical options.

SET STATISTICS IO ON - shows you information on the disk activity that occurs when the query runs. The lower the numbers, the faster it is.
SET STATISTICS TIME ON - shows you how long, in milliseconds, it takes a query to run.
SET STATISTICS PROFILE ON - shows you a text version of the query execution plan. This plan reveals every step the database uses to run the query and how much work each step does.

Optimizing SQL Syntax
There are many syntactical choices you can make that will speed up a query. I will discuss some of those choices here. The first eight are relatively self-explanatory but the last three merit further discussion.
  • Avoid using views.
  • Use an INNER JOIN as opposed to one of the OUTER JOINs whenever possible.
  • Avoid the SELECT *, return only the fields that you need.
  • Don't SORT or ORDER BY unless it's needed.
  • Don't JOIN to tables that you don't need for the query.
  • Avoid NOT, i.e. NOT IN and NOT EXISTS.
  • EXISTS is quicker than IN (see the sketch after this list).
  • Use UNION ALL rather than UNION if you don't need DISTINCT records.
  • Sometimes a UNION is quicker than OR. To know which one is quicker, try both and look at the statistics.
  • When filtering records in the WHERE clause, try to avoid calculations on fields. Move them to the constant side of the equation. If it's not possible to avoid, use an index that includes the calculation.
  • Ninety percent of the time, filtering using a subquery in the WHERE clause is slower than filtering by JOINing to a subquery. To know which one is quicker, try both and look at the statistics.
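
To illustrate the EXISTS point above, here is a minimal sketch; the Customers and Orders tables and their columns are hypothetical, not from this article. Run both versions under the statistics harness from the first section and compare the numbers.

-- IN version
SELECT c.CustomerID, c.CustomerName
FROM Customers AS c
WHERE c.CustomerID IN (SELECT o.CustomerID FROM Orders AS o);

-- EXISTS version
SELECT c.CustomerID, c.CustomerName
FROM Customers AS c
WHERE EXISTS (
   SELECT 1
   FROM Orders AS o
   WHERE o.CustomerID = c.CustomerID
);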

In the WHERE clause, using an OR condition can sometimes slow down a query by a lot. If this happens, try a UNION or UNION ALL instead.
-- OR syntax
SELECT someField
FROM someTable
WHERE otherField = 5 OR otherField = 6;

-- UNION syntax
SELECT someField
FROM someTable
WHERE otherField = 5

UNION ALL

SELECT someField
FROM someTable
WHERE otherField = 6;
It makes for much longer code but the time savings could be huge. I once had a query that took hours using the OR syntax but under 5 seconds using the UNION. Test both and look at the statistics to see which would be faster in your case.

When you do calculations on a field in the WHERE clause, the query can no longer use an index on that field because it must perform the calculation on every record to see if it should be returned. For example, if I had a date field and wanted only the records no more than five days old, I have a couple of syntactical choices:
DATEDIFF(D, dateField, GETDATE()) <= 5

dateField BETWEEN DATEADD(D, -5, GETDATE()) AND GETDATE()
The first is slower because an index on dateField can't be used; the calculation has to run for every record. In a similar vein, but with drastically different syntax:
Left(LastName, 1) = 'T'

LastName LIKE 'T%'
More often than not, if you need to filter records based on another table or the same table, using a subquery in a JOIN is quicker than using a subquery in the WHERE clause. For example, if I wanted all the fields for a record but only wanted the most current record based on a category and date field, I could do it one of two ways:
SELECT *
FROM someTable AS t
WHERE dateField = (
   SELECT MAX(dateField)
   FROM someTable
   WHERE categoryField = t.categoryField
);

SELECT *
FROM someTable AS t1
INNER JOIN (
   SELECT categoryField, MAX(dateField) AS dateField
   FROM someTable
   GROUP BY categoryField
) AS t2
ON t1.categoryField = t2.categoryField
   AND t1.dateField = t2.dateField;
Test both and look at the statistics to see which would be faster in your case. I've had situations where a query is much faster using the WHERE clause and situations where a query is much faster in the JOIN. But the JOIN approach is the faster choice most of the time.

Query Execution Plans
The query execution plan tells you the steps the database is going to take to run a query, along with each step's estimated or actual cost. If a query is running slowly, the plan shows which step it's getting hung up on. There are a lot of different operators that the database can use to run a query, but the main culprits of a slow query are Table Scans and RID Lookups. A table scan reads every row of a table to find the correct records to return. A RID lookup occurs when a non-clustered index is used to find a matching row but the index only points to a location in a heap of data; the database has to go to the heap to look up the value rather than finding it stored in the index. If you see either of these, you should create the appropriate indices to get rid of them. There is a caveat for table scans, though: for tables with a small number of rows, a table scan is quicker than using an index.
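
As a hedged sketch of how to address a RID Lookup, suppose the plan shows one for a query like SELECT Quantity FROM OrderLines WHERE ProductID = 7 against a heap; the OrderLines table and its columns are hypothetical. Two options:

-- Option 1: cover the query so no lookup is needed at all.
CREATE NONCLUSTERED INDEX IX_OrderLines_ProductID
   ON OrderLines (ProductID)
   INCLUDE (Quantity);

-- Option 2: give the heap a clustered index; the RID Lookup operator
-- disappears (any remaining lookups go through the clustered key instead).
CREATE CLUSTERED INDEX CIX_OrderLines_OrderLineID
   ON OrderLines (OrderLineID);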

Creating Proper Indices
Using a clustered index on a primary key field is preferable. Clustered indices significantly speed up searching on a field. A clustered index determines the physical order of data in a table. Therefore, you can only have one. However, you can have as many non-clustered indices as you want.

When creating an index, order matters. Put the fields in the following order: WHERE clause, JOIN clause, ORDER BY clause, SELECT clause. So if I wanted to optimize the following query:
SELECT t1.someField
FROM someTable t1
INNER JOIN otherTable t2
ON t1.PK = t2.FK
WHERE t1.otherField = 15
ORDER BY t1.dateField
Then I would create an index on someTable with the fields in the following order: otherField, PK, dateField, someField. And an index on otherTable on the field FK.
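
A sketch of the corresponding DDL; the index names are placeholders.

CREATE NONCLUSTERED INDEX IX_someTable_otherField_PK_dateField_someField
   ON someTable (otherField, PK, dateField, someField);
   -- someField appears only in the SELECT list, so it could also be added
   -- with INCLUDE (someField) rather than as a key column.

CREATE NONCLUSTERED INDEX IX_otherTable_FK
   ON otherTable (FK);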

Incidentally, the previous example was also a fully covered query. What that means is the index contains all the fields required for the query, so the database does not have to refer back to the table to find additional data. When feasible, you should create an index that fully covers a query you run often.

However, it is important to note that an index can be "too large", both in terms of technical limitations and usability. The more fields in an index, and the larger those fields, the longer it takes to run a query that uses the index.

Conclusion
A quick note about hardware. No matter how much optimization you do, sometimes you just need pure power. If you've optimized a query as much as you can but it's still too slow, you may have to upgrade your hardware.

I hope this article has armed you with the basic tools you need to take a query that runs in hours and make it run in seconds. Remember, optimization is a very deep subject and these are just the basic tools that you can use to speed up your query. The next time you have a long running query, analyse the statistics and execution plans, create the proper indices with help from the execution plan, and test different versions of the SQL syntax.
Mar 12 '12 #1
r035198x
13,262 8TB
This is very useful. Good work.
Mar 13 '12 #2
ck9663
2,878 Expert 2GB
Rabbit,

Very nicely done and a very helpful article. I have some other ideas and will keep adding them to this post as I remember or find them.

For now here are some of them.

These are all based on my experience and may differ in other cases.

If OR or AND is necessary
If you really need these logical operators, arrange your conditions properly.

If you are using AND, pick the most probable FALSE condition first. On the other hand, if you are using OR, pick the most probable TRUE condition first.

An AND operator returns FALSE if any of its conditions is false, while an OR operator returns TRUE if any of its conditions is true.

There is no guarantee that SQL Server will skip the rest of the conditions, but in some cases it stops checking the remaining tests once the outcome is already decided.
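
A minimal sketch of that ordering; the Orders table, its columns, and the dbo.IsEligible function are hypothetical.

-- AND: the condition most likely to be FALSE goes first.
SELECT OrderID
FROM Orders
WHERE Status = 'Cancelled'            -- rarely true, rejects most rows early
  AND dbo.IsEligible(OrderID) = 1;    -- hypothetical, more expensive check

-- OR: the condition most likely to be TRUE goes first.
SELECT OrderID
FROM Orders
WHERE Status = 'Open'                 -- usually true, accepts most rows early
   OR dbo.IsEligible(OrderID) = 1;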

Arrange the conditions in CASE WHEN
In a CASE WHEN statement/function, place the most probable TRUE condition first so the engine does not have to work through the entire statement only to find the most-used condition at the bottom. This might be a small improvement if the WHEN conditions are simple. But if you have many WHEN clauses, or just a few whose conditions require complex calculations or refer to columns in other tables, you will see an improvement when the conditions are properly arranged.
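
For example, a sketch with a hypothetical Orders table and Status column, most common value first:

SELECT OrderID,
       CASE
           WHEN Status = 'Shipped'   THEN 'Done'     -- most rows hit this first
           WHEN Status = 'Open'      THEN 'Pending'
           WHEN Status = 'Cancelled' THEN 'Ignore'   -- rarest case last
           ELSE 'Unknown'
       END AS StatusLabel
FROM Orders;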

Index Seek vs Index Scan
Remember that just because you have an index on your table does not mean SQL Server will always use it. SQL Server figures out for itself whether it's better to use an Index Seek or an Index Scan. And because statistics are sometimes out of date, the server may use an Index Scan instead of an Index Seek.

Use the statistics and Execution Plan as described by Rabbit above. As much as possible, try to eliminate Index Scans.
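
If out-of-date statistics look like the cause, refreshing them is a reasonable first step; the table name below is a placeholder.

UPDATE STATISTICS Orders;   -- refresh statistics for one table
EXEC sp_updatestats;        -- or refresh out-of-date statistics for all tables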

Use CURSORs and RBARs as the last option
RBAR (Row By Agonizing Row) is a process where the query forces the server to process the data row by row rather than as a result set, much like how a CURSOR is processed. Although these techniques have their uses, you should, as much as possible, use a set-based technique rather than a row-by-row process. If you have to stage data, use temporary or permanent tables.
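
A minimal sketch of the difference, with a hypothetical Orders table: the cursor version touches rows one at a time, while the set-based version does the same work in a single statement.

-- RBAR version: one UPDATE per row.
DECLARE @OrderID int;
DECLARE order_cursor CURSOR FOR
   SELECT OrderID FROM Orders WHERE Status = 'Open';
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
   UPDATE Orders SET Status = 'Expired' WHERE OrderID = @OrderID;
   FETCH NEXT FROM order_cursor INTO @OrderID;
END
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based version: the same result in one statement.
UPDATE Orders
SET Status = 'Expired'
WHERE Status = 'Open';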

A table must have at least one index
Although it's not required, this mindset will help you make sure that all tables have an index before you use them. And if a table will only have one index, make it a CLUSTERED index. Of course, you still have to check how the table is being used and/or updated.

More to come, work for now...
Mar 13 '12 #3
Rabbit
12,516 Expert Mod 8TB
Thanks for those additional ideas ck. We could make this thread the go-to place for optimizing queries.

If anyone has additional optimizations, please feel free to add your two cents.
Mar 13 '12 #4
Rabbit, Great article. Anyone have any advice on how to improve performance on a non-deterministic computed column? I have some of those, and they usually are the most costly operation in the execution plan.
Mar 13 '12 #5
ck9663
2,878 Expert 2GB
Try to see if you can execute the function at run time and see how the execution plan looks. Depending on how you use the function, you might be able to refactor and restructure your query to utilize Index Seek during execution instead of Index Scan.

Most of the time a non-deterministic function uses an Index Scan and not Index Seek.


Good Luck!!!


~~ CK
Mar 14 '12 #6
Rabbit
12,516 Expert Mod 8TB
My first piece of advice would be to not use it. If we know what the calculation is doing, we may be able to substitute it with something else. If there's no way around it, make sure it's persisted and put it in an index.
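
As a hedged sketch of that last point, with a hypothetical Orders table: SQL Server only allows a computed column to be PERSISTED (and indexed) when its expression is deterministic, which is why substituting the calculation comes first.

-- Replace the non-deterministic calculation with a deterministic one,
-- persist it, and index it.
ALTER TABLE Orders
   ADD TotalWithTax AS (Amount * 1.08) PERSISTED;

CREATE NONCLUSTERED INDEX IX_Orders_TotalWithTax
   ON Orders (TotalWithTax);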
Mar 14 '12 #7
mafaisal
142 100+
Very Good Article.
Thanks Rabbit, CK

By Faisal
Oct 20 '13 #8
