473,396 Members | 2,039 Online

Home Posts Topics Members FAQ

home > topics > how to: optimize sql queries

Join Bytes and contribute your articles to a community of 473,396 developers and data experts.

How To: Optimize SQL Queries

12,516 Expert Mod 8TB

Introduction
In this article, I discuss the basic methods I use to analyse and optimize SQL queries. A properly written SQL with appropriate indices can turn a query that runs in hours into one that runs in seconds. While the code in this article is rooted in SQL Server, most of the concepts will apply to other databases.

We will go over:

Analyzing Performance of Queries
Optimizing SQL Syntax
Query Execution Plans
Creating Proper Indices

Analyzing Performance of Queries
SQL Server comes with a few options to analyse the performance of a query. My basic barrage of tests that I use on a query is this:

Expand|Select|Wrap|Line Numbers

 CHECKPOINT

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE
 
SET STATISTICS IO ON

SET STATISTICS TIME ON

SET STATISTICS PROFILE ON

GO
 
-- SQL GOES HERE

GO
 
SET STATISTICS IO OFF

SET STATISTICS TIME OFF

SET STATISTICS PROFILE OFF

GO

The first thing I do is to clear all the caches so that I can the SQL can run fresh without help from previous runs.

CHECKPOINT - writes any unwritten buffers.
DBCC DROPCLEANBUFFERS - clears out the buffer cache.
DBCC FREEPROCCACHE - clears out the plan cache.

After that, I turn on some statistical options.

SET STATISTICS IO - shows you information on the disk activity that occurs with a query. The lower the numbers, the faster it is.
SET STATISTICS TIME ON - shows you how long, in milliseconds, it takes a query to run.
SET STATISTICS PROFILE ON - shows you a text version of the query execution plan. This plan reveals every step the database uses to run the query and how long it takes to run that step.

Optimizing SQL Syntax
There many syntactical choices you can make that will speed up a query. I will discuss some of those choices here. The first eight are relatively self explanatory but the last three merit further discussion.

Avoid using views.
Use an INNER JOIN as opposed to one of the OUTER JOINs whenever possible.
Avoid the SELECT *, return only the fields that you need.
Don't SORT or ORDER BY unless it's needed.
Don't JOIN to tables that you don't need for the query.
Avoid NOT, i.e. NOT IN and NOT EXISTS.
EXISTS is quicker than IN.
Use UNION ALL rather than UNION if you don't need DISTINCT records.
Sometimes a UNION is quicker than OR. To know which one is quicker, try both and look at the statistics.
When filtering records in the WHERE clause, try to avoid calculations on fields. Move them to the constant side of the equation. If it's not possible to avoid, use an index that includes the calculation.
Ninety percent of the time, filtering using a subquery in the WHERE clause is slower than filtering by JOINing to a subquery. To know which one is quicker, try both and look at the statistics.

In the WHERE clause, using an OR condition can sometimes slow down a query by a lot. If this happens, try a UNION or UNION ALL instead.

Expand|Select|Wrap|Line Numbers

 -- OR syntax

SELECT someField

FROM someTable

WHERE otherField = 5 OR otherField = 6;
 
-- UNION syntax

SELECT someField

FROM someTable

WHERE otherField = 5
 
UNION ALL
 
SELECT someField

FROM someTable

WHERE otherField = 6;

It makes for much longer code but the time savings could be huge. I once had a query that took hours using the OR syntax but under 5 seconds using the UNION. Test both and look at the statistics to see which would be faster in your case.

When you do calculations on a field in the WHERE clause, the query can no longer just scan an index because it now must perform the calculation on every record to see if it should be returned. For example, if I had a date field and I wanted to know if it was less than seven days old from the current date, I have a couple of syntactical choices:

Expand|Select|Wrap|Line Numbers

 DATEDIFF(D, dateField, GETDATE()) <= 5
 
dateField BETWEEN DATEADD(D, -5, GETDATE()) AND GETDATE()

The first is slower because an index on dateField can't be used since it must calculate for every record. And in a similar vein but with drastically different syntax:

Expand|Select|Wrap|Line Numbers

 Left(LastName, 1) = 'T'
 
LastName LIKE 'T%'

More often than not, if you need to filter records based on another table or the same table, using a subquery in a JOIN is quicker than using a subquery in the WHERE clause. For example, if I wanted all the fields for a record but only wanted the most current record based on a category and date field, I could do it one of two ways:

Expand|Select|Wrap|Line Numbers

 SELECT *

FROM someTable AS t

WHERE dateField = (

   SELECT MAX(dateField)

   FROM someTable

   WHERE categoryField = t.categoryField

);
 
SELECT *

FROM someTable AS t1

INNER JOIN (

   SELECT categoryField, MAX(dateField) AS dateField

   FROM someTable

   GROUP BY categoryField

) AS t2

ON t1.categoryField = t2.categoryField

   AND t1.dateField = t2.dateField;

Test both and look at the statistics to see which would be faster in your case. I've had situations where a query is much faster using the WHERE clause and situations where a query is much faster in the JOIN. But the JOIN approach is correct most of the time.

Query Execution Plans
The query execution plan tells you the steps the database is going to take to run a query along with the estimated/actual time it took. If a query is running slowly, this will show which step it's getting hung up on. There are a lot of different operators that the database can use to run a query but the main culprits of a slow query are Table Scans and RID Lookups. Table scans read every row of a table to find the correct records to return. RID lookups are used when a non-clustered index is being utilized to find a matching row. It merely points to a location on a heap of data and has to go to the heap to look up the value rather than the data being stored in the index. If you see either of these, you should create the appropriate indices to get rid of them. However, there's a caveat regarding table scans, for tables with a small amount of rows, table scans are quicker than index scans.

Creating Proper Indices
Using a clustered index on a primary key field is preferable. Clustered indices significantly speed up searching on a field. A clustered index determines the physical order of data in a table. Therefore, you can only have one. However, you can have as many non-clustered indices as you want.

When creating an index, order matters. Put the fields in the following order: WHERE clause, JOIN clause, ORDER BY clause, SELECT clause. So if I wanted to optimize the following query:

Expand|Select|Wrap|Line Numbers

 SELECT t1.someField

FROM someTable t1

INNER JOIN otherTable t2

ON t1.PK = t2.FK

WHERE t1.otherField = 15

ORDER BY t1.dateField

Then I would create an index on someTable with the fields in the following order: otherField, PK, dateField, someField. And an index on otherTable on the field FK.

Incidentally, the previous example was also a fully covered query. What that means is the index contains all fields required for the query. This means that the database does not have to refer back to the data in the table to find additional data for the query. When feasible, you should create an index to fully cover a query you want to run often.

However, it is important to note that an index can be "too large". Both in terms of technical limitations and usability. The more fields and the larger the fields included in an index, the longer it takes to run a query using the index.

Conclusion
A quick note about hardware. No matter how much optimization you do, sometimes you just need pure power. If you've optimized a query as much as you can but it's still too slow, you may have to upgrade your hardware.

I hope this article has armed you with the basic tools you need to take a query that runs in hours and make it run in seconds. Remember, optimization is a very deep subject and these are just the basic tools that you can use to speed up your query. The next time you have a long running query, analyse the statistics and execution plans, create the proper indices with help from the execution plan, and test different versions of the SQL syntax.

Mar 12 '12 #1

Subscribe Post Reply

7

12956

13,262

8TB

This is very useful. Good work.

Mar 13 '12 #2

2,878

Expert 2GB

Rabbit,

Very nicely done and a very helpful article. I have some other ideas and will keep adding them on this post as I remember or found them.

For now here are some of them.

These are all based on my experience and could be different from other cases.

If OR or AND is necessary
If you you really need these logical operators, arrange your conditions properly.

If you are using AND, pick the most probably False condition first. On the other hand, if you are using OR, pick the most probable True condition first.

An AND operator will return False if any of the condition is false. While an OR operator will return True if any of the condition is True.

There is actually no guarantee that SQL Server will not perform the rest of the conditions, but in some cases it stops checking the other tests when the condition is or not satisfied.

Arrange the conditions in CASE WHEN
In a CASE WHEN statement/function, place the most probable TRUE condition first so the command does not have to parse through the entire statement only to find the most use condition at the bottom. This might be a small improvement if it's a simple condition on the WHEN clause. But if you have a lot of WHEN clauses or just a few but the condition on those WHENs requires complex calculation or even referring to columns on other tables, you will experience improvement if conditions are properly arranged.

Index Seek vs Index Scan
Remember that not because you have an index on your table, SQL Server will always use it. SQL Server will figure out for itself if it's better to use Index Seek or Index Scan. Because there are times that statistics are not updated in the database, the server will use of Index Scan instead of Index Seek.

Use the statistics and Execution Plan as describe by Rabbit above. As much as possible, try to eliminate Index Scan.

Use CURSORs and RBARs as the last option
RBAR (Row By Agonizing Row) is a process where the query is forcing the server to process the data by row and not by result set. Much like how a CURSOR is processed. Although these technique have their own use, you must, as much as possible, use a result based technique than a Row by Row process. If you have to, use temporary or permanent tables.

A table must have at least one index
Although it's not required, this mindset will help you make sure that all tables have an index before you use them. And if a table will only have one index, make it a CLUSTERED. Of course you still have to check how the table is being used and/or updated.

More to come, work for now...

Mar 13 '12 #3

12,516

Expert Mod 8TB

Thanks for those additional ideas ck. We could make this thread the go to place for optimizing queries.

If any one has additional optimizations, please feel free to add your two cents.

Mar 13 '12 #4

Vikki McCormick

46

Rabbit, Great article. Anyone have any advice on how to improve performance on a non-deterministic computed column? I have some of those, and they usually are the most costly operation in the execution plan.

Mar 13 '12 #5

2,878

Expert 2GB

Try to see if you can execute the function at run time and see how the execution plan looks. Depending on how you use the function, you might be able to refactor and restructure your query to utilize Index Seek during execution instead of Index Scan.

Most of the time a non-deterministic function uses an Index Scan and not Index Seek.

Good Luck!!!

~~ CK

Mar 14 '12 #6

12,516

Expert Mod 8TB

My first piece of advice would be to not use it. If we know what the calculation is doing, we may be able to substitute it with something else. If there's no way around it, make sure it's persistent and put it in an index.

Mar 14 '12 #7

142

100+

Very Good Article.
Thanks Rabbit, CK

By Faisal

Oct 20 '13 #8

Sign in to post your reply or Sign up for a free account.

Similar topics

Optimize a MySQL OR query

by: jy2003 | last post by:

I have read a book, which suggests we should change OR to UNION for better performance, but currently I have too many OR clauses(I have a query with 100 ORs) and it does not sound good to have 100...

Optimize query: time of "single * IN(many)" > time of "many * IN(single)"

by: Paul Janssen | last post by:

Hello! Can anyone help me out with the following situation: (a) a single query with 550 id's in the IN-clause resulting into 800+ seconds; (b) 550 queries with a single id in the IN-clause...

PostgreSQL Database

Combining two queries

by: Bruce MacDonald | last post by:

I've got a question/request for the SQL gurus. I'm building a model of bandwidth demand in MS Access and want to get aggregated results for demand at each PCP in each time period. The two...

Microsoft Access / VBA

Better multitasking when running queries?

by: PeteCresswell | last post by:

Access tends to take over my machine when I run long queries. The queries in question are Append queries. Some of the jobs we're talking about run 45 minutes to an hour...so the loss in...

Microsoft Access / VBA

Why should I use VBA instead of queries?

by: Martin Lacoste | last post by:

There's likely not a simple answer for this, I know, but, I thought I'd try anyways... Background.. I've been racking my brain with some queries that I thought were straightforward, but have...

Microsoft Access / VBA

Is Access smart enough to optimize this ?

by: mathieu.page | last post by:

Hi, I often have recursives queries in my applications, like in this simplified example : req1 : SELECT EmployeNo, EmployeName, EmployePhone FROM Employe; req2 :

Microsoft Access / VBA

by: Frank Swarbrick | last post by:

IBM has a product for the VSE operating system called the VSAM Redirector. It allows you to use VSAM to access RDBMS tables/views as if they were actual VSAM files. We're doing a comparison right...

How to Optimize Queries and Recordsets

by: ADezii | last post by:

Create Indexes on all Columns used in 'ad hoc' Query Joins, restrictions, and sorts (Jet already creates Indexes for Enforced Relationships). Use Primary Keys instead of Unique Indexes wherever...

Microsoft Access / VBA

Question regarding Compiling Queries

by: Denburt | last post by:

According to our lovely friends at Microsoft they say that after compacting a database the queries should be recompiled by opening each one then closing it. So I compacted the database I was...

Microsoft Access / VBA

Caching Queries in DataTable

by: jehugaleahsa | last post by:

Hello: I am writing a cute little class that will cache queries against a database. Currently, I am implementing this by storing the command text, parameter values and generated DataRows. ...

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.

BYTES.COM © 2024
About Bytes
Terms Of Use
Privacy Policy
Sitemap

Advertise on Bytes:
Post a Job
Sponsored Posts
Platinum & Gold Sponsors
Hire Now!