473,287 Members | 1,741 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,287 software developers and data experts.

Why is a simple WHERE clause so slow?

Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin

Jan 12 '07 #1
9 19051

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Emin,

Before investigating why it is slow, you need to make sure it is
correct.
Note that your WHERE clause filters out all the rows with NULL
event_date, so your FULL OUTER JOIN effectively becomes a LEFT OUTER
JOIN.
If you actually need FULL OUTER JOIN, you can either push down the
predicate manually (preferred)

SELECT * FROM
( Select x.event_date From
(Select x.event_date From x WHERE event_date >= {ts '1980-01-01
00:00:00'})
x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ

or allow nulls in your WHERE clause (logically equivalent _only_ if
x.event_date is not nullable)

SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} OR
innerQ.event_date IS NULL)

Note that your expectations regarding

"SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match"

are incorrect - the optimizer can rewrite your query as it wishes.
Generally speking, it accepts WHAT to do, but it decides by itself HOW
to do it.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 12 '07 #2
Emin,
You gotta to be more detailed in the Q like wht data is in the tables.
For a query running slow, there could be n number of reasons. most
imortant one is having no indexes.

your query is running slower, cuz your innerQ is not effected by the
where condition, its outside the scope of the Full outer join, which is
obviously the most resourse intensive in the query.

try Rewriting the query like this:
SELECT *
FROM (
Select x.event_date
From #X x
FULL OUTER JOIN #Y y ON x.event_date = y.event_date
WHERE x.event_date >= '1980-01-01 00:00:00'
) innerQ

BTW, what is the purpose of writing the where condition like this:
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
Vs like this:
WHERE x.event_date >= '1980-01-01 00:00:00'

Does this give any specific advantage and what does it do?
Thanks, schal.

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #3
Emin wrote:
>
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
It sounds like the optimizer is somewhat confused over your query.

It would be interesting to see the actual query plan. You can get that
by running SET SHOWPLAN_TEXT ON and then run the query (and then SET
SHOWPLAN_TEXT OFF).

I agree with Alex, that you could try to write the query the way you
normally would, like this:

SELECT x.event_date
FROM x
LEFT JOIN Y ON y.event_date = x.event_date
WHERE event_date >= '19800101'

HTH,
Gert-Jan
Jan 12 '07 #4
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #5
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan
Emin wrote:
>
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #6
Dear Gert-Jan,

Thanks for the info. In my real-world problem I am doing a join on
views which seemed appropriate to simulate using CTEs in my previous
post. Based on your comments, I tried using OPTION(HASH JOIN) on my
real example and that fixed the problem.

Out of curiousity, how does one go about learning how to understand and
optimize the execution plan? For example, while I noticed that the
execution plans were different in my various cases, I never would have
known to try OPTION(HASH JOIN). Do you (or any other readers out there)
have recommendations on books or other sources for this information?

Thanks again,
-Emin

On Jan 12, 2:59 pm, Gert-Jan Strik <s...@toomuchspamalready.nlwrote:
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan

Emin wrote:
Dear Experts,
Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?
Thanks,
-Emin
----------------------------------------------------
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp
;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp
--------------------------------
On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,
I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query
-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------
Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.
Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?
Thanks,
-Emin
Jan 12 '07 #7

Emin wrote:
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin

I guess in real life situations the optimizer will estimate selectivity
of x.event_date @YourParameter based on statistics. For inner joins
the other side does not contribute nulls, so its estimate might be
better, so it goes for a hash join.
Also note that full outer joins are rarely used in production, and as
such they have less chances of geting a good execution plan.
Your particular full outer join query is misleading and I would say it
is a very poor practice.
Because in fact your query is a left outer join, you should either
explicitly say LEFT OUTER JOIN or push your predicate down.
The reason is simple: poorly programmed queries have less chances of
geting a good execution plan.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 12 '07 #8
I don't have recommendations, but you shouldn't have any problems
googling them.

Reading query plans is not easy. Besides "gettings used to", it requires
a lot of knowledge of how the query optimizer works, what execution
steps seem reasonable, and which seem wasteful. The part to force a
different join type is actually quite simple. You can look up query
hints in BOL to get an overview of the possibilities. I guess the rest
is 10+ years experience and a particular interest in performance related
issues...

Gert-Jan
Emin wrote:
>
Dear Gert-Jan,

Thanks for the info. In my real-world problem I am doing a join on
views which seemed appropriate to simulate using CTEs in my previous
post. Based on your comments, I tried using OPTION(HASH JOIN) on my
real example and that fixed the problem.

Out of curiousity, how does one go about learning how to understand and
optimize the execution plan? For example, while I noticed that the
execution plans were different in my various cases, I never would have
known to try OPTION(HASH JOIN). Do you (or any other readers out there)
have recommendations on books or other sources for this information?

Thanks again,
-Emin

On Jan 12, 2:59 pm, Gert-Jan Strik <s...@toomuchspamalready.nlwrote:
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan
[snip]
Jan 12 '07 #9
How about,
Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
where x.event_date >= {ts '1980-01-01 00:00:00'}
and y.event_date >= {ts '1980-01-01 00:00:00'}

does it make run faster, slower, same?

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 15 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Arthur J. O'Dwyer | last post by:
I was paging through Coplien's book "Advanced C++ Programming Styles and Idioms" this afternoon and found some code that looked something like void sort(vector<foo> a) { int flip; do { for...
1
by: M Wells | last post by:
Hi all, Just wondering if anyone can tell me if an order by clause on a select query would have any impact on the time it takes to retrieve results? Essentially I'm selecting Top 1 out of a...
7
by: Erik Borgstr?m | last post by:
Hi, I simply want to use simple matrices of ints or doubles in C++ and I want all the good: 1) be able to use double-index, e.g. m 2) not have to use pointers at all 3) do it fast I know...
24
by: chri_schiller | last post by:
I have a home-made website that provides a free 1100 page physics textbook. It is written in html and css. I recently added some chinese text, and since that day there are problems. The entry...
27
by: Chris, Master of All Things Insignificant | last post by:
I have come to greatly respect both Herfried & Cor's reponses and since the two conflicted, I wanted to get some clarification. My orginal post: Herfried, maybe your example here can get you to...
2
by: Zygo Blaxell | last post by:
I have a table with a few million rows of temperature data keyed by timestamp. I want to group these rows by timestamp intervals (e.g. every 32 seconds), compute aggregate functions on the...
20
by: Brian Tkatch | last post by:
An ORDER BY a simple-integer inside a FUNCTION, results in SQL0440N, unless the FUNCTION expects an INTEGER as its parameter. For example: DECLARE GLOBAL TEMPORARY TABLE A(A CHAR(1)) INSERT INTO...
0
by: mflll | last post by:
I want to derive or substitute an Item element containing just text by another element with regular contents. The context is that I have a contract which has Block's that represent clause's....
3
by: rogynskyy | last post by:
Hi guys, I'm running MSDE 2000 A on Win XP I've got a database with several tables, all of the tables display data in query manager. I wrote this simple query: Select
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: MeoLessi9 | last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: Aftab Ahmad | last post by:
Hello Experts! I have written a code in MS Access for a cmd called "WhatsApp Message" to open WhatsApp using that very code but the problem is that it gives a popup message everytime I clicked on...
0
by: Aftab Ahmad | last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below. Dim IE As Object Set IE =...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.