By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,064 Members | 1,542 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,064 IT Pros & Developers. It's quick & easy.

Why is a simple WHERE clause so slow?

P: n/a
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin

Jan 12 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Emin,

Before investigating why it is slow, you need to make sure it is
correct.
Note that your WHERE clause filters out all the rows with NULL
event_date, so your FULL OUTER JOIN effectively becomes a LEFT OUTER
JOIN.
If you actually need FULL OUTER JOIN, you can either push down the
predicate manually (preferred)

SELECT * FROM
( Select x.event_date From
(Select x.event_date From x WHERE event_date >= {ts '1980-01-01
00:00:00'})
x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ

or allow nulls in your WHERE clause (logically equivalent _only_ if
x.event_date is not nullable)

SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} OR
innerQ.event_date IS NULL)

Note that your expectations regarding

"SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match"

are incorrect - the optimizer can rewrite your query as it wishes.
Generally speking, it accepts WHAT to do, but it decides by itself HOW
to do it.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 12 '07 #2

P: n/a
Emin,
You gotta to be more detailed in the Q like wht data is in the tables.
For a query running slow, there could be n number of reasons. most
imortant one is having no indexes.

your query is running slower, cuz your innerQ is not effected by the
where condition, its outside the scope of the Full outer join, which is
obviously the most resourse intensive in the query.

try Rewriting the query like this:
SELECT *
FROM (
Select x.event_date
From #X x
FULL OUTER JOIN #Y y ON x.event_date = y.event_date
WHERE x.event_date >= '1980-01-01 00:00:00'
) innerQ

BTW, what is the purpose of writing the where condition like this:
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
Vs like this:
WHERE x.event_date >= '1980-01-01 00:00:00'

Does this give any specific advantage and what does it do?
Thanks, schal.

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #3

P: n/a
Emin wrote:
>
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
It sounds like the optimizer is somewhat confused over your query.

It would be interesting to see the actual query plan. You can get that
by running SET SHOWPLAN_TEXT ON and then run the query (and then SET
SHOWPLAN_TEXT OFF).

I agree with Alex, that you could try to write the query the way you
normally would, like this:

SELECT x.event_date
FROM x
LEFT JOIN Y ON y.event_date = x.event_date
WHERE event_date >= '19800101'

HTH,
Gert-Jan
Jan 12 '07 #4

P: n/a
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #5

P: n/a
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan
Emin wrote:
>
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #6

P: n/a
Dear Gert-Jan,

Thanks for the info. In my real-world problem I am doing a join on
views which seemed appropriate to simulate using CTEs in my previous
post. Based on your comments, I tried using OPTION(HASH JOIN) on my
real example and that fixed the problem.

Out of curiousity, how does one go about learning how to understand and
optimize the execution plan? For example, while I noticed that the
execution plans were different in my various cases, I never would have
known to try OPTION(HASH JOIN). Do you (or any other readers out there)
have recommendations on books or other sources for this information?

Thanks again,
-Emin

On Jan 12, 2:59 pm, Gert-Jan Strik <s...@toomuchspamalready.nlwrote:
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan

Emin wrote:
Dear Experts,
Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?
Thanks,
-Emin
----------------------------------------------------
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp
;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp
--------------------------------
On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,
I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query
-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------
Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.
Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?
Thanks,
-Emin
Jan 12 '07 #7

P: n/a

Emin wrote:
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin

I guess in real life situations the optimizer will estimate selectivity
of x.event_date @YourParameter based on statistics. For inner joins
the other side does not contribute nulls, so its estimate might be
better, so it goes for a hash join.
Also note that full outer joins are rarely used in production, and as
such they have less chances of geting a good execution plan.
Your particular full outer join query is misleading and I would say it
is a very poor practice.
Because in fact your query is a left outer join, you should either
explicitly say LEFT OUTER JOIN or push your predicate down.
The reason is simple: poorly programmed queries have less chances of
geting a good execution plan.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 12 '07 #8

P: n/a
I don't have recommendations, but you shouldn't have any problems
googling them.

Reading query plans is not easy. Besides "gettings used to", it requires
a lot of knowledge of how the query optimizer works, what execution
steps seem reasonable, and which seem wasteful. The part to force a
different join type is actually quite simple. You can look up query
hints in BOL to get an overview of the possibilities. I guess the rest
is 10+ years experience and a particular interest in performance related
issues...

Gert-Jan
Emin wrote:
>
Dear Gert-Jan,

Thanks for the info. In my real-world problem I am doing a join on
views which seemed appropriate to simulate using CTEs in my previous
post. Based on your comments, I tried using OPTION(HASH JOIN) on my
real example and that fixed the problem.

Out of curiousity, how does one go about learning how to understand and
optimize the execution plan? For example, while I noticed that the
execution plans were different in my various cases, I never would have
known to try OPTION(HASH JOIN). Do you (or any other readers out there)
have recommendations on books or other sources for this information?

Thanks again,
-Emin

On Jan 12, 2:59 pm, Gert-Jan Strik <s...@toomuchspamalready.nlwrote:
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan
[snip]
Jan 12 '07 #9

P: n/a
How about,
Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
where x.event_date >= {ts '1980-01-01 00:00:00'}
and y.event_date >= {ts '1980-01-01 00:00:00'}

does it make run faster, slower, same?

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 15 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.