473,511 Members | 15,624 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Why is a simple WHERE clause so slow?

Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin

Jan 12 '07 #1
9 19118

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Emin,

Before investigating why it is slow, you need to make sure it is
correct.
Note that your WHERE clause filters out all the rows with NULL
event_date, so your FULL OUTER JOIN effectively becomes a LEFT OUTER
JOIN.
If you actually need FULL OUTER JOIN, you can either push down the
predicate manually (preferred)

SELECT * FROM
( Select x.event_date From
(Select x.event_date From x WHERE event_date >= {ts '1980-01-01
00:00:00'})
x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ

or allow nulls in your WHERE clause (logically equivalent _only_ if
x.event_date is not nullable)

SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} OR
innerQ.event_date IS NULL)

Note that your expectations regarding

"SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match"

are incorrect - the optimizer can rewrite your query as it wishes.
Generally speking, it accepts WHAT to do, but it decides by itself HOW
to do it.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 12 '07 #2
Emin,
You gotta to be more detailed in the Q like wht data is in the tables.
For a query running slow, there could be n number of reasons. most
imortant one is having no indexes.

your query is running slower, cuz your innerQ is not effected by the
where condition, its outside the scope of the Full outer join, which is
obviously the most resourse intensive in the query.

try Rewriting the query like this:
SELECT *
FROM (
Select x.event_date
From #X x
FULL OUTER JOIN #Y y ON x.event_date = y.event_date
WHERE x.event_date >= '1980-01-01 00:00:00'
) innerQ

BTW, what is the purpose of writing the where condition like this:
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
Vs like this:
WHERE x.event_date >= '1980-01-01 00:00:00'

Does this give any specific advantage and what does it do?
Thanks, schal.

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #3
Emin wrote:
>
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
It sounds like the optimizer is somewhat confused over your query.

It would be interesting to see the actual query plan. You can get that
by running SET SHOWPLAN_TEXT ON and then run the query (and then SET
SHOWPLAN_TEXT OFF).

I agree with Alex, that you could try to write the query the way you
normally would, like this:

SELECT x.event_date
FROM x
LEFT JOIN Y ON y.event_date = x.event_date
WHERE event_date >= '19800101'

HTH,
Gert-Jan
Jan 12 '07 #4
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #5
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan
Emin wrote:
>
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 12 '07 #6
Dear Gert-Jan,

Thanks for the info. In my real-world problem I am doing a join on
views which seemed appropriate to simulate using CTEs in my previous
post. Based on your comments, I tried using OPTION(HASH JOIN) on my
real example and that fixed the problem.

Out of curiousity, how does one go about learning how to understand and
optimize the execution plan? For example, while I noticed that the
execution plans were different in my various cases, I never would have
known to try OPTION(HASH JOIN). Do you (or any other readers out there)
have recommendations on books or other sources for this information?

Thanks again,
-Emin

On Jan 12, 2:59 pm, Gert-Jan Strik <s...@toomuchspamalready.nlwrote:
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan

Emin wrote:
Dear Experts,
Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?
Thanks,
-Emin
----------------------------------------------------
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp
;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp
--------------------------------
On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,
I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query
-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------
Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.
Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?
Thanks,
-Emin
Jan 12 '07 #7

Emin wrote:
Dear Experts,

Thanks for the comments so far. To help describe the problem, I
constructed a stand alone example that illustrates the issue on MS SQL
Server 2005. If you run the following query, it will take a very long
time. But if you replace "FULL OUTER" with "INNER" or if you remove the
WHERE clause, the query runs fine. Any ideas why FULL OUTER JOIN blows
things up even though there are no NULLs in sight?

Thanks,
-Emin

----------------------------------------------------

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

;with TallyTableCTE(i)
as
(
select i = {ts '1970-01-01 00:00:00'}
union all
select i = i + 1 from TallyTableCTE where i < {ts '2007-01-01
00:00:00'}
)
,AnotherTallyTableCTE(j)
as
(
select j = {ts '1970-01-01 00:00:00'}
union all
select j = j + 1 from AnotherTallyTableCTE where j < {ts '2007-01-01
00:00:00'}
)
select *
into dbo.temp FROM
TallyTableCTE x
FULL OUTER JOIN --FULL OUTER JOIN with WHERE clause causes problem
--INNER JOIN --using INNER JOIN instead of FULL OUTER JOIN removes
problem
AnotherTallyTableCTE y
ON x.i = y.j
where x.i {ts '1980-01-01 00:00:00'} --removing WHERE clause removes
problem
option (maxrecursion 30000)

IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'dbo.temp'))
DROP TABLE dbo.temp

--------------------------------

On Jan 12, 11:14 am, "Emin" <emin.shop...@gmail.comwrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin

I guess in real life situations the optimizer will estimate selectivity
of x.event_date @YourParameter based on statistics. For inner joins
the other side does not contribute nulls, so its estimate might be
better, so it goes for a hash join.
Also note that full outer joins are rarely used in production, and as
such they have less chances of geting a good execution plan.
Your particular full outer join query is misleading and I would say it
is a very poor practice.
Because in fact your query is a left outer join, you should either
explicitly say LEFT OUTER JOIN or push your predicate down.
The reason is simple: poorly programmed queries have less chances of
geting a good execution plan.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 12 '07 #8
I don't have recommendations, but you shouldn't have any problems
googling them.

Reading query plans is not easy. Besides "gettings used to", it requires
a lot of knowledge of how the query optimizer works, what execution
steps seem reasonable, and which seem wasteful. The part to force a
different join type is actually quite simple. You can look up query
hints in BOL to get an overview of the possibilities. I guess the rest
is 10+ years experience and a particular interest in performance related
issues...

Gert-Jan
Emin wrote:
>
Dear Gert-Jan,

Thanks for the info. In my real-world problem I am doing a join on
views which seemed appropriate to simulate using CTEs in my previous
post. Based on your comments, I tried using OPTION(HASH JOIN) on my
real example and that fixed the problem.

Out of curiousity, how does one go about learning how to understand and
optimize the execution plan? For example, while I noticed that the
execution plans were different in my various cases, I never would have
known to try OPTION(HASH JOIN). Do you (or any other readers out there)
have recommendations on books or other sources for this information?

Thanks again,
-Emin

On Jan 12, 2:59 pm, Gert-Jan Strik <s...@toomuchspamalready.nlwrote:
Does your real situation actually resemble this repro?

In the repro, it is the difference between a hash join and loop join.
The hash join is fast, the loop join is not. It is not suprising that
the loop join is not very fast, because there is quite a lot of
calculation needed for the CTE.

If this really is your problem, then you can probably add OPTION(HASH
JOIN) to eleviate the performance problems.

But if you are not actually joining two CTE's with deep recursion, then
we probably need a more real life example.

Gert-Jan
[snip]
Jan 12 '07 #9
How about,
Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
where x.event_date >= {ts '1980-01-01 00:00:00'}
and y.event_date >= {ts '1980-01-01 00:00:00'}

does it make run faster, slower, same?

Emin wrote:
Dear Experts,

I have a fairly simple query in which adding a where clause slows
things down by at least a factor of 100. The following is the slow
version of the query

-------------------------
SELECT * FROM
( Select x.event_date From x FULL OUTER JOIN y
ON x.event_date = y.event_date
) innerQ
WHERE ( innerQ.event_date >= {ts '1980-01-01 00:00:00'} )
------------------------

Removing the where clause makes the query run quickly. This seems
extremely strange because it seems like SQL Server should simply be
able to take the results of innerQ and discard anything with a date
that doesn't match. If I instead split the query into two pieces where
I create a temp table and put innerQ into that and then do the select *
WHERE (...) from the temp table things work fine.

Any thoughts on what SQL Server might be doing to make things slow and
how I can fix it?

Thanks,
-Emin
Jan 15 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
2333
by: Arthur J. O'Dwyer | last post by:
I was paging through Coplien's book "Advanced C++ Programming Styles and Idioms" this afternoon and found some code that looked something like void sort(vector<foo> a) { int flip; do { for...
1
6384
by: M Wells | last post by:
Hi all, Just wondering if anyone can tell me if an order by clause on a select query would have any impact on the time it takes to retrieve results? Essentially I'm selecting Top 1 out of a...
7
2083
by: Erik Borgstr?m | last post by:
Hi, I simply want to use simple matrices of ints or doubles in C++ and I want all the good: 1) be able to use double-index, e.g. m 2) not have to use pointers at all 3) do it fast I know...
24
2803
by: chri_schiller | last post by:
I have a home-made website that provides a free 1100 page physics textbook. It is written in html and css. I recently added some chinese text, and since that day there are problems. The entry...
27
4671
by: Chris, Master of All Things Insignificant | last post by:
I have come to greatly respect both Herfried & Cor's reponses and since the two conflicted, I wanted to get some clarification. My orginal post: Herfried, maybe your example here can get you to...
2
720
by: Zygo Blaxell | last post by:
I have a table with a few million rows of temperature data keyed by timestamp. I want to group these rows by timestamp intervals (e.g. every 32 seconds), compute aggregate functions on the...
20
2461
by: Brian Tkatch | last post by:
An ORDER BY a simple-integer inside a FUNCTION, results in SQL0440N, unless the FUNCTION expects an INTEGER as its parameter. For example: DECLARE GLOBAL TEMPORARY TABLE A(A CHAR(1)) INSERT INTO...
0
1196
by: mflll | last post by:
I want to derive or substitute an Item element containing just text by another element with regular contents. The context is that I have a contract which has Block's that represent clause's....
3
3248
by: rogynskyy | last post by:
Hi guys, I'm running MSDE 2000 A on Win XP I've got a database with several tables, all of the tables display data in query manager. I wrote this simple query: Select
0
7252
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
7432
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
7093
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
5676
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
1
5077
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
4743
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
3230
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
3218
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
791
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.