
[How to] avoid cross product/Cartesian product to improve performance

P: 2
Hi,

I have a table containing about 4 million entries, structured like below:

[Alarm History]
(
AlarmID int,
SetTime datetime
)

Now I want to:
SELECT all the AlarmIDs that occurred during Jan 2008 with no such AlarmID during Dec 2007. I used:

SELECT * FROM [Alarm History]
WHERE SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
AND AlarmID NOT IN

(
SELECT AlarmID FROM [Alarm History]
WHERE SetTime BETWEEN '1-dec-2007' AND '31-DEC-2007'
)

My query runs very slowly, since there are more than 20,000 entries during Dec 2007 and 15,000 entries during Jan 2008.

The "NOT IN" operation is like a Cartesian product: it compares each AlarmID during December 2007 with each AlarmID during January 2008. The result is very slow performance.

Can anybody help me find an alternative way to do this? Thanks a lot.
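(For reference, the pattern above can be reproduced on a small toy table. This is a minimal sketch using SQLite from Python; the sample rows are made up, and ISO date strings replace the 'dd-mon-yyyy' literals, which SQLite does not parse.)

```python
import sqlite3

# Toy stand-in for the 4-million-row [Alarm History] table (hypothetical data)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE alarm_history (AlarmID INTEGER, SetTime TEXT)")
con.executemany(
    "INSERT INTO alarm_history VALUES (?, ?)",
    [
        (1, "2007-12-05"),  # fired in Dec...
        (1, "2008-01-10"),  # ...and again in Jan -> excluded
        (2, "2008-01-15"),  # Jan only -> should be returned
        (3, "2007-12-20"),  # Dec only -> not returned
    ],
)

# Same NOT IN shape as the question, with ISO dates
rows = con.execute(
    """
    SELECT DISTINCT AlarmID FROM alarm_history
    WHERE SetTime BETWEEN '2008-01-01' AND '2008-01-31'
      AND AlarmID NOT IN (
        SELECT AlarmID FROM alarm_history
        WHERE SetTime BETWEEN '2007-12-01' AND '2007-12-31'
      )
    """
).fetchall()
print(rows)  # only AlarmID 2 appears in Jan but not Dec
```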
Jan 31 '08 #1
5 Replies


code green
Expert 100+
P: 1,726
It is probably the sub-query that slows it down.
There is an equivalent JOIN that looks for NULL instead of using NOT IN:
SELECT Jan.* FROM [Alarm History] Jan
LEFT JOIN [Alarm History] Dec ON (Jan.AlarmID = Dec.AlarmID
AND Dec.SetTime BETWEEN '1-dec-2007' AND '31-dec-2007')
WHERE Jan.SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
AND Dec.AlarmID IS NULL
I think
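(The LEFT JOIN rewrite can be checked against the original NOT IN form on a toy SQLite table; this is a minimal sketch with made-up data, using ISO date strings.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE alarm_history (AlarmID INTEGER, SetTime TEXT)")
con.executemany("INSERT INTO alarm_history VALUES (?, ?)",
                [(1, "2007-12-05"), (1, "2008-01-10"),
                 (2, "2008-01-15"), (3, "2007-12-20")])

# Original NOT IN form
not_in = con.execute("""
    SELECT DISTINCT AlarmID FROM alarm_history
    WHERE SetTime BETWEEN '2008-01-01' AND '2008-01-31'
      AND AlarmID NOT IN (SELECT AlarmID FROM alarm_history
                          WHERE SetTime BETWEEN '2007-12-01' AND '2007-12-31')
""").fetchall()

# LEFT JOIN the December rows and keep only January alarms with no match
left_join = con.execute("""
    SELECT DISTINCT Jan.AlarmID
    FROM alarm_history Jan
    LEFT JOIN alarm_history Dec
      ON Jan.AlarmID = Dec.AlarmID
     AND Dec.SetTime BETWEEN '2007-12-01' AND '2007-12-31'
    WHERE Jan.SetTime BETWEEN '2008-01-01' AND '2008-01-31'
      AND Dec.AlarmID IS NULL
""").fetchall()

print(not_in, left_join)  # both return only AlarmID 2
```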
Jan 31 '08 #2

Delerna
Expert 100+
P: 1,134
If you want to compare the effectiveness of two different ways of writing a query:

Open Query Analyzer and paste the two queries into the query window.

Then click the "Display Estimated Execution Plan" button, and it will display a graphical representation of how each query will execute.

I did that for the two queries here, changing table and field names to suit one of my tables.

Sorry to say that, at least in the check that I did, code green's method was about 3 times slower: 28% (lightkeeper's method) to 72% (code green's method). That may be different when you try it on your own table.

The execution plan will also show you which parts of your query are spending the most time, and therefore where you might be able to improve performance. Look for loops and high I/O costs when hunting for bottlenecks.

All the above applies to MS SQL Server. I guess other database engines have something similar.
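(Query Analyzer and its graphical plan display are MS SQL Server tools. As an analogous illustration only, SQLite exposes its plan through EXPLAIN QUERY PLAN; the table here is a made-up toy.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE alarm_history (AlarmID INTEGER, SetTime TEXT)")

# Ask the engine how it would execute the query, without running it
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT AlarmID FROM alarm_history
    WHERE SetTime BETWEEN '2008-01-01' AND '2008-01-31'
""").fetchall()
for row in plan:
    print(row)  # the last field is the plan detail, e.g. a full SCAN of the table
```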
Feb 1 '08 #3

code green
Expert 100+
P: 1,726
Wow! 3 times slower. That surprised me.
What is really slowing the query down is the date comparison.
I have tried similar queries in both formats on my online server (1and1).
They both timed out.
I was able to get around it because my table used auto-ids.
I then used SQL variables to get the IDs of the minimum and maximum dates.
Then did a SELECT comparing IDs rather than DATE.
Very fast.
-- Get the IDs of the earliest and latest dates
SELECT @earliest := MIN(AlarmID) FROM [Alarm History]
WHERE SetTime >= '1-jan-2008';
SELECT @latest := MAX(AlarmID) FROM [Alarm History]
WHERE SetTime <= '31-jan-2008';

-- Use these to filter the main query
SELECT AlarmID FROM [Alarm History]
WHERE AlarmID >= @earliest AND AlarmID <= @latest
AND ....

I know this is MySQL, but could you adapt this idea to your table?
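(The ID-range idea can be sketched in SQLite. This is a minimal sketch with made-up data; the `events` table and its `id` column are hypothetical, and the trick is only valid because `id` is an auto-increment key, so IDs increase in the same order as SetTime.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table where id is an auto-increment key (id order == insert order)
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, SetTime TEXT)")
con.executemany("INSERT INTO events (SetTime) VALUES (?)",
                [("2007-12-20",), ("2008-01-05",), ("2008-01-20",), ("2008-02-02",)])

# Find the id range covering January, then filter by id instead of by date
lo = con.execute("SELECT MIN(id) FROM events WHERE SetTime >= '2008-01-01'").fetchone()[0]
hi = con.execute("SELECT MAX(id) FROM events WHERE SetTime <= '2008-01-31'").fetchone()[0]

# Integer comparison on the primary key replaces the date comparison
jan = con.execute("SELECT id FROM events WHERE id BETWEEN ? AND ?", (lo, hi)).fetchall()
print(jan)  # ids 2 and 3, the two January rows
```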
Feb 1 '08 #4

Delerna
Expert 100+
P: 1,134
Yeah, it surprised me also, as I also thought the subquery was the problem, but it seems SQL Server does a pretty good job of executing it. I also tried a method of my own and was beaten 48% to 52%. I have not personally used a "WHERE NOT IN" query myself, but I think I will be taking a closer look at it in the future.

One thing that may help the speed of this query is indexes. I have seen slow queries that had nothing wrong with the way they were written; the sheer number of records was the cause. Well thought out indexes took those queries from minutes to a few seconds.
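(The effect of an index shows up even in a toy SQLite example; the table data and index name here are made up. Without the index the engine scans the whole table; with it, the plan becomes an index search.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE alarm_history (AlarmID INTEGER, SetTime TEXT)")

query = """EXPLAIN QUERY PLAN
           SELECT AlarmID FROM alarm_history
           WHERE SetTime BETWEEN '2008-01-01' AND '2008-01-31'"""

before = con.execute(query).fetchall()[0][3]   # plan detail before indexing
con.execute("CREATE INDEX idx_settime ON alarm_history (SetTime)")
after = con.execute(query).fetchall()[0][3]    # plan detail after indexing

print(before)  # a SCAN of the whole table
print(after)   # a SEARCH using idx_settime
```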
Feb 1 '08 #5

P: 1
Use a LEFT OUTER JOIN to your "not in" table, group by some columns in table a and at least one column in table b (the "not in" table), having b.some_column IS NULL.

This is much more efficient than a NOT IN statement.

i.e.

select a.col1, a.col2
from tablea as a
left outer join tableb as b
  on b.fkey_id = a.id
group by a.col1, a.col2, b.some_column
having b.some_column is null
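(The grouped form above can be sketched the same way on toy SQLite tables; tablea, tableb, and the column names are the reply's own placeholders, and the rows are made up.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablea (id INTEGER, col1 TEXT, col2 TEXT)")
con.execute("CREATE TABLE tableb (fkey_id INTEGER, some_column TEXT)")
con.executemany("INSERT INTO tablea VALUES (?, ?, ?)",
                [(1, "a", "x"), (2, "b", "y")])
con.execute("INSERT INTO tableb VALUES (1, 'seen')")  # id 1 has a match, id 2 does not

# Rows of tablea with no matching row in tableb
rows = con.execute("""
    SELECT a.col1, a.col2
    FROM tablea AS a
    LEFT OUTER JOIN tableb AS b ON b.fkey_id = a.id
    GROUP BY a.col1, a.col2, b.some_column
    HAVING b.some_column IS NULL
""").fetchall()
print(rows)  # only the unmatched row ('b', 'y') survives
```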
Jul 16 '13 #6
