473,403 Members | 2,354 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,403 software developers and data experts.

[How to] avoid cross product/Cartesian product to improve performance

Hi,

I have 1 table contains about 4 millions entries structure like below:

[Alarm History]
(
AlarmID int,
SetTime datetime
)

Now I want to :
SELECT all the AlarmID that happened during Jan 2008 and no such AlarmID during Dec 2007. I used:

SELECT * FROM [Alarm History]
WHERE SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
AND AlarmID NOT IN

(
SELECT AlarmID FROM [Alarm History]
WHERE SetTime BETWEEN '1-dec-2007' AND '31-DEC-2007'
)

My querry take very slow since there are more than 20,000 entries during Dec 2007 and 15,000 entries during Jan 2008,

"NOT IN" operation is like an Cartesian Product : Compare each AlarmID during December 2007 with each AlarmID during Jan 2008. The results is very slow performance.

Anybody can help me find an alternatives way to do this ? Thanks a lot.
Jan 31 '08 #1
5 12556
code green
1,726 Expert 1GB
It is probably the sub-query that slows it down.
There is an equivalent JOIN looking for NULL instead of NOT IN
Expand|Select|Wrap|Line Numbers
  1. SELECT * FROM [Alarm History] Jan
  2. LEFT JOIN [Alarm History] Dec ON (Jan.AlarmID = Dec.AlarmID
  3. AND Dec.SetTime BETWEEN '1-dec-2007' AND '31-DEC-2007')
  4. WHERE Jan.SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
  5. AND IS NULL Dec.AlarmID
  6.  
I think
Jan 31 '08 #2
Delerna
1,134 Expert 1GB
If you want to compare the effectiveness of two different methods of writing a query then

Open query analyser and paste the two queries into the query analyser window

Then click the display estimated execution plan button and this will display a graphical representation of how each query will execute

I did that for the 2 queries here, changing table and field names to suit one of my tables

Sorry to say that, at least in the check that I did, code green's method was 3 times slower. 28%(lightkeepers method) to 72% (code greens method)
That may be different when you try it on your own table ???

Also the execution plan will show you which parts of your query is spending the most time and therefow shows where you might be able to improve performance.
Look for loops and and high I/O costs when looking for performance bottlenecks

All the above is in relation to MS SQL Server. I guess other databse engines have something similar
Feb 1 '08 #3
code green
1,726 Expert 1GB
Wow! 3 times slower. That suprised me.
What is really slowing the query down is the date comparison.
I have tried similar queries in both formats on my online server (1and1).
They both timed out.
I was able to get around it because my table used auto-ids.
I then used SQL variables to get the IDs of the minimum and maximum dates.
Then did a SELECT comparing IDs rather than DATE.
Very fast.
Expand|Select|Wrap|Line Numbers
  1.  //Get the IDs of earliest and latest dates
  2. SELECT @earliest :=   MAX(AlarmID)  FROM [Alarm History]
  3. WHERE SetTime >= '1-jan-2008'; 
  4. SELECT @latest :=   MAX(AlarmID)  FROM [Alarm History]
  5. WHERE SetTime <= '31-jan-2008'
  6.  
  7. Use these to filter the main query
  8. SELECT AlarmID FROM [Alarm History]
  9. WHERE AlarmID > @earliest AND AlarmID< @latest 
  10. AND  ....
  11.  
I kow this is MySql but could you adapt this idea to your table?
Feb 1 '08 #4
Delerna
1,134 Expert 1GB
Yea it surprised me also as I also thought that the subquery was the problem but it seems that SQL Server does a prettey good job of executing it. I also tried a method of my own and was beaten 48% to 52%. I personally have not used a "where not in" query myself but I think I will be taking a closer look at it in the future.

One thing that may help the speed of this query is indexes I have seen slow queries that had nothing wrong with the way it was written. The sheer number of records was the cause. Well thought out indexes took those queries from minutes to a few seconds.
Feb 1 '08 #5
use a left outer join to your "not in" table, group by some columns in table a and at least one column in table b (the "not in" table) having b.some_column is null.

This is much more effecient than a not in statement.

i.e.

Expand|Select|Wrap|Line Numbers
  1. select a.col1, a.col2
  2. from tablea as a
  3. left outer join tableb as b
  4.   on b.fkey_id = a.id
  5. group by a.col1, a.col2, b.some_column
  6. having b.some_column is null
Jul 16 '13 #6

Sign in to post your reply or Sign up for a free account.

Similar topics

4
by: deancoo | last post by:
I need to do a Cartesian product, which is inherently expensive. Turns out, it's too expensive. I've dropped in that portion of my C++ code in hopes that someone with greater expertise with STL...
5
by: Ã…smund Kveim Lie | last post by:
Hi, We have found a possible bug in 7.3.1. It seems that using CROSS JOIN and doing plain Cartesian product, listing to tables in the from clause, gives different results. According to the...
7
by: Eric Slan | last post by:
Hello All: I'm having a problem that's been baffling me for a few days and I seek counsel here. I have an Access 2000 DB from which I want to run several reports. These reports are...
4
by: John Smith | last post by:
Isn't life a bitch! You know what you want but you don't know how to get it. I have produced 12 queries that calculate a payment profile over 12 months. For a number of the records (ie with...
2
by: manning_news | last post by:
Using A2K. I've got a database with client info and each client has a subform which contains types of income and the amount they each receive. The record source of the subform is a cartesian...
1
by: Lakesider | last post by:
Hi NG, I have written an application with a lot of file- and database operations. There are several algorithmic operations, too. My question is: are ther any tools to improve performance -...
0
by: Swami | last post by:
I have 2 questions relating to website design in asp .net: 1. In a website that I am building I have everything as a user control. Even the header, which contains the navigation tabs is in a user...
2
by: zfareed | last post by:
I have a program that creates two sets, one thru user interaction and the other with the use of an array. Can anyone help with coding for finding the cartesian product of the two sets; i.e a...
5
by: Gilles Ganault | last post by:
Hello I'm no PHP expert, and I'm reading "Building scalable web sites". In the tips section, the author mentions using templates to speed things up. I was wondering how the template engines...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.