471,090 Members | 1,294 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,090 software developers and data experts.

Alternate to a not in query

-- tested schema below --
-- create tables --
create table tbl_test
(serialnumber char(12))
go
create table tbl_test2
(serialnumber char(12),
exportedflag int)
go
--insert data --
insert into tbl_test2 values ('123456789010',0)
insert into tbl_test2 values ('123456789011',0)
insert into tbl_test2 values ('123456789012',0)
insert into tbl_test2 values ('123456789013',0)
insert into tbl_test2 values ('123456789014',0)
insert into tbl_test2 values ('123456789015',0)
insert into tbl_test2 values ('123456789016',0)
insert into tbl_test2 values ('123456789017',0)
insert into tbl_test2 values ('123456789018',0)
insert into tbl_test2 values ('123456789019',0)
insert into tbl_test values ('123456789011')
insert into tbl_test values ('123456789012')
insert into tbl_test values ('123456789013')
insert into tbl_test values ('123456789014')
insert into tbl_test values ('123456789015')

-- query --
Select serialnumber from tbl_test2
where serialnumber
not in (select serialnumber from tbl_test) and
exportedflag=0

This query runs quite fast with only the data above but when both
tables get million plus rows, the query simply bogs down. Is there a
better way to write this query?

Jul 23 '05 #1
5 2035
Select serialnumber
from tbl_test2 a
left joint tbl_test b on a. serialnumber = b.serialnumber
where (b.serialnumber IS NULL)
AND (a.exportedflag=0)

Jul 23 '05 #2
There is another way to write the query, but it's not better (in fact,
I think it's worse):

Select tbl_test2.serialnumber from tbl_test2
left join tbl_test on tbl_test2.serialnumber=tbl_test.serialnumber
where exportedflag=0 and tbl_test.serialnumber is null

To improve the performance of this query, you should create primary
keys on the tables. Besides the conceptual benefits of a proper design,
this would accomplish (at least) the following things:
- create an index on the serialnumber column
- declare that the serialnumber column does not allow duplicates
- declare that the serialnumber column does not allow nulls
These things will help the Query Optimizer very much to create a better
execution plan.

Razvan

Jul 23 '05 #3


Razvan Socol wrote:
There is another way to write the query, but it's not better (in fact,
I think it's worse):

Select tbl_test2.serialnumber from tbl_test2
left join tbl_test on tbl_test2.serialnumber=tbl_test.serialnumber
where exportedflag=0 and tbl_test.serialnumber is null


Razvan,

Why worse?

The common wisdom seems to be that it is always more efficient
eliminate nested subqueries, if possible.

My understanding is that the optimizer will internally eliminate the
subquery by doing a left join as above if it can.

Jul 23 '05 #4
Ira Gladnick (Ir*********@yahoo.com) writes:
Why worse?

The common wisdom seems to be that it is always more efficient
eliminate nested subqueries, if possible.
It's worse, becase it does not express the intent of the query equally
well, and therefore can contribute to higher maintenance costs.
My understanding is that the optimizer will internally eliminate the
subquery by doing a left join as above if it can.


I don't know if this is the case, but in such case there is even less
reason to rewrite the query in an obscure way.

I would write the query as:

Select serialnumber
from tbl_test2 t2
where not exists (select *
from tbl_test t
where t2.serialnuber = t.serialnumber)
and exportedflag=0

In SQL 6.5 this would typically perform better than NOT IN. But I believe
SQL 2000 will rewrite NOT IN to NOT EXISTS internally, so it is not that
much of an issue for performance. But NOT EXISTS is more general to use
than NOT IN, because you can handle multi-column conditions. Furthermore,
if there are NULL values involved, NOT IN can give you surpriese.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #5
(kj****@hotmail.com) writes:
-- query --
Select serialnumber from tbl_test2
where serialnumber
not in (select serialnumber from tbl_test) and
exportedflag=0

This query runs quite fast with only the data above but when both
tables get million plus rows, the query simply bogs down. Is there a
better way to write this query?


Beside the obvious point from Razvan about indexes, if you are on a multi-
CPU box, you can try this at the end of the query:

OPTION (MAXDOP 1)

this turns off parallelism. I've seen SQL Server use massive parallel
plans for this type of query, when a non-parallel plan have been much
faster.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by Carl Draus | last post: by
1 post views Thread by Eric Lindsay | last post: by
9 posts views Thread by johkar | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.