By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,004 Members | 1,296 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,004 IT Pros & Developers. It's quick & easy.

not in subselect

P: n/a
I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

Nov 12 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
James wrote:
I have a performance problem with the following query and variations on
the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so
far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?
The idea of SQL is that you tell the system _what_ you want, and the system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.

In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You did
run the Configuration Advisor?)
select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)


You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.

Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.

SELECT a.computer_alias
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_systems AS b
WHERE a.computer_alias = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sware AS c
WHERE a.computer_sys_id = c.computer_sys_id )

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #2

P: n/a
I doubt your problem is the NOT EXISTS clause. From your description;
you have a table, matched_sware, that contains 18,000,000 rows and is
not indexed. Your query requires "matching" information from this table
to multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_IDs
from matched_sware. This is most likely the cause of the lengthy run
time. I'd also expect a significant impact on other database performance
while this is running.

You didn't state anything about the indexing on active_ad_systems either.

Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware".
Depending on the data needed from matched_sware; you may want to include
additional columns in the indexes to prompte index-only retrievals.
Additional columns in the index is a tradeoff of disk space for a
decrease in I/O by avoiding an additional retrieval of table data.
Another question to examine is the business meaning of duplicate rows in
matched_sware. Access plans, generated by EXPLAIN, as Knut suggested,
are the first step to improve performance of queries. If I had an 18
million row table to manage, I'd make sure I knew about every query that
accessed it and do everything possible to avoid scanning and sorting it.

Indexes alone may not resolve this. The optimizer may still choose to
scan the large table. If this happens and you know that you are
retrieving a small number of rows, an OPTIMIZE FOR ONE ROWS clause in
the query may help avoid the scan by influencing the optimizer to choose
a nested loop join.
Phil Sherman

James wrote:
I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

Nov 12 '05 #3

P: n/a
Thanks for the reply

my statement about no index on matched_sware was incorrect, there is pk
index on the 2 fields in the table. I learned that active_ad_systems is a
view from another database. I moved the query to that db. which had tables
loaded with a copy of the same data as the federated views I originally
queried and was able to get a return in about 15 minutes.
"Knut Stolze" <st****@de.ibm.com> wrote in message
news:df**********@lc03.rz.uni-jena.de...
James wrote:
I have a performance problem with the following query and variations on
the
subselect. The EXISTS version of the first example will complete in
~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so
far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before
the
NOT logic?


The idea of SQL is that you tell the system _what_ you want, and the
system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.

In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You
did
run the Configuration Advisor?)
select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)


You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.

Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You
had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.

SELECT a.computer_alias
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_systems AS b
WHERE a.computer_alias = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sware AS c
WHERE a.computer_sys_id = c.computer_sys_id )

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #4

P: n/a
I appreciate the reply, matched_sware did turn out to have an pk index on
the 2 fields in the table. As I replied to Knut, I solved my performance
issue in this case by moving to a single db copy of the data, from the
federated view of the first try.

"Phil Sherman" <ps******@ameritech.net> wrote in message
news:Cs***************@newssvr22.news.prodigy.net. ..
I doubt your problem is the NOT EXISTS clause. From your description; you
have a table, matched_sware, that contains 18,000,000 rows and is not
indexed. Your query requires "matching" information from this table to
multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_IDs from
matched_sware. This is most likely the cause of the lengthy run time. I'd
also expect a significant impact on other database performance while this
is running.

You didn't state anything about the indexing on active_ad_systems either.

Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware". Depending
on the data needed from matched_sware; you may want to include additional
columns in the indexes to prompte index-only retrievals. Additional
columns in the index is a tradeoff of disk space for a decrease in I/O by
avoiding an additional retrieval of table data. Another question to
examine is the business meaning of duplicate rows in matched_sware. Access
plans, generated by EXPLAIN, as Knut suggested, are the first step to
improve performance of queries. If I had an 18 million row table to
manage, I'd make sure I knew about every query that accessed it and do
everything possible to avoid scanning and sorting it.

Indexes alone may not resolve this. The optimizer may still choose to scan
the large table. If this happens and you know that you are retrieving a
small number of rows, an OPTIMIZE FOR ONE ROWS clause in the query may
help avoid the scan by influencing the optimizer to choose a nested loop
join.
Phil Sherman

James wrote:
I have a performance problem with the following query and variations on
the subselect. The EXISTS version of the first example will complete
in ~10 minutes. The NOT logic in both the examples makes them both keep
running long enough that a communications error is the only result
returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before
the NOT logic?

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)


Nov 12 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.