473,326 Members | 2,127 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

not in subselect

I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

Nov 12 '05 #1
4 2719
James wrote:
I have a performance problem with the following query and variations on
the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so
far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?
The idea of SQL is that you tell the system _what_ you want, and the system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.

In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You did
run the Configuration Advisor?)
select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)


You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.

Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.

SELECT a.computer_alias
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_systems AS b
WHERE a.computer_alias = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sware AS c
WHERE a.computer_sys_id = c.computer_sys_id )

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #2
I doubt your problem is the NOT EXISTS clause. From your description;
you have a table, matched_sware, that contains 18,000,000 rows and is
not indexed. Your query requires "matching" information from this table
to multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_IDs
from matched_sware. This is most likely the cause of the lengthy run
time. I'd also expect a significant impact on other database performance
while this is running.

You didn't state anything about the indexing on active_ad_systems either.

Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware".
Depending on the data needed from matched_sware; you may want to include
additional columns in the indexes to prompte index-only retrievals.
Additional columns in the index is a tradeoff of disk space for a
decrease in I/O by avoiding an additional retrieval of table data.
Another question to examine is the business meaning of duplicate rows in
matched_sware. Access plans, generated by EXPLAIN, as Knut suggested,
are the first step to improve performance of queries. If I had an 18
million row table to manage, I'd make sure I knew about every query that
accessed it and do everything possible to avoid scanning and sorting it.

Indexes alone may not resolve this. The optimizer may still choose to
scan the large table. If this happens and you know that you are
retrieving a small number of rows, an OPTIMIZE FOR ONE ROWS clause in
the query may help avoid the scan by influencing the optimizer to choose
a nested loop join.
Phil Sherman

James wrote:
I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

Nov 12 '05 #3
Thanks for the reply

my statement about no index on matched_sware was incorrect, there is pk
index on the 2 fields in the table. I learned that active_ad_systems is a
view from another database. I moved the query to that db. which had tables
loaded with a copy of the same data as the federated views I originally
queried and was able to get a return in about 15 minutes.
"Knut Stolze" <st****@de.ibm.com> wrote in message
news:df**********@lc03.rz.uni-jena.de...
James wrote:
I have a performance problem with the following query and variations on
the
subselect. The EXISTS version of the first example will complete in
~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so
far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before
the
NOT logic?


The idea of SQL is that you tell the system _what_ you want, and the
system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.

In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You
did
run the Configuration Advisor?)
select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)


You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.

Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You
had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.

SELECT a.computer_alias
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_systems AS b
WHERE a.computer_alias = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sware AS c
WHERE a.computer_sys_id = c.computer_sys_id )

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #4
I appreciate the reply, matched_sware did turn out to have an pk index on
the 2 fields in the table. As I replied to Knut, I solved my performance
issue in this case by moving to a single db copy of the data, from the
federated view of the first try.

"Phil Sherman" <ps******@ameritech.net> wrote in message
news:Cs***************@newssvr22.news.prodigy.net. ..
I doubt your problem is the NOT EXISTS clause. From your description; you
have a table, matched_sware, that contains 18,000,000 rows and is not
indexed. Your query requires "matching" information from this table to
multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_IDs from
matched_sware. This is most likely the cause of the lengthy run time. I'd
also expect a significant impact on other database performance while this
is running.

You didn't state anything about the indexing on active_ad_systems either.

Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware". Depending
on the data needed from matched_sware; you may want to include additional
columns in the indexes to prompte index-only retrievals. Additional
columns in the index is a tradeoff of disk space for a decrease in I/O by
avoiding an additional retrieval of table data. Another question to
examine is the business meaning of duplicate rows in matched_sware. Access
plans, generated by EXPLAIN, as Knut suggested, are the first step to
improve performance of queries. If I had an 18 million row table to
manage, I'd make sure I knew about every query that accessed it and do
everything possible to avoid scanning and sorting it.

Indexes alone may not resolve this. The optimizer may still choose to scan
the large table. If this happens and you know that you are retrieving a
small number of rows, an OPTIMIZE FOR ONE ROWS clause in the query may
help avoid the scan by influencing the optimizer to choose a nested loop
join.
Phil Sherman

James wrote:
I have a performance problem with the following query and variations on
the subselect. The EXISTS version of the first example will complete
in ~10 minutes. The NOT logic in both the examples makes them both keep
running long enough that a communications error is the only result
returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before
the NOT logic?

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)

or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)

select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)


Nov 12 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: limbert | last post by:
------=_NextPart_000_0001_01C34D7B.DFBF53C0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi, =20 I was reading the manual and it said that...
6
by: Greg Stark | last post by:
So I have a query in which some of the select values are subqueries. The subqueries are aggregates so I don't want to turn this into a join, it would become too complex and postgres would have...
3
by: Neil Zanella | last post by:
Hello, I would like to ask the about the following... PostgreSQL allows tables resulting from subselects to be renamed with an optional AS keyword whereas Oracle 9 will report an error...
4
by: dtwalter | last post by:
Is it possible to ORDER BY a SubSelect? I don't see why it wouldn't be, but I'm having some trouble. Hopefully it's just a simple error in syntax and somebody can tell me. Here's what I'm trying...
1
by: Marco Lazzeri | last post by:
I'd like to reference values returned by a subselect in the same SELECT query. Example: SELECT id, ( SELECT COUNT(*) FROM second ) AS value_to_reference, ( value_to_reference + 1 ) AS...
4
by: Chris Kratz | last post by:
Hello all, We have run into what appears to be a problem with rules and subselects in postgres 7.4.1. We have boiled it down to the following test case. If anyone has any thoughts as to why...
6
by: Alex P | last post by:
Hi, when creating a query with a subselect SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name) AS max_pop FROM states; then it is not possible to sort after max_pop...
2
by: Morten K. Poulsen | last post by:
(re-post) Dear list, Please let me know if this is not the list to ask this kind of question. I am trying to optimize a query that joins two relatively large (750000 rows in each) tables....
6
by: Sebastien | last post by:
I have the following statement which I run successfully in... 1 hour 10 minutes. SELECT a.tsgicd as ACCT_ID, a.tsa5cd as SEC_ID, CASE WHEN (SUBSTRING(a.tsgicd, 6, 1) = 'R' or...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.