473,748 Members | 9,933 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

not in subselect

I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?

select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)

or
select a.COMPUTER_ALIA S from inv4.computer a,
inv4.active_ad_ systems b
where a.computer_alia s = b.computer_name and a.computer_sys_ id NOT IN
(select distinct computer_sys_id from inv4.matched_sw are)

select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and not exists ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)

Nov 12 '05 #1
4 2749
James wrote:
I have a performance problem with the following query and variations on
the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so
far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?
The idea of SQL is that you tell the system _what_ you want, and the system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.

In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You did
run the Configuration Advisor?)
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)


You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.

Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.

SELECT a.computer_alia s
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_ systems AS b
WHERE a.computer_alia s = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sw are AS c
WHERE a.computer_sys_ id = c.computer_sys_ id )

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #2
I doubt your problem is the NOT EXISTS clause. From your description;
you have a table, matched_sware, that contains 18,000,000 rows and is
not indexed. Your query requires "matching" information from this table
to multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_ID s
from matched_sware. This is most likely the cause of the lengthy run
time. I'd also expect a significant impact on other database performance
while this is running.

You didn't state anything about the indexing on active_ad_syste ms either.

Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware".
Depending on the data needed from matched_sware; you may want to include
additional columns in the indexes to prompte index-only retrievals.
Additional columns in the index is a tradeoff of disk space for a
decrease in I/O by avoiding an additional retrieval of table data.
Another question to examine is the business meaning of duplicate rows in
matched_sware. Access plans, generated by EXPLAIN, as Knut suggested,
are the first step to improve performance of queries. If I had an 18
million row table to manage, I'd make sure I knew about every query that
accessed it and do everything possible to avoid scanning and sorting it.

Indexes alone may not resolve this. The optimizer may still choose to
scan the large table. If this happens and you know that you are
retrieving a small number of rows, an OPTIMIZE FOR ONE ROWS clause in
the query may help avoid the scan by influencing the optimizer to choose
a nested loop join.
Phil Sherman

James wrote:
I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before the
NOT logic?

select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)

or
select a.COMPUTER_ALIA S from inv4.computer a,
inv4.active_ad_ systems b
where a.computer_alia s = b.computer_name and a.computer_sys_ id NOT IN
(select distinct computer_sys_id from inv4.matched_sw are)

select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and not exists ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)

Nov 12 '05 #3
Thanks for the reply

my statement about no index on matched_sware was incorrect, there is pk
index on the 2 fields in the table. I learned that active_ad_syste ms is a
view from another database. I moved the query to that db. which had tables
loaded with a copy of the same data as the federated views I originally
queried and was able to get a return in about 15 minutes.
"Knut Stolze" <st****@de.ibm. com> wrote in message
news:df******** **@lc03.rz.uni-jena.de...
James wrote:
I have a performance problem with the following query and variations on
the
subselect. The EXISTS version of the first example will complete in
~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so
far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before
the
NOT logic?


The idea of SQL is that you tell the system _what_ you want, and the
system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.

In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You
did
run the Configuration Advisor?)
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)


You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.

Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You
had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.

SELECT a.computer_alia s
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_ systems AS b
WHERE a.computer_alia s = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sw are AS c
WHERE a.computer_sys_ id = c.computer_sys_ id )

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #4
I appreciate the reply, matched_sware did turn out to have an pk index on
the 2 fields in the table. As I replied to Knut, I solved my performance
issue in this case by moving to a single db copy of the data, from the
federated view of the first try.

"Phil Sherman" <ps******@ameri tech.net> wrote in message
news:Cs******** *******@newssvr 22.news.prodigy .net...
I doubt your problem is the NOT EXISTS clause. From your description; you
have a table, matched_sware, that contains 18,000,000 rows and is not
indexed. Your query requires "matching" information from this table to
multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_ID s from
matched_swar e. This is most likely the cause of the lengthy run time. I'd
also expect a significant impact on other database performance while this
is running.

You didn't state anything about the indexing on active_ad_syste ms either.

Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware". Depending
on the data needed from matched_sware; you may want to include additional
columns in the indexes to prompte index-only retrievals. Additional
columns in the index is a tradeoff of disk space for a decrease in I/O by
avoiding an additional retrieval of table data. Another question to
examine is the business meaning of duplicate rows in matched_sware. Access
plans, generated by EXPLAIN, as Knut suggested, are the first step to
improve performance of queries. If I had an 18 million row table to
manage, I'd make sure I knew about every query that accessed it and do
everything possible to avoid scanning and sorting it.

Indexes alone may not resolve this. The optimizer may still choose to scan
the large table. If this happens and you know that you are retrieving a
small number of rows, an OPTIMIZE FOR ONE ROWS clause in the query may
help avoid the scan by influencing the optimizer to choose a nested loop
join.
Phil Sherman

James wrote:
I have a performance problem with the following query and variations on
the subselect. The EXISTS version of the first example will complete
in ~10 minutes. The NOT logic in both the examples makes them both keep
running long enough that a communications error is the only result
returned so far.

This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.

Any suggestions on a technique to get the subselect to resolve before
the NOT logic?

select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)

or
select a.COMPUTER_ALIA S from inv4.computer a,
inv4.active_ad_ systems b
where a.computer_alia s = b.computer_name and a.computer_sys_ id NOT IN
(select distinct computer_sys_id from inv4.matched_sw are)

select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and not exists ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)


Nov 12 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
1748
by: limbert | last post by:
------=_NextPart_000_0001_01C34D7B.DFBF53C0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi, =20 I was reading the manual and it said that the subselect is only
6
16329
by: Greg Stark | last post by:
So I have a query in which some of the select values are subqueries. The subqueries are aggregates so I don't want to turn this into a join, it would become too complex and postgres would have trouble optimizing things. So my question is, is there some way to have a subselect return multiple columns and break those out in the outer query? Something like: SELECT x,y,z,
3
7202
by: Neil Zanella | last post by:
Hello, I would like to ask the about the following... PostgreSQL allows tables resulting from subselects to be renamed with an optional AS keyword whereas Oracle 9 will report an error whenever a table is renamed with the AS keyword. Furthermore, in PostgreSQL when the result of a subselect is referenced in an outer select it is required that the subselect result be named, whereas this is not true in Oracle. I wonder what standard SQL...
4
7562
by: dtwalter | last post by:
Is it possible to ORDER BY a SubSelect? I don't see why it wouldn't be, but I'm having some trouble. Hopefully it's just a simple error in syntax and somebody can tell me. Here's what I'm trying to do... I've got two tables: Table1: TestData Filename Bird FileB Blue FileA Circle FileC
1
1725
by: Marco Lazzeri | last post by:
I'd like to reference values returned by a subselect in the same SELECT query. Example: SELECT id, ( SELECT COUNT(*) FROM second ) AS value_to_reference, ( value_to_reference + 1 ) AS value_that_uses_referenced_one FROM first;
4
5485
by: Chris Kratz | last post by:
Hello all, We have run into what appears to be a problem with rules and subselects in postgres 7.4.1. We have boiled it down to the following test case. If anyone has any thoughts as to why this would be happening, we would appreciate feedback. We have tested on 7.3.4, 7.3.6 and 7.4.1 and all exhibit the same behavior. Test case one tries to populate table2 from table1 with records that are not in table2 already. Table2 gets...
6
3223
by: Alex P | last post by:
Hi, when creating a query with a subselect SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name) AS max_pop FROM states; then it is not possible to sort after max_pop or use max_pop in a function or a CASE. am I dont anything wrong or is this meant to be the case?
2
7356
by: Morten K. Poulsen | last post by:
(re-post) Dear list, Please let me know if this is not the list to ask this kind of question. I am trying to optimize a query that joins two relatively large (750000 rows in each) tables. If I do it using a subselect, I can "force" the planner to choose the fastest path. Now, my question is:
6
5016
by: Sebastien | last post by:
I have the following statement which I run successfully in... 1 hour 10 minutes. SELECT a.tsgicd as ACCT_ID, a.tsa5cd as SEC_ID, CASE WHEN (SUBSTRING(a.tsgicd, 6, 1) = 'R' or SUBSTRING(a.tsgicd, 6, 1) = 'Y' or SUBSTRING(a.tsgicd, 6, 1) = '0'
0
8991
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8831
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9548
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9249
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8244
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6796
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6076
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4876
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
2215
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.