I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.
This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.
Any suggestions on a technique to get the subselect to resolve before the
NOT logic?
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID)
or
select a.COMPUTER_ALIA S from inv4.computer a,
inv4.active_ad_ systems b
where a.computer_alia s = b.computer_name and a.computer_sys_ id NOT IN
(select distinct computer_sys_id from inv4.matched_sw are)
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems
b where a.computer_alia s = b.computer_name and not exists ( select
distinct c.computer_sys_ id from inv4.matched_sw are as c where
a.COMPUTER_SYS_ ID
= c.COMPUTER_SYS_ ID) 4 2749
James wrote: I have a performance problem with the following query and variations on the subselect. The EXISTS version of the first example will complete in ~10 minutes. The NOT logic in both the examples makes them both keep running long enough that a communications error is the only result returned so far.
This is a federated view computer ~160 k rows, computer_sys_id is PK . matched_sware ~ 18 million no PK , no index.
Any suggestions on a technique to get the subselect to resolve before the NOT logic?
The idea of SQL is that you tell the system _what_ you want, and the system
figures out the best way _how_ to get the result. So rephrasing queries
often doesn't help at all because the same internal, optimized plan might
result.
In your case, I would rather have a look at the access plan for the query
and see if appropriate indexes were defined and if statistics were
collected. Additionally, check you general system configuration. (You did
run the Configuration Advisor?)
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and NOT EXISTS ( select distinct c.computer_sys_ id from inv4.matched_sw are as c where a.COMPUTER_SYS_ ID = c.COMPUTER_SYS_ ID)
You don't need the DISTINCT keyword in the subselect. If there is any row
to be returned at allin the distinct set produced by the subselect, there
must also be a row in the multi-set (w/o distinct) and vice versa.
Personally, I would write the query like shown below because I like it to
have a clear structure showing what is selected from which tables. You had
a join in the FROM clause, but the 2nd table was'nt used in the SELECT
list, which I find rather confusing.
SELECT a.computer_alia s
FROM inv4.computer AS a
WHERE EXISTS ( SELECT 1
FROM inv4.active_ad_ systems AS b
WHERE a.computer_alia s = b.computer_name ) AND
NOT EXISTS ( SELECT 1
FROM inv4.matched_sw are AS c
WHERE a.computer_sys_ id = c.computer_sys_ id )
--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
I doubt your problem is the NOT EXISTS clause. From your description;
you have a table, matched_sware, that contains 18,000,000 rows and is
not indexed. Your query requires "matching" information from this table
to multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_ID s
from matched_sware. This is most likely the cause of the lengthy run
time. I'd also expect a significant impact on other database performance
while this is running.
You didn't state anything about the indexing on active_ad_syste ms either.
Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware".
Depending on the data needed from matched_sware; you may want to include
additional columns in the indexes to prompte index-only retrievals.
Additional columns in the index is a tradeoff of disk space for a
decrease in I/O by avoiding an additional retrieval of table data.
Another question to examine is the business meaning of duplicate rows in
matched_sware. Access plans, generated by EXPLAIN, as Knut suggested,
are the first step to improve performance of queries. If I had an 18
million row table to manage, I'd make sure I knew about every query that
accessed it and do everything possible to avoid scanning and sorting it.
Indexes alone may not resolve this. The optimizer may still choose to
scan the large table. If this happens and you know that you are
retrieving a small number of rows, an OPTIMIZE FOR ONE ROWS clause in
the query may help avoid the scan by influencing the optimizer to choose
a nested loop join.
Phil Sherman
James wrote: I have a performance problem with the following query and variations on the subselect. The EXISTS version of the first example will complete in ~10 minutes. The NOT logic in both the examples makes them both keep running long enough that a communications error is the only result returned so far.
This is a federated view computer ~160 k rows, computer_sys_id is PK . matched_sware ~ 18 million no PK , no index.
Any suggestions on a technique to get the subselect to resolve before the NOT logic?
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and NOT EXISTS ( select distinct c.computer_sys_ id from inv4.matched_sw are as c where a.COMPUTER_SYS_ ID = c.COMPUTER_SYS_ ID)
or select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and a.computer_sys_ id NOT IN (select distinct computer_sys_id from inv4.matched_sw are) select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and not exists ( select distinct c.computer_sys_ id from inv4.matched_sw are as c where a.COMPUTER_SYS_ ID = c.COMPUTER_SYS_ ID)
Thanks for the reply
my statement about no index on matched_sware was incorrect, there is pk
index on the 2 fields in the table. I learned that active_ad_syste ms is a
view from another database. I moved the query to that db. which had tables
loaded with a copy of the same data as the federated views I originally
queried and was able to get a return in about 15 minutes.
"Knut Stolze" <st****@de.ibm. com> wrote in message
news:df******** **@lc03.rz.uni-jena.de... James wrote:
I have a performance problem with the following query and variations on the subselect. The EXISTS version of the first example will complete in ~10 minutes. The NOT logic in both the examples makes them both keep running long enough that a communications error is the only result returned so far.
This is a federated view computer ~160 k rows, computer_sys_id is PK . matched_sware ~ 18 million no PK , no index.
Any suggestions on a technique to get the subselect to resolve before the NOT logic?
The idea of SQL is that you tell the system _what_ you want, and the system figures out the best way _how_ to get the result. So rephrasing queries often doesn't help at all because the same internal, optimized plan might result.
In your case, I would rather have a look at the access plan for the query and see if appropriate indexes were defined and if statistics were collected. Additionally, check you general system configuration. (You did run the Configuration Advisor?)
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and NOT EXISTS ( select distinct c.computer_sys_ id from inv4.matched_sw are as c where a.COMPUTER_SYS_ ID = c.COMPUTER_SYS_ ID)
You don't need the DISTINCT keyword in the subselect. If there is any row to be returned at allin the distinct set produced by the subselect, there must also be a row in the multi-set (w/o distinct) and vice versa.
Personally, I would write the query like shown below because I like it to have a clear structure showing what is selected from which tables. You had a join in the FROM clause, but the 2nd table was'nt used in the SELECT list, which I find rather confusing.
SELECT a.computer_alia s FROM inv4.computer AS a WHERE EXISTS ( SELECT 1 FROM inv4.active_ad_ systems AS b WHERE a.computer_alia s = b.computer_name ) AND NOT EXISTS ( SELECT 1 FROM inv4.matched_sw are AS c WHERE a.computer_sys_ id = c.computer_sys_ id )
-- Knut Stolze Information Integration Development IBM Germany / University of Jena
I appreciate the reply, matched_sware did turn out to have an pk index on
the 2 fields in the table. As I replied to Knut, I solved my performance
issue in this case by moving to a single db copy of the data, from the
federated view of the first try.
"Phil Sherman" <ps******@ameri tech.net> wrote in message
news:Cs******** *******@newssvr 22.news.prodigy .net... I doubt your problem is the NOT EXISTS clause. From your description; you have a table, matched_sware, that contains 18,000,000 rows and is not indexed. Your query requires "matching" information from this table to multiple rows from another source. Without an appropriate index, the optimizer should choose to scan and sort 18 million COMPUTER_SYS_ID s from matched_swar e. This is most likely the cause of the lengthy run time. I'd also expect a significant impact on other database performance while this is running.
You didn't state anything about the indexing on active_ad_syste ms either.
Using the table names only, I'd guess that you should have at least two indexes on matched_sware. One should have CONPUTER_SYS_ID as its first column, the other whatever you use to indicate a unique "sware". Depending on the data needed from matched_sware; you may want to include additional columns in the indexes to prompte index-only retrievals. Additional columns in the index is a tradeoff of disk space for a decrease in I/O by avoiding an additional retrieval of table data. Another question to examine is the business meaning of duplicate rows in matched_sware. Access plans, generated by EXPLAIN, as Knut suggested, are the first step to improve performance of queries. If I had an 18 million row table to manage, I'd make sure I knew about every query that accessed it and do everything possible to avoid scanning and sorting it.
Indexes alone may not resolve this. The optimizer may still choose to scan the large table. If this happens and you know that you are retrieving a small number of rows, an OPTIMIZE FOR ONE ROWS clause in the query may help avoid the scan by influencing the optimizer to choose a nested loop join.
Phil Sherman James wrote: I have a performance problem with the following query and variations on the subselect. The EXISTS version of the first example will complete in ~10 minutes. The NOT logic in both the examples makes them both keep running long enough that a communications error is the only result returned so far.
This is a federated view computer ~160 k rows, computer_sys_id is PK . matched_sware ~ 18 million no PK , no index.
Any suggestions on a technique to get the subselect to resolve before the NOT logic?
select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and NOT EXISTS ( select distinct c.computer_sys_ id from inv4.matched_sw are as c where a.COMPUTER_SYS_ ID = c.COMPUTER_SYS_ ID)
or select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and a.computer_sys_ id NOT IN (select distinct computer_sys_id from inv4.matched_sw are) select a.COMPUTER_ALIA S from inv4.computer a, inv4.active_ad_ systems b where a.computer_alia s = b.computer_name and not exists ( select distinct c.computer_sys_ id from inv4.matched_sw are as c where a.COMPUTER_SYS_ ID = c.COMPUTER_SYS_ ID)
This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: limbert |
last post by:
------=_NextPart_000_0001_01C34D7B.DFBF53C0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Hi,
=20
I was reading the manual and it said that the subselect is only
|
by: Greg Stark |
last post by:
So I have a query in which some of the select values are subqueries. The
subqueries are aggregates so I don't want to turn this into a join, it would
become too complex and postgres would have trouble optimizing things.
So my question is, is there some way to have a subselect return multiple
columns and break those out in the outer query?
Something like:
SELECT x,y,z,
|
by: Neil Zanella |
last post by:
Hello,
I would like to ask the about the following...
PostgreSQL allows tables resulting from subselects to be renamed with
an optional AS keyword whereas Oracle 9 will report an error whenever
a table is renamed with the AS keyword. Furthermore, in PostgreSQL
when the result of a subselect is referenced in an outer select
it is required that the subselect result be named, whereas this
is not true in Oracle. I wonder what standard SQL...
|
by: dtwalter |
last post by:
Is it possible to ORDER BY a SubSelect? I don't see why it wouldn't
be, but I'm having some trouble. Hopefully it's just a simple error in
syntax and somebody can tell me. Here's what I'm trying to do...
I've got two tables:
Table1:
TestData Filename
Bird FileB
Blue FileA
Circle FileC
|
by: Marco Lazzeri |
last post by:
I'd like to reference values returned by a subselect in the same SELECT
query.
Example:
SELECT id,
( SELECT COUNT(*) FROM second ) AS value_to_reference,
( value_to_reference + 1 ) AS value_that_uses_referenced_one
FROM first;
| |
by: Chris Kratz |
last post by:
Hello all,
We have run into what appears to be a problem with rules and subselects in
postgres 7.4.1. We have boiled it down to the following test case. If
anyone has any thoughts as to why this would be happening, we would
appreciate feedback.
We have tested on 7.3.4, 7.3.6 and 7.4.1 and all exhibit the same behavior.
Test case one tries to populate table2 from table1 with records that are not
in table2 already. Table2 gets...
|
by: Alex P |
last post by:
Hi,
when creating a query with a subselect
SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name) AS max_pop
FROM states;
then it is not possible to sort after max_pop or use max_pop in a function or a CASE.
am I dont anything wrong or is this meant to be the case?
|
by: Morten K. Poulsen |
last post by:
(re-post)
Dear list,
Please let me know if this is not the list to ask this kind of question.
I am trying to optimize a query that joins two relatively large (750000 rows in
each) tables. If I do it using a subselect, I can "force" the planner to choose
the fastest path. Now, my question is:
|
by: Sebastien |
last post by:
I have the following statement which I run successfully in... 1 hour 10
minutes.
SELECT
a.tsgicd as ACCT_ID,
a.tsa5cd as SEC_ID,
CASE
WHEN (SUBSTRING(a.tsgicd, 6, 1) = 'R'
or SUBSTRING(a.tsgicd, 6, 1) = 'Y'
or SUBSTRING(a.tsgicd, 6, 1) = '0'
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
| |
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |