I doubt your problem is the NOT EXISTS clause. From your description;
you have a table, matched_sware, that contains 18,000,000 rows and is
not indexed. Your query requires "matching" information from this table
to multiple rows from another source. Without an appropriate index, the
optimizer should choose to scan and sort 18 million COMPUTER_SYS_IDs
from matched_sware. This is most likely the cause of the lengthy run
time. I'd also expect a significant impact on other database performance
while this is running.
You didn't state anything about the indexing on active_ad_systems either.
Using the table names only, I'd guess that you should have at least two
indexes on matched_sware. One should have CONPUTER_SYS_ID as its first
column, the other whatever you use to indicate a unique "sware".
Depending on the data needed from matched_sware; you may want to include
additional columns in the indexes to prompte index-only retrievals.
Additional columns in the index is a tradeoff of disk space for a
decrease in I/O by avoiding an additional retrieval of table data.
Another question to examine is the business meaning of duplicate rows in
matched_sware. Access plans, generated by EXPLAIN, as Knut suggested,
are the first step to improve performance of queries. If I had an 18
million row table to manage, I'd make sure I knew about every query that
accessed it and do everything possible to avoid scanning and sorting it.
Indexes alone may not resolve this. The optimizer may still choose to
scan the large table. If this happens and you know that you are
retrieving a small number of rows, an OPTIMIZE FOR ONE ROWS clause in
the query may help avoid the scan by influencing the optimizer to choose
a nested loop join.
Phil Sherman
James wrote:
I have a performance problem with the following query and variations on the
subselect. The EXISTS version of the first example will complete in ~10
minutes. The NOT logic in both the examples makes them both keep running
long enough that a communications error is the only result returned so far.
This is a federated view computer ~160 k rows, computer_sys_id is PK .
matched_sware ~ 18 million no PK , no index.
Any suggestions on a technique to get the subselect to resolve before the
NOT logic?
select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and NOT EXISTS ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)
or
select a.COMPUTER_ALIAS from inv4.computer a,
inv4.active_ad_systems b
where a.computer_alias = b.computer_name and a.computer_sys_id NOT IN
(select distinct computer_sys_id from inv4.matched_sware)
select a.COMPUTER_ALIAS from inv4.computer a, inv4.active_ad_systems
b where a.computer_alias = b.computer_name and not exists ( select
distinct c.computer_sys_id from inv4.matched_sware as c where
a.COMPUTER_SYS_ID
= c.COMPUTER_SYS_ID)