By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
437,610 Members | 1,677 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 437,610 IT Pros & Developers. It's quick & easy.

Equivalence of join and exists

P: n/a
Hello all,

We are seeing strange behaviour for queries on a table we need to
convert data from. We try to find out whether table A(B_CONV_ID)
contains data that does not exists in table B(CONV_ID). We expect that
there is data missing in table B, thus finding a positive number of
records.

There are two tables, A and B.

A: ID, ..., B_CONV_ID
B: ID, ..., CONV_ID

The queries we are trying to execute and their results are the
following:

select count(*)
from A as AA
join B as BB on BB.CONV_ID = AA.B_CONV_ID
where BB.CONV_ID is null;

Result = 0;

select count(*)
from A as AA
where not exists (select 1 from B as BB where BB.CONV_ID =
AA.B_CONV_ID);

Result = 6273

A colleague confirmed me that the two queries were supposed to
retrieve the same result. Can anyone give us a clue of why the result
can be this different.
Nov 20 '07 #1
Share this Question
Share on Google+
2 Replies


P: n/a
On Nov 20, 3:50 pm, frederikenge...@gmail.com wrote:
Hello all,

We are seeing strange behaviour for queries on a table we need to
convert data from. We try to find out whether table A(B_CONV_ID)
contains data that does not exists in table B(CONV_ID). We expect that
there is data missing in table B, thus finding a positive number of
records.

There are two tables, A and B.

A: ID, ..., B_CONV_ID
B: ID, ..., CONV_ID

The queries we are trying to execute and their results are the
following:

select count(*)
from A as AA
join B as BB on BB.CONV_ID = AA.B_CONV_ID
where BB.CONV_ID is null;

Result = 0;

select count(*)
from A as AA
where not exists (select 1 from B as BB where BB.CONV_ID =
AA.B_CONV_ID);

Result = 6273

A colleague confirmed me that the two queries were supposed to
retrieve the same result. Can anyone give us a clue of why the result
can be this different.
You need to do an outer join. I assume A join B is the same A inner
join B, and in that case:

from A as AA join B as BB
on BB.CONV_ID = AA.B_CONV_ID
where BB.CONV_ID is null;

is a contradiction. Try:

select count(*)
from A as AA
left outer join B as BB
on BB.CONV_ID = AA.B_CONV_ID
where BB.CONV_ID is null;

/Lennart
Nov 20 '07 #2

P: n/a
fr*************@gmail.com wrote:
Hello all,

We are seeing strange behaviour for queries on a table we need to
convert data from. We try to find out whether table A(B_CONV_ID)
contains data that does not exists in table B(CONV_ID). We expect that
there is data missing in table B, thus finding a positive number of
records.

There are two tables, A and B.

A: ID, ..., B_CONV_ID
B: ID, ..., CONV_ID

The queries we are trying to execute and their results are the
following:

select count(*)
from A as AA
join B as BB on BB.CONV_ID = AA.B_CONV_ID
where BB.CONV_ID is null;

Result = 0;

select count(*)
from A as AA
where not exists (select 1 from B as BB where BB.CONV_ID =
AA.B_CONV_ID);

Result = 6273

A colleague confirmed me that the two queries were supposed to
retrieve the same result. Can anyone give us a clue of why the result
can be this different.
The queries do not have the same semantics, especially if you consider
NULLs. What happens is this:

The join predicate of the first query will evaluate to "unknown" if either
BB.CONV_ID or AA.B_CONV_ID happen to be NULL. "unknown" is interpreted
like "false", so all joined rows where either of the attributes is NULL
will be discarded. Adding "WHERE bb.conv_id IS NULL" doesn't change
anything at all because the relevant rows where already discarded before.

In the second query, you have pretty much the same. Your subselect does not
identify any qualifying rows in table B. But now you have a NOT EXISTS,
which leads to a "true" result for the predicate, and that qualifies the
row in table A.
What you can do to fix that is to correct the join condition in query 1 like
this:

ON ( bb.conv_id = aa.b_conv_id OR
bb.conv_id IS NULL OR
aa.b_conv_id IS NULL )

or maybe this, depending on what you want to achieve:

ON ( bb.conv_id = aa.b_conv_id OR
( bb.conv_id IS NULL AND
aa.b_conv_id IS NULL ) )

(You wouldn't need any of the parenthesis, but it makes things much
clearer.)

Alternatively, you can add "OR bb.conv_id IS NULL" to the WHERE clause in
the subselect of the 2nd query.

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Nov 20 '07 #3

This discussion thread is closed

Replies have been disabled for this discussion.