470,616 Members | 2,032 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 470,616 developers. It's quick & easy.

I/O cost of an NLJOIN

Friends:

Consider the plan fragment below:

NLJOIN
( 17)
134251
79996.9
/---------+--------\
4804 0.770869
IXSCAN IXSCAN
( 18) ( 23)
520.887 37.8431
108 3
/---+---\ |
0 12170 9.30829e+06
NLJOIN INDEX: XXX INDEX: XXX
( 19) AAAAAAAAA BBBBBBBBB
0.018982
0
/---------+---------\
0 1
FETCH IXSCAN
( 20) ( 22)
0.018982 12.6422
0 1
/----+----\ |
0 0 36323
IXSCAN TABLE: XXX INDEX: XXX
( 21) CCCCCCCCCC DDDDDDDDD
0.0185594
0
|
0
INDEX: XXX
EEEEEEEEE

Indexes:

AAAAAAAAA - 108 leaf nodes, table cardinality is 12,170
BBBBBBBBB - 95,047 leaf nodes, table cardinality is 9,308,293
DDDDDDDDD - 182 leaf nodes, table cardinality is 36,323
EEEEEEEEE - No rows

I'm trying to reason out the cost of the NLJOIN above. Given what I
understand about such joins *generally*, there should be one (or more)
I/Os to the inner table for every I/O of the outer. In the above, there
are three I/Os for each of 108 outer I/Os (right?), so I'd expect the
I/O cost to be 324: 108*3. So, where do the ca. 80K I/Os come from? I'm
clearly missing something fundamental; please disabuse me of my
misunderstanding. (Please also let me know if you need more information
about the underlying indexes or the selectivity of any applied
predicates.)

--Jeff

Nov 9 '06 #1
8 3512
Jeff,
I'd like to see the SQL to know what the join is between B and the
others, also what indexes there are. I assume you've done runstats etc
on all involved tables & indexes.
How many rows are in memory at point 18?
How many rows are in the result of the join at point 17?

--Greg

jefftyzzer wrote:
Friends:

Consider the plan fragment below:

NLJOIN
( 17)
134251
79996.9
/---------+--------\
4804 0.770869
IXSCAN IXSCAN
( 18) ( 23)
520.887 37.8431
108 3
/---+---\ |
0 12170 9.30829e+06
NLJOIN INDEX: XXX INDEX: XXX
( 19) AAAAAAAAA BBBBBBBBB
0.018982
0
/---------+---------\
0 1
FETCH IXSCAN
( 20) ( 22)
0.018982 12.6422
0 1
/----+----\ |
0 0 36323
IXSCAN TABLE: XXX INDEX: XXX
( 21) CCCCCCCCCC DDDDDDDDD
0.0185594
0
|
0
INDEX: XXX
EEEEEEEEE

Indexes:

AAAAAAAAA - 108 leaf nodes, table cardinality is 12,170
BBBBBBBBB - 95,047 leaf nodes, table cardinality is 9,308,293
DDDDDDDDD - 182 leaf nodes, table cardinality is 36,323
EEEEEEEEE - No rows

I'm trying to reason out the cost of the NLJOIN above. Given what I
understand about such joins *generally*, there should be one (or more)
I/Os to the inner table for every I/O of the outer. In the above, there
are three I/Os for each of 108 outer I/Os (right?), so I'd expect the
I/O cost to be 324: 108*3. So, where do the ca. 80K I/Os come from? I'm
clearly missing something fundamental; please disabuse me of my
misunderstanding. (Please also let me know if you need more information
about the underlying indexes or the selectivity of any applied
predicates.)

--Jeff
Nov 9 '06 #2
Greg:

Thanks for your reply. In the interest of brevity, would you be
amenable to my sending you the output from db2exfmt via e-mail?
Assuming we arrive at an answer, I will then recap that here for other
forum participants.

Regards,

--Jeff

Greg Nash wrote:
Jeff,
I'd like to see the SQL to know what the join is between B and the
others, also what indexes there are. I assume you've done runstats etc
on all involved tables & indexes.
How many rows are in memory at point 18?
How many rows are in the result of the join at point 17?

--Greg

jefftyzzer wrote:
Friends:

Consider the plan fragment below:

NLJOIN
( 17)
134251
79996.9
/---------+--------\
4804 0.770869
IXSCAN IXSCAN
( 18) ( 23)
520.887 37.8431
108 3
/---+---\ |
0 12170 9.30829e+06
NLJOIN INDEX: XXX INDEX: XXX
( 19) AAAAAAAAA BBBBBBBBB
0.018982
0
/---------+---------\
0 1
FETCH IXSCAN
( 20) ( 22)
0.018982 12.6422
0 1
/----+----\ |
0 0 36323
IXSCAN TABLE: XXX INDEX: XXX
( 21) CCCCCCCCCC DDDDDDDDD
0.0185594
0
|
0
INDEX: XXX
EEEEEEEEE

Indexes:

AAAAAAAAA - 108 leaf nodes, table cardinality is 12,170
BBBBBBBBB - 95,047 leaf nodes, table cardinality is 9,308,293
DDDDDDDDD - 182 leaf nodes, table cardinality is 36,323
EEEEEEEEE - No rows

I'm trying to reason out the cost of the NLJOIN above. Given what I
understand about such joins *generally*, there should be one (or more)
I/Os to the inner table for every I/O of the outer. In the above, there
are three I/Os for each of 108 outer I/Os (right?), so I'd expect the
I/O cost to be 324: 108*3. So, where do the ca. 80K I/Os come from? I'm
clearly missing something fundamental; please disabuse me of my
misunderstanding. (Please also let me know if you need more information
about the underlying indexes or the selectivity of any applied
predicates.)

--Jeff
Nov 10 '06 #3
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro

Nov 10 '06 #4
mirof007 wrote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro
Jeff,

The point is that the Nested Loop JOIN doesn't add - it multiplies
(approximately). Your particular query is scanning BBBBBBBB, and for
each record it is then evaluating the entire structure on the other side
of the join.

From your query (received by email), the NLJOIN comes from:

SELECT something
FROM B
WHERE {B-conditions}
AND NOT EXIST (
SELECT 1
FROM A,C,D,E
WHERE (joins between A,C,D,E)
AND A.x=B.x
AND otherconditions
)

About the one thing I remember being taught formally about SQL is
"correlated subqueries can be very slow". This seems to be the case
here. Depending on the proportions of A (and C,D,E) vs B, perhaps use:
SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT A.x FROM A,C,D,E
WHERE joins AND ACDE-conditions [not mentioning B]) )

This builds a table in memory of all the x's you want to exclude - which
could be a big task in itself, though it's only done once for the query.
Or, perhaps:

SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT B.x
FROM B,A,C,D,E
WHERE joins
AND B-conditions
AND ACDE-conditions) )

This narrows the search on A,C,D,E according to your original B-conditions.

Given your actual query is somewhat bigger and includes a few instances
of NOT EXIST (correlated subquery), start with a smaller part of the
query and see how fast you can get it, then try the same approach as you
add in the other requirements.
Even though you'd be adding B (a fair sized table) into several places,
the repetition of the B-conditions means it may just do that lookup once
and use it in several places.

Others here may be able to write more efficiently, I'm open to
correction/improvement :-)

--Greg
Nov 10 '06 #5
Thanks very much, Greg. Note that in addtion to switching my NOT
EXISTSs to NOT INs, I've also looked at switching them to LEFT JOINs
with a NULL test. So many ways to achieve the same result set.... :-)

--Jeff

Greg Nash wrote:
mirof007 wrote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro
Jeff,

The point is that the Nested Loop JOIN doesn't add - it multiplies
(approximately). Your particular query is scanning BBBBBBBB, and for
each record it is then evaluating the entire structure on the other side
of the join.

From your query (received by email), the NLJOIN comes from:

SELECT something
FROM B
WHERE {B-conditions}
AND NOT EXIST (
SELECT 1
FROM A,C,D,E
WHERE (joins between A,C,D,E)
AND A.x=B.x
AND otherconditions
)

About the one thing I remember being taught formally about SQL is
"correlated subqueries can be very slow". This seems to be the case
here. Depending on the proportions of A (and C,D,E) vs B, perhaps use:
SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT A.x FROM A,C,D,E
WHERE joins AND ACDE-conditions [not mentioning B]) )

This builds a table in memory of all the x's you want to exclude - which
could be a big task in itself, though it's only done once for the query.
Or, perhaps:

SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT B.x
FROM B,A,C,D,E
WHERE joins
AND B-conditions
AND ACDE-conditions) )

This narrows the search on A,C,D,E according to your original B-conditions.

Given your actual query is somewhat bigger and includes a few instances
of NOT EXIST (correlated subquery), start with a smaller part of the
query and see how fast you can get it, then try the same approach as you
add in the other requirements.
Even though you'd be adding B (a fair sized table) into several places,
the repetition of the B-conditions means it may just do that lookup once
and use it in several places.

Others here may be able to write more efficiently, I'm open to
correction/improvement :-)

--Greg
Nov 10 '06 #6
That does help. Many thanks.

--Jeff

mirof007 wrote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro
Nov 10 '06 #7
Miro,

That does help. Many thanks.

--Jeff

mirof007 wrote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro
Nov 10 '06 #8
It'd be interesting to see which turned out better, or whether the
optimiser made them the same in the end. There's a trick with LEFT
JOIN and NULL that you'd have to put the C,D,E conditions into the ON
clauses, not the WHERE. I guess you've figured that out.

e.g.
FROM B
LEFT JOIN A on A.x=B.x
LEFT JOIN C on C.y=A.y and C.status in ('OPEN',...)
WHERE B-conditions
AND A.x IS NULL
jefftyzzer wrote:
Thanks very much, Greg. Note that in addtion to switching my NOT
EXISTSs to NOT INs, I've also looked at switching them to LEFT JOINs
with a NULL test. So many ways to achieve the same result set.... :-)

--Jeff

Greg Nash wrote:
>>mirof007 wrote:

>>>Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro

Jeff,

The point is that the Nested Loop JOIN doesn't add - it multiplies
(approximately). Your particular query is scanning BBBBBBBB, and for
each record it is then evaluating the entire structure on the other side
of the join.

From your query (received by email), the NLJOIN comes from:

SELECT something
FROM B
WHERE {B-conditions}
AND NOT EXIST (
SELECT 1
FROM A,C,D,E
WHERE (joins between A,C,D,E)
AND A.x=B.x
AND otherconditions
)

About the one thing I remember being taught formally about SQL is
"correlated subqueries can be very slow". This seems to be the case
here. Depending on the proportions of A (and C,D,E) vs B, perhaps use:
SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT A.x FROM A,C,D,E
WHERE joins AND ACDE-conditions [not mentioning B]) )

This builds a table in memory of all the x's you want to exclude - which
could be a big task in itself, though it's only done once for the query.
Or, perhaps:

SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT B.x
FROM B,A,C,D,E
WHERE joins
AND B-conditions
AND ACDE-conditions) )

This narrows the search on A,C,D,E according to your original B-conditions.

Given your actual query is somewhat bigger and includes a few instances
of NOT EXIST (correlated subquery), start with a smaller part of the
query and see how fast you can get it, then try the same approach as you
add in the other requirements.
Even though you'd be adding B (a fair sized table) into several places,
the repetition of the B-conditions means it may just do that lookup once
and use it in several places.

Others here may be able to write more efficiently, I'm open to
correction/improvement :-)

--Greg

Nov 11 '06 #9

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Dan | last post: by
2 posts views Thread by Dan | last post: by
1 post views Thread by cesar.zapata | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.