Help | Site Map
Connecting Tech Pros Worldwide
 
 
LinkBack Thread Tools
  #1  
Old November 9th, 2006, 07:35 PM
jefftyzzer
Guest
 
Posts: n/a
Default I/O cost of an NLJOIN

Friends:

Consider the plan fragment below:

NLJOIN
( 17)
134251
79996.9
/---------+--------\
4804 0.770869
IXSCAN IXSCAN
( 18) ( 23)
520.887 37.8431
108 3
/---+---\ |
0 12170 9.30829e+06
NLJOIN INDEX: XXX INDEX: XXX
( 19) AAAAAAAAA BBBBBBBBB
0.018982
0
/---------+---------\
0 1
FETCH IXSCAN
( 20) ( 22)
0.018982 12.6422
0 1
/----+----\ |
0 0 36323
IXSCAN TABLE: XXX INDEX: XXX
( 21) CCCCCCCCCC DDDDDDDDD
0.0185594
0
|
0
INDEX: XXX
EEEEEEEEE

Indexes:

AAAAAAAAA - 108 leaf nodes, table cardinality is 12,170
BBBBBBBBB - 95,047 leaf nodes, table cardinality is 9,308,293
DDDDDDDDD - 182 leaf nodes, table cardinality is 36,323
EEEEEEEEE - No rows

I'm trying to reason out the cost of the NLJOIN above. Given what I
understand about such joins *generally*, there should be one (or more)
I/Os to the inner table for every I/O of the outer. In the above, there
are three I/Os for each of 108 outer I/Os (right?), so I'd expect the
I/O cost to be 324: 108*3. So, where do the ca. 80K I/Os come from? I'm
clearly missing something fundamental; please disabuse me of my
misunderstanding. (Please also let me know if you need more information
about the underlying indexes or the selectivity of any applied
predicates.)

--Jeff

  #2  
Old November 9th, 2006, 11:45 PM
Greg Nash
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

Jeff,
I'd like to see the SQL to know what the join is between B and the
others, also what indexes there are. I assume you've done runstats etc
on all involved tables & indexes.
How many rows are in memory at point 18?
How many rows are in the result of the join at point 17?

--Greg

jefftyzzer wrote:
Quote:
Friends:
>
Consider the plan fragment below:
>
NLJOIN
( 17)
134251
79996.9
/---------+--------\
4804 0.770869
IXSCAN IXSCAN
( 18) ( 23)
520.887 37.8431
108 3
/---+---\ |
0 12170 9.30829e+06
NLJOIN INDEX: XXX INDEX: XXX
( 19) AAAAAAAAA BBBBBBBBB
0.018982
0
/---------+---------\
0 1
FETCH IXSCAN
( 20) ( 22)
0.018982 12.6422
0 1
/----+----\ |
0 0 36323
IXSCAN TABLE: XXX INDEX: XXX
( 21) CCCCCCCCCC DDDDDDDDD
0.0185594
0
|
0
INDEX: XXX
EEEEEEEEE
>
Indexes:
>
AAAAAAAAA - 108 leaf nodes, table cardinality is 12,170
BBBBBBBBB - 95,047 leaf nodes, table cardinality is 9,308,293
DDDDDDDDD - 182 leaf nodes, table cardinality is 36,323
EEEEEEEEE - No rows
>
I'm trying to reason out the cost of the NLJOIN above. Given what I
understand about such joins *generally*, there should be one (or more)
I/Os to the inner table for every I/O of the outer. In the above, there
are three I/Os for each of 108 outer I/Os (right?), so I'd expect the
I/O cost to be 324: 108*3. So, where do the ca. 80K I/Os come from? I'm
clearly missing something fundamental; please disabuse me of my
misunderstanding. (Please also let me know if you need more information
about the underlying indexes or the selectivity of any applied
predicates.)
>
--Jeff
>
  #3  
Old November 10th, 2006, 12:05 AM
jefftyzzer
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

Greg:

Thanks for your reply. In the interest of brevity, would you be
amenable to my sending you the output from db2exfmt via e-mail?
Assuming we arrive at an answer, I will then recap that here for other
forum participants.

Regards,

--Jeff

Greg Nash wrote:
Quote:
Jeff,
I'd like to see the SQL to know what the join is between B and the
others, also what indexes there are. I assume you've done runstats etc
on all involved tables & indexes.
How many rows are in memory at point 18?
How many rows are in the result of the join at point 17?
>
--Greg
>
jefftyzzer wrote:
Quote:
Friends:

Consider the plan fragment below:

NLJOIN
( 17)
134251
79996.9
/---------+--------\
4804 0.770869
IXSCAN IXSCAN
( 18) ( 23)
520.887 37.8431
108 3
/---+---\ |
0 12170 9.30829e+06
NLJOIN INDEX: XXX INDEX: XXX
( 19) AAAAAAAAA BBBBBBBBB
0.018982
0
/---------+---------\
0 1
FETCH IXSCAN
( 20) ( 22)
0.018982 12.6422
0 1
/----+----\ |
0 0 36323
IXSCAN TABLE: XXX INDEX: XXX
( 21) CCCCCCCCCC DDDDDDDDD
0.0185594
0
|
0
INDEX: XXX
EEEEEEEEE

Indexes:

AAAAAAAAA - 108 leaf nodes, table cardinality is 12,170
BBBBBBBBB - 95,047 leaf nodes, table cardinality is 9,308,293
DDDDDDDDD - 182 leaf nodes, table cardinality is 36,323
EEEEEEEEE - No rows

I'm trying to reason out the cost of the NLJOIN above. Given what I
understand about such joins *generally*, there should be one (or more)
I/Os to the inner table for every I/O of the outer. In the above, there
are three I/Os for each of 108 outer I/Os (right?), so I'd expect the
I/O cost to be 324: 108*3. So, where do the ca. 80K I/Os come from? I'm
clearly missing something fundamental; please disabuse me of my
misunderstanding. (Please also let me know if you need more information
about the underlying indexes or the selectivity of any applied
predicates.)

--Jeff
  #4  
Old November 10th, 2006, 02:25 AM
mirof007
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro

  #5  
Old November 10th, 2006, 10:55 PM
Greg Nash
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

mirof007 wrote:
Quote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.
>
Hope this helps,
Miro
>
Jeff,

The point is that the Nested Loop JOIN doesn't add - it multiplies
(approximately). Your particular query is scanning BBBBBBBB, and for
each record it is then evaluating the entire structure on the other side
of the join.

From your query (received by email), the NLJOIN comes from:

SELECT something
FROM B
WHERE {B-conditions}
AND NOT EXIST (
SELECT 1
FROM A,C,D,E
WHERE (joins between A,C,D,E)
AND A.x=B.x
AND otherconditions
)

About the one thing I remember being taught formally about SQL is
"correlated subqueries can be very slow". This seems to be the case
here. Depending on the proportions of A (and C,D,E) vs B, perhaps use:


SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT A.x FROM A,C,D,E
WHERE joins AND ACDE-conditions [not mentioning B]) )

This builds a table in memory of all the x's you want to exclude - which
could be a big task in itself, though it's only done once for the query.
Or, perhaps:

SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT B.x
FROM B,A,C,D,E
WHERE joins
AND B-conditions
AND ACDE-conditions) )

This narrows the search on A,C,D,E according to your original B-conditions.

Given your actual query is somewhat bigger and includes a few instances
of NOT EXIST (correlated subquery), start with a smaller part of the
query and see how fast you can get it, then try the same approach as you
add in the other requirements.
Even though you'd be adding B (a fair sized table) into several places,
the repetition of the B-conditions means it may just do that lookup once
and use it in several places.

Others here may be able to write more efficiently, I'm open to
correction/improvement :-)

--Greg


  #6  
Old November 10th, 2006, 11:25 PM
jefftyzzer
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

Thanks very much, Greg. Note that in addtion to switching my NOT
EXISTSs to NOT INs, I've also looked at switching them to LEFT JOINs
with a NULL test. So many ways to achieve the same result set.... :-)

--Jeff

Greg Nash wrote:
Quote:
mirof007 wrote:
>
Quote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.

Hope this helps,
Miro
Jeff,
>
The point is that the Nested Loop JOIN doesn't add - it multiplies
(approximately). Your particular query is scanning BBBBBBBB, and for
each record it is then evaluating the entire structure on the other side
of the join.
>
From your query (received by email), the NLJOIN comes from:
>
SELECT something
FROM B
WHERE {B-conditions}
AND NOT EXIST (
SELECT 1
FROM A,C,D,E
WHERE (joins between A,C,D,E)
AND A.x=B.x
AND otherconditions
)
>
About the one thing I remember being taught formally about SQL is
"correlated subqueries can be very slow". This seems to be the case
here. Depending on the proportions of A (and C,D,E) vs B, perhaps use:
>
>
SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT A.x FROM A,C,D,E
WHERE joins AND ACDE-conditions [not mentioning B]) )
>
This builds a table in memory of all the x's you want to exclude - which
could be a big task in itself, though it's only done once for the query.
Or, perhaps:
>
SELECT something
FROM B
WHERE B-conditions
AND NOT (B.x IN
(SELECT B.x
FROM B,A,C,D,E
WHERE joins
AND B-conditions
AND ACDE-conditions) )
>
This narrows the search on A,C,D,E according to your original B-conditions.
>
Given your actual query is somewhat bigger and includes a few instances
of NOT EXIST (correlated subquery), start with a smaller part of the
query and see how fast you can get it, then try the same approach as you
add in the other requirements.
Even though you'd be adding B (a fair sized table) into several places,
the repetition of the B-conditions means it may just do that lookup once
and use it in several places.
>
Others here may be able to write more efficiently, I'm open to
correction/improvement :-)
>
--Greg
  #7  
Old November 10th, 2006, 11:35 PM
jefftyzzer
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

That does help. Many thanks.

--Jeff

mirof007 wrote:
Quote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.
>
Hope this helps,
Miro
  #8  
Old November 10th, 2006, 11:35 PM
jefftyzzer
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

Miro,

That does help. Many thanks.

--Jeff

mirof007 wrote:
Quote:
Jeff, DB2 is estimating that there will be 3 rows returned from the
inner per each of the 4804 rows of the outer. That makes ca. 15K probes
against the index BBBBBBBBB. Given that each probe has to traverse the
index from the root of the tree to the leaf level, each probe could
result in some 3 to 4 I/Os against the index. I know, it doesn't add up
to exactly 80K, but it's close and hopefully gives you an idea how this
works.
>
Hope this helps,
Miro
  #9  
Old November 11th, 2006, 12:25 AM
Greg Nash
Guest
 
Posts: n/a
Default Re: I/O cost of an NLJOIN

It'd be interesting to see which turned out better, or whether the
optimiser made them the same in the end. There's a trick with LEFT
JOIN and NULL that you'd have to put the C,D,E conditions into the ON
clauses, not the WHERE. I guess you've figured that out.

e.g.
FROM B
LEFT JOIN A on A.x=B.x
LEFT JOIN C on C.y=A.y and C.status in ('OPEN',...)
WHERE B-conditions
AND A.x IS NULL


jefftyzzer wrote:
Quote:
Thanks very much, Greg. Note that in addtion to switching my NOT
EXISTSs to NOT INs, I've also looked at switching them to LEFT JOINs
with a NULL test. So many ways to achieve the same result set.... :-)
>
--Jeff
>
Greg Nash wrote:
>
Quote:
>>mirof007 wrote:
>>
>>
Quote:
>>>Jeff, DB2 is estimating that there will be 3 rows returned from the
>>>inner per each of the 4804 rows of the outer. That makes ca. 15K probes
>>>against the index BBBBBBBBB. Given that each probe has to traverse the
>>>index from the root of the tree to the leaf level, each probe could
>>>result in some 3 to 4 I/Os against the index. I know, it doesn't add up
>>>to exactly 80K, but it's close and hopefully gives you an idea how this
>>>works.
>>>
>>>Hope this helps,
>>>Miro
>>>
>>
>>Jeff,
>>
>>The point is that the Nested Loop JOIN doesn't add - it multiplies
>>(approximately). Your particular query is scanning BBBBBBBB, and for
>>each record it is then evaluating the entire structure on the other side
>>of the join.
>>
>From your query (received by email), the NLJOIN comes from:
>>
>>SELECT something
>>FROM B
>>WHERE {B-conditions}
> AND NOT EXIST (
>> SELECT 1
>> FROM A,C,D,E
>> WHERE (joins between A,C,D,E)
>> AND A.x=B.x
>> AND otherconditions
> )
>>
>>About the one thing I remember being taught formally about SQL is
>>"correlated subqueries can be very slow". This seems to be the case
>>here. Depending on the proportions of A (and C,D,E) vs B, perhaps use:
>>
>>
>>SELECT something
>>FROM B
>>WHERE B-conditions
> AND NOT (B.x IN
>> (SELECT A.x FROM A,C,D,E
>> WHERE joins AND ACDE-conditions [not mentioning B]) )
>>
>>This builds a table in memory of all the x's you want to exclude - which
>>could be a big task in itself, though it's only done once for the query.
>>Or, perhaps:
>>
>>SELECT something
>>FROM B
>>WHERE B-conditions
> AND NOT (B.x IN
>> (SELECT B.x
>> FROM B,A,C,D,E
>> WHERE joins
>> AND B-conditions
>> AND ACDE-conditions) )
>>
>>This narrows the search on A,C,D,E according to your original B-conditions.
>>
>>Given your actual query is somewhat bigger and includes a few instances
>>of NOT EXIST (correlated subquery), start with a smaller part of the
>>query and see how fast you can get it, then try the same approach as you
>>add in the other requirements.
>>Even though you'd be adding B (a fair sized table) into several places,
>>the repetition of the B-conditions means it may just do that lookup once
>>and use it in several places.
>>
>>Others here may be able to write more efficiently, I'm open to
>>correction/improvement :-)
>>
>>--Greg
>
>
 

Bookmarks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over network members.
Post your question now . . .
It's fast and it's free

Popular Articles