468,103 Members | 1,249 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,103 developers. It's quick & easy.

DPF Question

A join query is running against my db which is logically partitioned.
The join involves a partitioned table (has around 300 Million rows)
which is spread across 7 partitions and a small non-partitioned table
(has around 2000 rows). Joining the table based on the partitioning key

The query has been running for the past 30 mins without fetching any
result set and usually I understand that it would run only for less
than 5 mins.

Statistics are upto date on both tables
Query:

SELECT distinct ltrim(rtrim(A.VEH_IDENT_NBR)),
ltrim(rtrim(A.VEH_IDENT_NBR)) AS VEH_IDENT_NBR_LKP
FROM CARD.GLOBAL_VEH A
join
ETL.STG_NA_GIF_VEH_DLVRY_CAN_EVNT_Vin B
on
A.VEH_IDENT_NBR = B.VEH_IDENT_NBR for read only

When I checked the appln.snapshot the query is in executing state on
co-ordinator partiton while on the rest of partition it is on "Waiting
to send on tablequeue" status

====
Part of Application Snapshot for co-ordinator partition
====
Total sort time = 0
Sort overflows = 0
Rows read = 1
Rows written = 53183604
Rows deleted = 0
Rows updated = 0
Rows inserted = 0
Rows fetched = 0
Buffer pool data logical reads = Not Collected
Buffer pool data physical reads = Not Collected
Buffer pool temporary data logical reads = Not Collected
Buffer pool temporary data physical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool temporary index logical reads = Not Collected
Buffer pool temporary index physical reads = Not Collected
Blocking cursor = YES
Dynamic SQL statement text:
SELECT distinct ltrim(rtrim(A.VEH_IDENT_NBR)),
ltrim(rtrim(A.VEH_IDENT_NBR)) AS VEH_IDENT_NBR_LKP FROM CARD.GLOBAL_VEH
A joi
n ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin B on A.VEH_IDENT_NBR =
B.VEH_IDENT_NBR for read only

Subsection number = 0
Subsection database partition number = 0
Subsection status = Executing
Execution elapsed time (seconds) = 1825
Total user CPU time (sec.ms) = 0.000000
Total system CPU time (sec.ms) = 0.000000
Current number of tablequeue buffers overflowed = 0
Total number of tablequeue buffers overflowed = 0
Maximum number of tablequeue buffers overflowed = 0
Rows received on tablequeues = 0
Rows sent on tablequeues = 0
Rows read = 0
Rows written = 0
Number of agents working on subsection = 0
========
Part of Application Snapshot from one of the other partitions
========
Total sort time = 0
Sort overflows = 0
Rows read = 0
Rows written = 0
Rows deleted = 0
Rows updated = 0
Rows inserted = 0
Rows fetched = 0
Buffer pool data logical reads = Not Collected
Buffer pool data physical reads = Not Collected
Buffer pool temporary data logical reads = Not Collected
Buffer pool temporary data physical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool temporary index logical reads = Not Collected
Buffer pool temporary index physical reads = Not Collected
Blocking cursor = NO
Dynamic SQL statement text:
SELECT distinct ltrim(rtrim(A.VEH_IDENT_NBR)),
ltrim(rtrim(A.VEH_IDENT_NBR)) AS VEH_IDENT_NBR_LKP FROM CARD.GLOBAL_VEH
A joi
n ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin B on A.VEH_IDENT_NBR =
B.VEH_IDENT_NBR for read only

Subsection number = 2
Subsection database partition number = 1
Subsection status = Waiting to send
on tablequeue
Node for which waiting on tablequeue = 0
Tablequeue ID on which agent is waiting = 2
Execution elapsed time (seconds) = 1826
Total user CPU time (sec.ms) = 11.971956
Total system CPU time (sec.ms) = 0.087213
Current number of tablequeue buffers overflowed = 0
Total number of tablequeue buffers overflowed = 0
Maximum number of tablequeue buffers overflowed = 0
Rows received on tablequeues = 0
Rows sent on tablequeues = 7598014
Rows read = 0
Rows written = 0
Number of agents working on subsection = 1

Agent process/thread ID = 2244632
Explain Plan is as below:

Rows
RETURN
( 1)
Cost
I/O
|
0
DTQ
( 2)
113.109
64
|
0
TBSCAN
( 3)
113.069
64
|
0
SORT
( 4)
113.068
64
|
0
NLJOIN
( 5)
113.067
64
/----+----\
0 1
TBSCAN TBSCAN
( 6) ( 7)
113.067 7.73186e+06
64 1.01032e+06
| |
0 5.88295e+07
TABLE: ETL TEMP
STG_NA_GIF_VEH_D ( 8)
6.13434e+06
I dont this this is waiting for any locks because
[1] IS lock on both tables
[2] Recursive app.snap shows incremental values for "Rows sent on
tablequeues" section.

Questions:
[1] How to troubleshoot further to identify the problem?
[2] Why does the optimizer broadcasts the data from big table instead
of broadcasting small table?

Any help is greatly appreciated.

Thanks,
Sam.

Jul 26 '06 #1
6 3901

Sam Durai wrote:
Questions:
[1] How to troubleshoot further to identify the problem?
[2] Why does the optimizer broadcasts the data from big table instead
of broadcasting small table?

Any help is greatly appreciated.

Thanks,
Sam.
What makes you think the large table is being broadcast? The explain
contains only one table queue for returning data to the coordinator
(expected/required in DPF). Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?

Regards,
Miro

Jul 26 '06 #2
Ian
Sam Durai wrote:
>
I dont this this is waiting for any locks because
[1] IS lock on both tables
[2] Recursive app.snap shows incremental values for "Rows sent on
tablequeues" section.

Questions:
[1] How to troubleshoot further to identify the problem?
[2] Why does the optimizer broadcasts the data from big table instead
of broadcasting small table?
Are you sure your stats are up to date / correct? You plan shows that
ETL.STG_NA_GIF_VEH_D has 0 rows.

Whenever a table has 0 rows, the optimizer will choose it as the outer
table in the join.

If the table doesn't have 0 rows, this this is often a *really* bad
choice.


Jul 26 '06 #3
mirof007 wrote:
Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?
The SORT is probably for the DISTINCT.

Jul 26 '06 #4
Sam Durai wrote:
====
Part of Application Snapshot for co-ordinator partition
====
Subsection number = 0
Subsection database partition number = 0
Rows received on tablequeues = 0
Rows sent on tablequeues = 0
========
Part of Application Snapshot from one of the other partitions
========
Subsection number = 2
Subsection database partition number = 1
Rows received on tablequeues = 0
Rows sent on tablequeues = 7598014
It looks like subsection 2 is sending rows, but subsection 0 has not
received any. Is there a subsection 1? Maybe there is some flavor of TQ
below one of the table scans is missing from the plan as posted?
Intuitively, I'd expect a TQ there somewhere because the two tables are
partitioned differently. You'd need to repartition at least one of them
in order to be able to join them.

My guess: subsection 2 is sending those 7598014 rows to subsection 1
for the join, where the results are being TEMPed to disk for the SORT.
At the point in time where you gathered this status, the SORT had not
completed, so no rows had been returned to subsection 0 (and thus from
your query).

Maybe the big table is being repartitioned to the partition with the
small table? The number of rows sent from subsection 2 seems to support
that - you said the small table does not have that many rows. More
evidence: the big TEMP (number 8, in the lower right of the plan)
should have operators below it showing where the rows in the TEMP came
from.

It looks like ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin is the small table
then, and as someone else pointed out, the cardinality looks fishy. If
DB2 expects 0 rows in the ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin table
for the Nested Loop Join, then it expects to not need to look at the
big table much at all.

What does "select card from sysstat.tables where tabschema = 'ETL' and
tabname = 'ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin'" give you? 2000? 0?

Jul 26 '06 #5

Harold wrote:
mirof007 wrote:
Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?

The SORT is probably for the DISTINCT.
Thanks, you're right, I missed that. Also, the explain output is
incomplete, looks like it was cut off at the TEMP(8). My suggestion
would be to run runstats with distribution on both tables and
re-explain.

Regards,
Miro

Jul 26 '06 #6
Thanks for the valuable information. As per the suggestion I reran the
runstats on the the small table then the query ran quickly as expected
perhaps the problem was that runstats failed with an obvious error
message when it was called earlier and it went unnoticed. Apologies as
I didnt notice this properly and posted a message.

Also both the tables which I referred earlier was partitioned using the
same partitioning key.

Answers for some of the questions posted
1. What makes you think the large table is being broadcast?
As I see Subsection status as "Waiting to send on tablequeue "
and Rows sent on tablequeues = 7598014. Since this much volume of data
is available only on the big table and the expected resultset is also
around 2000 I felt so. Please correct me if my understanding is wrong.

2. It looks like subsection 2 is sending rows, but subsection 0 has not
received any. Is there a subsection 1? Maybe there is some flavor of TQ
below one of the table scans is missing from the plan as posted?
Yes app.snapshot shows me subsections 0 - 7 but I pasted only section
0 and Sec 2 (as section 1 - sec 7 has relatively the same information).
I have pasted the entire access plan.

Access plan for the same query after updating statistics
Rows
RETURN
( 1)
Cost
I/O
|
29680
DTQ
( 2)
164487
13241
|
4240
I/O
|
29680
DTQ
( 2)
164487
13241
|
4240
MDTQ
( 3)
164482
13241
|
4240
TBSCAN
( 4)
164479
13241
|
4240
SORT
( 5)
164479
13241
|
4240
NLJOIN
( 6)
164473
13241
/----+---\
4240 1
DTQ IXSCAN
( 7) ( 9)
958.519 38.5879
521 3
| |
2680 8.40422e+06
TBSCAN INDEX: SYSIBM
( 8) SQL0602021344512
953.116
521
|
2680
TABLE: ETL
STG_NA_GIF_VEH_D
My question;
1) In the above access plan since DTQ (7) is present on the small table
shall I assume that small table is broadcasted to the other 7
partitions where the big table is located ?
2) Is there a way to intelligently force DB2 optimizer to broadcast the
small table whenever a join happens between a big partitioned table and
a small non-partitioned table ?

Thanks
Sam.
mirof007 wrote:
Harold wrote:
mirof007 wrote:
Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?
The SORT is probably for the DISTINCT.

Thanks, you're right, I missed that. Also, the explain output is
incomplete, looks like it was cut off at the TEMP(8). My suggestion
would be to run runstats with distribution on both tables and
re-explain.

Regards,
Miro
Jul 26 '06 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by Mohammed Mazid | last post: by
3 posts views Thread by Stevey | last post: by
10 posts views Thread by glenn | last post: by
53 posts views Thread by Jeff | last post: by
56 posts views Thread by spibou | last post: by
2 posts views Thread by Allan Ebdrup | last post: by
3 posts views Thread by Zhang Weiwu | last post: by
1 post views Thread by Solo | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.