473,396 Members | 2,098 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

DPF Question

A join query is running against my db which is logically partitioned.
The join involves a partitioned table (has around 300 Million rows)
which is spread across 7 partitions and a small non-partitioned table
(has around 2000 rows). Joining the table based on the partitioning key

The query has been running for the past 30 mins without fetching any
result set and usually I understand that it would run only for less
than 5 mins.

Statistics are upto date on both tables
Query:

SELECT distinct ltrim(rtrim(A.VEH_IDENT_NBR)),
ltrim(rtrim(A.VEH_IDENT_NBR)) AS VEH_IDENT_NBR_LKP
FROM CARD.GLOBAL_VEH A
join
ETL.STG_NA_GIF_VEH_DLVRY_CAN_EVNT_Vin B
on
A.VEH_IDENT_NBR = B.VEH_IDENT_NBR for read only

When I checked the appln.snapshot the query is in executing state on
co-ordinator partiton while on the rest of partition it is on "Waiting
to send on tablequeue" status

====
Part of Application Snapshot for co-ordinator partition
====
Total sort time = 0
Sort overflows = 0
Rows read = 1
Rows written = 53183604
Rows deleted = 0
Rows updated = 0
Rows inserted = 0
Rows fetched = 0
Buffer pool data logical reads = Not Collected
Buffer pool data physical reads = Not Collected
Buffer pool temporary data logical reads = Not Collected
Buffer pool temporary data physical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool temporary index logical reads = Not Collected
Buffer pool temporary index physical reads = Not Collected
Blocking cursor = YES
Dynamic SQL statement text:
SELECT distinct ltrim(rtrim(A.VEH_IDENT_NBR)),
ltrim(rtrim(A.VEH_IDENT_NBR)) AS VEH_IDENT_NBR_LKP FROM CARD.GLOBAL_VEH
A joi
n ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin B on A.VEH_IDENT_NBR =
B.VEH_IDENT_NBR for read only

Subsection number = 0
Subsection database partition number = 0
Subsection status = Executing
Execution elapsed time (seconds) = 1825
Total user CPU time (sec.ms) = 0.000000
Total system CPU time (sec.ms) = 0.000000
Current number of tablequeue buffers overflowed = 0
Total number of tablequeue buffers overflowed = 0
Maximum number of tablequeue buffers overflowed = 0
Rows received on tablequeues = 0
Rows sent on tablequeues = 0
Rows read = 0
Rows written = 0
Number of agents working on subsection = 0
========
Part of Application Snapshot from one of the other partitions
========
Total sort time = 0
Sort overflows = 0
Rows read = 0
Rows written = 0
Rows deleted = 0
Rows updated = 0
Rows inserted = 0
Rows fetched = 0
Buffer pool data logical reads = Not Collected
Buffer pool data physical reads = Not Collected
Buffer pool temporary data logical reads = Not Collected
Buffer pool temporary data physical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool index logical reads = Not Collected
Buffer pool temporary index logical reads = Not Collected
Buffer pool temporary index physical reads = Not Collected
Blocking cursor = NO
Dynamic SQL statement text:
SELECT distinct ltrim(rtrim(A.VEH_IDENT_NBR)),
ltrim(rtrim(A.VEH_IDENT_NBR)) AS VEH_IDENT_NBR_LKP FROM CARD.GLOBAL_VEH
A joi
n ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin B on A.VEH_IDENT_NBR =
B.VEH_IDENT_NBR for read only

Subsection number = 2
Subsection database partition number = 1
Subsection status = Waiting to send
on tablequeue
Node for which waiting on tablequeue = 0
Tablequeue ID on which agent is waiting = 2
Execution elapsed time (seconds) = 1826
Total user CPU time (sec.ms) = 11.971956
Total system CPU time (sec.ms) = 0.087213
Current number of tablequeue buffers overflowed = 0
Total number of tablequeue buffers overflowed = 0
Maximum number of tablequeue buffers overflowed = 0
Rows received on tablequeues = 0
Rows sent on tablequeues = 7598014
Rows read = 0
Rows written = 0
Number of agents working on subsection = 1

Agent process/thread ID = 2244632
Explain Plan is as below:

Rows
RETURN
( 1)
Cost
I/O
|
0
DTQ
( 2)
113.109
64
|
0
TBSCAN
( 3)
113.069
64
|
0
SORT
( 4)
113.068
64
|
0
NLJOIN
( 5)
113.067
64
/----+----\
0 1
TBSCAN TBSCAN
( 6) ( 7)
113.067 7.73186e+06
64 1.01032e+06
| |
0 5.88295e+07
TABLE: ETL TEMP
STG_NA_GIF_VEH_D ( 8)
6.13434e+06
I dont this this is waiting for any locks because
[1] IS lock on both tables
[2] Recursive app.snap shows incremental values for "Rows sent on
tablequeues" section.

Questions:
[1] How to troubleshoot further to identify the problem?
[2] Why does the optimizer broadcasts the data from big table instead
of broadcasting small table?

Any help is greatly appreciated.

Thanks,
Sam.

Jul 26 '06 #1
6 4072

Sam Durai wrote:
Questions:
[1] How to troubleshoot further to identify the problem?
[2] Why does the optimizer broadcasts the data from big table instead
of broadcasting small table?

Any help is greatly appreciated.

Thanks,
Sam.
What makes you think the large table is being broadcast? The explain
contains only one table queue for returning data to the coordinator
(expected/required in DPF). Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?

Regards,
Miro

Jul 26 '06 #2
Ian
Sam Durai wrote:
>
I dont this this is waiting for any locks because
[1] IS lock on both tables
[2] Recursive app.snap shows incremental values for "Rows sent on
tablequeues" section.

Questions:
[1] How to troubleshoot further to identify the problem?
[2] Why does the optimizer broadcasts the data from big table instead
of broadcasting small table?
Are you sure your stats are up to date / correct? You plan shows that
ETL.STG_NA_GIF_VEH_D has 0 rows.

Whenever a table has 0 rows, the optimizer will choose it as the outer
table in the join.

If the table doesn't have 0 rows, this this is often a *really* bad
choice.


Jul 26 '06 #3
mirof007 wrote:
Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?
The SORT is probably for the DISTINCT.

Jul 26 '06 #4
Sam Durai wrote:
====
Part of Application Snapshot for co-ordinator partition
====
Subsection number = 0
Subsection database partition number = 0
Rows received on tablequeues = 0
Rows sent on tablequeues = 0
========
Part of Application Snapshot from one of the other partitions
========
Subsection number = 2
Subsection database partition number = 1
Rows received on tablequeues = 0
Rows sent on tablequeues = 7598014
It looks like subsection 2 is sending rows, but subsection 0 has not
received any. Is there a subsection 1? Maybe there is some flavor of TQ
below one of the table scans is missing from the plan as posted?
Intuitively, I'd expect a TQ there somewhere because the two tables are
partitioned differently. You'd need to repartition at least one of them
in order to be able to join them.

My guess: subsection 2 is sending those 7598014 rows to subsection 1
for the join, where the results are being TEMPed to disk for the SORT.
At the point in time where you gathered this status, the SORT had not
completed, so no rows had been returned to subsection 0 (and thus from
your query).

Maybe the big table is being repartitioned to the partition with the
small table? The number of rows sent from subsection 2 seems to support
that - you said the small table does not have that many rows. More
evidence: the big TEMP (number 8, in the lower right of the plan)
should have operators below it showing where the rows in the TEMP came
from.

It looks like ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin is the small table
then, and as someone else pointed out, the cardinality looks fishy. If
DB2 expects 0 rows in the ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin table
for the Nested Loop Join, then it expects to not need to look at the
big table much at all.

What does "select card from sysstat.tables where tabschema = 'ETL' and
tabname = 'ETL.STG_NA_GIF_VEH_DELIVERY_EVENT_Vin'" give you? 2000? 0?

Jul 26 '06 #5

Harold wrote:
mirof007 wrote:
Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?

The SORT is probably for the DISTINCT.
Thanks, you're right, I missed that. Also, the explain output is
incomplete, looks like it was cut off at the TEMP(8). My suggestion
would be to run runstats with distribution on both tables and
re-explain.

Regards,
Miro

Jul 26 '06 #6
Thanks for the valuable information. As per the suggestion I reran the
runstats on the the small table then the query ran quickly as expected
perhaps the problem was that runstats failed with an obvious error
message when it was called earlier and it went unnoticed. Apologies as
I didnt notice this properly and posted a message.

Also both the tables which I referred earlier was partitioned using the
same partitioning key.

Answers for some of the questions posted
1. What makes you think the large table is being broadcast?
As I see Subsection status as "Waiting to send on tablequeue "
and Rows sent on tablequeues = 7598014. Since this much volume of data
is available only on the big table and the expected resultset is also
around 2000 I felt so. Please correct me if my understanding is wrong.

2. It looks like subsection 2 is sending rows, but subsection 0 has not
received any. Is there a subsection 1? Maybe there is some flavor of TQ
below one of the table scans is missing from the plan as posted?
Yes app.snapshot shows me subsections 0 - 7 but I pasted only section
0 and Sec 2 (as section 1 - sec 7 has relatively the same information).
I have pasted the entire access plan.

Access plan for the same query after updating statistics
Rows
RETURN
( 1)
Cost
I/O
|
29680
DTQ
( 2)
164487
13241
|
4240
I/O
|
29680
DTQ
( 2)
164487
13241
|
4240
MDTQ
( 3)
164482
13241
|
4240
TBSCAN
( 4)
164479
13241
|
4240
SORT
( 5)
164479
13241
|
4240
NLJOIN
( 6)
164473
13241
/----+---\
4240 1
DTQ IXSCAN
( 7) ( 9)
958.519 38.5879
521 3
| |
2680 8.40422e+06
TBSCAN INDEX: SYSIBM
( 8) SQL0602021344512
953.116
521
|
2680
TABLE: ETL
STG_NA_GIF_VEH_D
My question;
1) In the above access plan since DTQ (7) is present on the small table
shall I assume that small table is broadcasted to the other 7
partitions where the big table is located ?
2) Is there a way to intelligently force DB2 optimizer to broadcast the
small table whenever a join happens between a big partitioned table and
a small non-partitioned table ?

Thanks
Sam.
mirof007 wrote:
Harold wrote:
mirof007 wrote:
Also, you seem to have a sort in your query
plan, are you sure you're not specifying an ORDER BY?
The SORT is probably for the DISTINCT.

Thanks, you're right, I missed that. Also, the explain output is
incomplete, looks like it was cut off at the TEMP(8). My suggestion
would be to run runstats with distribution on both tables and
re-explain.

Regards,
Miro
Jul 26 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Mohammed Mazid | last post by:
Can anyone please help me on how to move to the next and previous question? Here is a snippet of my code: Private Sub cmdNext_Click() End Sub Private Sub cmdPrevious_Click() showrecord
3
by: Stevey | last post by:
I have the following XML file... <?xml version="1.0"?> <animals> <animal> <name>Tiger</name> <questions> <question index="0">true</question> <question index="1">true</question> </questions>
7
by: nospam | last post by:
Ok, 3rd or is it the 4th time I have asked this question on Partial Types, so, since it seems to me that Partial Types is still in the design or development stages at Microsoft, I am going to ask...
3
by: Ekqvist Marko | last post by:
Hi, I have one Access database table including questions and answers. Now I need to give answer id automatically to questionID column. But I don't know how it is best (fastest) to do? table...
10
by: glenn | last post by:
I am use to programming in php and the way session and post vars are past from fields on one page through to the post page automatically where I can get to their values easily to write to a...
10
by: Rider | last post by:
Hi, simple(?) question about asp.net configuration.. I've installed ASP.NET 2.0 QuickStart Sample successfully. But, When I'm first start application the follow message shown. ========= Server...
53
by: Jeff | last post by:
In the function below, can size ever be 0 (zero)? char *clc_strdup(const char * CLC_RESTRICT s) { size_t size; char *p; clc_assert_not_null(clc_strdup, s); size = strlen(s) + 1;
56
by: spibou | last post by:
In the statement "a *= expression" is expression assumed to be parenthesized ? For example if I write "a *= b+c" is this the same as "a = a * (b+c)" or "a = a * b+c" ?
2
by: Allan Ebdrup | last post by:
Hi, I'm trying to render a Matrix question in my ASP.Net 2.0 page, A matrix question is a question where you have several options that can all be rated according to several possible ratings (from...
3
by: Zhang Weiwu | last post by:
Hello! I wrote this: ..required-question p:after { content: "*"; } Corresponding HTML: <div class="required-question"><p>Question Text</p><input /></div> <div...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.