By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
432,474 Members | 1,001 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,474 IT Pros & Developers. It's quick & easy.

Query performance - Please help

P: n/a
I execute a query (against DB2 for iSeries), which, in its generic form
is as follows. This query runs just fine, executing in a couple of
seconds

SELECT V.FIELD01, V.FIELD02, V.FIELD03, V.FIELD04, V.FIELD05,
V.FIELD06, V.FIELD07, V.FIELD08, V.FIELD09, V.FIELD10,
V.FIELD11, V.FIELD12, V.FIELD13
FROM SCHEMA1.VIEW1 V
WHERE V.FIELD2 BETWEEN '03/10/2005' AND '03/10/2006'
AND (V.FIELD4 = '103' OR V.FIELD4 = '100' )
AND (V.FIELD8 = 120 OR V.FIELD15= 120)
AND V.FIELD16 <> '4'
ORDER BY V.FIELD05, V.FIELD12

Now, I modify the query, because some business reason dictates that I
do so. I perform a join of the original view with some table on a
particular (single) column and also add an OR clause in the WHERE
clause. The modifiied query looks as follows and (here is my problem!)
takes circa 40 seconds to run!

SELECT V.FIELD01, V.FIELD02, V.FIELD03, V.FIELD04, V.FIELD05,
V.FIELD06, V.FIELD07, V.FIELD08, V.FIELD09, V.FIELD10,
V.FIELD11, V.FIELD12, V.FIELD13
FROM SCHEMA1.VIEW1 V, SCHEMA2.TABLE1 T //<-- (1) Table added
WHERE T.FIELD1 = 1 and V.FIELD14 = T.FIELD2 //<-- (2) Join condition
AND V.FIELD2 BETWEEN '03/10/2005' AND '03/10/2006'
AND (V.FIELD4 = '103' OR V.FIELD4 = '100' )
AND (V.FIELD8 = 120 OR V.FIELD15= 120 OR T.FIELD3 = 120)
// (3) Added an OR statement here --|
AND V.FIELD16 <> '4'
ORDER BY V.FIELD05, V.FIELD12

I wonder why this, at first sight innocuous join, causes such a
deterioration in performance.
I've run both queries through the Visual Explain tool that comes with
the iSeries navigator. The bottleneck in the second case is the hash
join performed between the results of scanning the component tables of
view V and those of table T. The "Estimated Processing Time" for that
hash join is in the range of 40.

The graph in Visual Explain looks can be descibed as follows:
(1) "Index Scan - Key Positioning" is applied to two of the component
tables (say T1 and T2) of view V. A "Nested Loop Join" is applied to
the results
(2) "Table Scan" is applied to the remaining component table (say T3)
of V. A "Temporary Hash Table" is produced
[Note that this processing of V is the same as the one performed in the
original, simpler, query]
(3) "Table Scan, Parallel" is applied to table T, that has now come
into the picture. A "Temporary Hash Table" gets produced.

Then a grand hash join is produced from the results of (1), (2), (3)
that takes 41 seconds! How could I avoid this?

Many, many thanks in advance!!!

Panagiotis Varlagas
va******@yahoo.com

Mar 16 '06 #1
Share this Question
Share on Google+
10 Replies


P: n/a
I am by no means an expert. I will offer two comments, however.

1) (V.FIELD4 = '103' OR V.FIELD4 = '100' ) should probably be
V.FIELD4 IN ('103', '100' )

2) I did not see the second TABLE being output. Perhaps an EXISTS
clause would be more appropriate than a JOIN.

B.

Mar 16 '06 #2

P: n/a
Hi Brian!

Wrt comment 1), surely it is a correct one, however its contribution to
performance improvement of the particular query is minimal, if any.

I applied your suggestion in comment 2) too. It did not provide any
performance improvent (the query still takes 40'' to run), however
there are two good points about it:

(a) It is a more concise way of me expressing what I want to say. It is
much clearer to the person that reads the query and tries to understand
what it is supposed to do

(b) It _does_ change the path shown in Visual Explain. No hash joins
involved now... But this change is not reflected into improved
performance

Thanks for the help!
Panagiotis

Mar 17 '06 #3

P: n/a
>1), surely it is a correct one, however its contribution to performance improvement of the particular query is minimal, if any.

It's is just easier to read that way. When i first looked at

(V.FIELD4 = '103' OR V.FIELD4 = '100' )
AND (V.FIELD8 = 120 OR V.FIELD15= 120)

I figured the second clause was also FIELD8 twice. I only noticed it
was two different fields on second look. Had the FIELD4 been using an
IN, i probably would not have made that mistake. Nothing major here,
just the point that IN makes things more clear.
But this change is not reflected into improved performance


Hmm... perhaps try using IN instead? I would think EXISTS should always
be better, but in many cases IN() works much much better.

B.

Mar 17 '06 #4

P: n/a
Is T.FIELD2 indexed? If it isn't, then this could be the cause of the
slowdown, in which case adding an index should speed it up. Or, using
the IN clause instead of EXISTS should also speed it up.

Mar 17 '06 #5

P: n/a
I had already rewritten the query to use either EXISTS or IN in lieu of
the inner join, but it actually yielded worse, not better performance.
Thx for the suggestion, however.

Mar 21 '06 #6

P: n/a
Unfortunately, T.FIELD2 is already indexed...

Mar 21 '06 #7

P: n/a
va******@yahoo.com wrote:
Unfortunately, T.FIELD2 is already indexed...


Have you tried the Design Advisor by providing your query as workload? It
should give you some suggestions regarding additional indexes. Likewise, a
MQT might help.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Mar 21 '06 #8

P: n/a
How did you use EXISTS?

How about try following ways with appropriate index.
1)
SELECT V.FIELD01, V.FIELD02, V.FIELD03, V.FIELD04, V.FIELD05,
V.FIELD06, V.FIELD07, V.FIELD08, V.FIELD09, V.FIELD10,
V.FIELD11, V.FIELD12, V.FIELD13
FROM SCHEMA1.VIEW1 V
WHERE
V.FIELD2 BETWEEN '03/10/2005' AND '03/10/2006'
AND V.FIELD4 IN ('103', '100')
AND V.FIELD16 <> '4'
AND EXISTS
(SELECT *
FROM SCHEMA2.TABLE1 T
WHERE V.FIELD14 = T.FIELD2
AND T.FIELD1 = 1
AND (V.FIELD8 = 120 OR V.FIELD15= 120 OR T.FIELD3 = 120)
)
ORDER BY
V.FIELD05, V.FIELD12

2)
SELECT V.FIELD01, V.FIELD02, V.FIELD03, V.FIELD04, V.FIELD05,
V.FIELD06, V.FIELD07, V.FIELD08, V.FIELD09, V.FIELD10,
V.FIELD11, V.FIELD12, V.FIELD13
FROM SCHEMA1.VIEW1 V
WHERE
V.FIELD2 BETWEEN '03/10/2005' AND '03/10/2006'
AND (V.FIELD8 = 120 OR V.FIELD15= 120)
AND V.FIELD4 IN ('103', '100')
AND V.FIELD16 <> '4'
AND EXISTS
(SELECT *
FROM SCHEMA2.TABLE1 T
WHERE V.FIELD14 = T.FIELD2
AND T.FIELD1 = 1)
)
OR
V.FIELD2 BETWEEN '03/10/2005' AND '03/10/2006'
AND V.FIELD4 IN ('103', '100')
AND V.FIELD16 <> '4'
AND EXISTS
(SELECT *
FROM SCHEMA2.TABLE1 T
WHERE V.FIELD14 = T.FIELD2
AND T.FIELD1 = 1
AND T.FIELD3 = 120)
)
ORDER BY
V.FIELD05, V.FIELD12

Mar 21 '06 #9

P: n/a
No. Actually, I am not familiar with these tools...

Mar 23 '06 #10

P: n/a
I made some progress. Basically, the following indexes needed to be
create:

- On TABLE1.FIELD3
- On the attribute of TABLE03 (one of the consituent tables of VIEW1)
on which the join with TABLE02 is performed in order to create the
view. The lack of this index contributed to much of the hit to
performance.

(Thanks for your help in pointing to me the right direction!)

But now, there is one more thing. Instead of joining VIEW1 and TABLE1
on the actually attributes (of type CHARACTER), I need to apply the
DECIMAL(.) function to them. The reason for this is that if column on
the TABLE1 side contains the value (say) of '0000001234', the
corresponding column on the VIEW1 side of the join will contain the
value of '1234' (!). So, I need to join on DECIMAL(V.FIELD14) =
DECIMAL(T.FIELD2) rather than V.FIELD14 = T.FIELD2. Any ideas on this
one?

Mar 23 '06 #11

This discussion thread is closed

Replies have been disabled for this discussion.