By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,017 Members | 951 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,017 IT Pros & Developers. It's quick & easy.

Help performace question DB2 UDB v 8.1.9 Linux

P: n/a
The following query runs very slowly compared to other queries on my
system. If I read the explain plan and its data correctly, it is due to
a table scan on animals caused by the join separated from the others by
space. There is an index on both dam_bhid and recip_bhid. I have tried
converting the on condition to a case statement to no good effect.
Likewise reordering the conditions. Do you have any suggestions.

The semantics of the join is that an1 has a calf on the ground if there
is a calf who she is the dam of when there is no recipient for that calf
or there is a calf of which she was the recipient. A calf is less than
283 days old and unweaned.

select
an1.bhid,
an1.breed_1,
an1.pct_1,
ap_donor.herd_id as donor_id,
ap_sire.herd_id as sire_id,
ap_dam.herd_id as dam_id,
loc.namex as loc_name,
lh.start_date as as_of_date,
case
when wn.WEIGH_DATE is null
then ap_calf.herd_id
else ''
end AS calf_id,
case
when wn.WEIGH_DATE is null
then char(an_calf.BIRTH_DATE, usa)
else ''
end AS calf_birth_date,
case
when wn.weigh_date is null
then an_calf.sex
else ''
end AS calf_sex,
ap_calf_sire.herd_id as calf_sire_id,
col_def.COLOR_DESC as calf_color,
char((select max(act_recov_date)
from $schema.flushes
WHERE donor_bhid=an1.bhid), usa)
AS last_flush_date,
char(proj_mate.BREED_TO_DATE, usa)as proj_breed_date,
proj_mate_detail.herd_id as proj_breed_mate

from $schema.animals an1
left outer
join $schema.ANIMALS_PRIV ap_donor
on ap_donor.bhid = an1.BHID
and ap_donor.herd_owner_id = $empid
left outer
join $schema.animals_priv ap_sire
on ap_sire.bhid = an1.SIRE_BHID
and ap_sire.herd_owner_id = $empid
left outer
join $schema.ANIMALS_PRIV ap_dam
on ap_dam.bhid = an1.DAM_BHID
and ap_dam.herd_owner_id = $empid
left outer
join $schema.locations_hist lh
on an1.bhid = lh.bhid
and lh.end_date is null
left outer
join $schema.locations loc
on loc.loc_id = lh.loc_id

-- The problem is here

left outer
join $schema.animals an_calf
on days(current_date) -
days(an_calf.birth_date) < 280
and an_calf.ACTIVEX = 'Y'
and ((an_calf.dam_bhid = an1.bhid AND
an_calf.recip_bhid = 0)
OR an_calf.RECIP_BHID = an1.bhid)

-- end problem

left outer
join $schema.animals_priv ap_calf
on ap_calf.HERD_OWNER_ID = $empid
and ap_calf.bhid = an_calf.bhid
left outer
join $schema.weaning wn
on wn.bhid = an_calf.bhid
left outer
join $schema.ANIMALS_PRIV ap_calf_sire
on an_calf.SIRE_BHID=ap_calf_sire.bhid
and ap_calf_sire.herd_owner_id = $empid
left outer
join $schema.COLOR_DEFN col_def
on col_def.BH_COLOR_CODE=an_calf.COLOR
left outer
join $schema.ANIMALS_PRIV_$empid proj_mate
on proj_mate.bhid = an1.bhid
left outer
join $schema.ANIMALS_PRIV_$empid proj_mate_detail
on proj_mate_detail.BHID=
proj_mate.BREED_TO_BHID
where
an1.BHID in
(select bhid
from $schema.animal_sets
where set_name = '$setname'
and userid='$setuser')
Apr 11 '06 #1
Share this Question
Share on Google+
5 Replies


P: n/a
Bob Stearns wrote:
left outer
join $schema.animals an_calf
on days(current_date) -
days(an_calf.birth_date) < 280
and an_calf.ACTIVEX = 'Y'
and ((an_calf.dam_bhid = an1.bhid AND
an_calf.recip_bhid = 0)
OR an_calf.RECIP_BHID = an1.bhid)

There are two potential problems. The first is the OR which pretty much
makes teh last AND predicate useless for index exploitation.
The second is the expression on days which disallows index exploitation
on birth_date and, worse, forces an NLJOIN.
What about a small rewrite:
an_calf.birthdate > current date - 280 days
Now DB2 has a clear startkey for an index scan over birth date.

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Apr 12 '06 #2

P: n/a
Serge Rielau wrote:
Bob Stearns wrote:
left outer
join $schema.animals an_calf
on days(current_date) -
days(an_calf.birth_date) < 280
and an_calf.ACTIVEX = 'Y'
and ((an_calf.dam_bhid = an1.bhid AND
an_calf.recip_bhid = 0)
OR an_calf.RECIP_BHID = an1.bhid)


There are two potential problems. The first is the OR which pretty much
makes teh last AND predicate useless for index exploitation.
The second is the expression on days which disallows index exploitation
on birth_date and, worse, forces an NLJOIN.
What about a small rewrite:
an_calf.birthdate > current date - 280 days
Now DB2 has a clear startkey for an index scan over birth date.

Cheers
Serge

Thank you. Creating the index and changing the query as you indicated
changed the time from over a minute to less than 2 seconds.

How could I convince the optimizer to do what I would do in "procedural
code", which uses the dam_bhid and recip_bhid indices to minimize the
time required (in no actual language that I know of):
found = -1
while (SELECT * FROM animals WHERE dam_bhid=an1.bhid) {
if(recip_bhid=0} continue
if(activex<>'Y') continue
if(birth_date < current_date-280 days) continue
found = bhid
break
}
if(found=-1) {
while (SELECT * FROM animals WHERE recip_bhid=an1.bhid) {
if(activex<>'Y') continue
if(birth_date < current_date-280 days) continue
found = bhid
break
}
}
if(found=-1) {nomatch)
else (matched the current record, with bhid=found}
Apr 12 '06 #3

P: n/a
Bob Stearns wrote:
Serge Rielau wrote:
Bob Stearns wrote:
left outer
join $schema.animals an_calf
on days(current_date) -
days(an_calf.birth_date) < 280
and an_calf.ACTIVEX = 'Y'
and ((an_calf.dam_bhid = an1.bhid AND
an_calf.recip_bhid = 0)
OR an_calf.RECIP_BHID = an1.bhid)
There are two potential problems. The first is the OR which pretty
much makes teh last AND predicate useless for index exploitation.
The second is the expression on days which disallows index
exploitation on birth_date and, worse, forces an NLJOIN.
What about a small rewrite:
an_calf.birthdate > current date - 280 days
Now DB2 has a clear startkey for an index scan over birth date.

Cheers
Serge

Thank you. Creating the index and changing the query as you indicated
changed the time from over a minute to less than 2 seconds.

How could I convince the optimizer to do what I would do in "procedural
code", which uses the dam_bhid and recip_bhid indices to minimize the
time required (in no actual language that I know of):


CREATE FUNCTION purplecow(...) RETURNS VARCHAR(10)
READS SQL DATA
BEGIN ATOMIC found = -1 DECLARE found INTEGER; while (SELECT * FROM animals WHERE dam_bhid=an1.bhid) {
if(recip_bhid=0} continue
if(activex<>'Y') continue
if(birth_date < current_date-280 days) continue
found = bhid
break
} SET found = (SELECT bhid FROM animals
WHERE dam_bhid=an1.bhid
AND recip_bhid <> 0
AND activex='Y'
AND birth_date >= current_date-280 days
FETCH FIRST ROW ONLY) if(found=-1) {
while (SELECT * FROM animals WHERE recip_bhid=an1.bhid) {
if(activex<>'Y') continue
if(birth_date < current_date-280 days) continue
found = bhid
break
}
} SET found = (SELECT bhid FROM animals
WHERE recip_bhid=an1.bhid
AND activex = 'Y'
AND birth_date >= current_date - 280 days
AND found IS NULL
FETCH FIRST ROW ONLY); if(found=-1) {nomatch)
else (matched the current record, with bhid=found}

RETURN CASE WHEN found IS NULL THEN 'No match'
ELSE CHAR(found) END;
END
@

VALUES purplecow();

If you want to take the optimizer out of the equation SQL Functions
rock. Need to be careful with DPF, but in serial they contribute their
fair share to our TPC-C results.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Apr 12 '06 #4

P: n/a
Serge Rielau wrote:

CREATE FUNCTION purplecow(...) RETURNS VARCHAR(10)
READS SQL DATA
BEGIN ATOMIC

DECLARE found INTEGER;

SET found = (SELECT bhid FROM animals
WHERE dam_bhid=an1.bhid
AND recip_bhid <> 0
AND activex='Y'
AND birth_date >= current_date-280 days
FETCH FIRST ROW ONLY)

SET found = (SELECT bhid FROM animals
WHERE recip_bhid=an1.bhid
AND activex = 'Y'
AND birth_date >= current_date - 280 days
AND found IS NULL
FETCH FIRST ROW ONLY);

RETURN CASE WHEN found IS NULL THEN 'No match'
ELSE CHAR(found) END;
END
@

VALUES purplecow();

If you want to take the optimizer out of the equation SQL Functions
rock. Need to be careful with DPF, but in serial they contribute their
fair share to our TPC-C results.

Cheers
Serge

Could I easily return the entire row found rather than its key? It would
dave 1 join. How do I use it in the middle of a long JOIN sequence?

from an1
..
..
..
left outer join values purplecow(an1.bhid)
as tx(bhid)
on tx.bhid is not null
left outer join animals an_calf
on an_calf.bhid=tx.bhid
..
..
..

is my best guess, but I'm sure that's not quite right
Apr 13 '06 #5

P: n/a
Bob Stearns wrote:
Serge Rielau wrote:

CREATE FUNCTION purplecow(...) RETURNS VARCHAR(10)
READS SQL DATA
BEGIN ATOMIC

DECLARE found INTEGER;

SET found = (SELECT bhid FROM animals
WHERE dam_bhid=an1.bhid
AND recip_bhid <> 0
AND activex='Y'
AND birth_date >= current_date-280 days
FETCH FIRST ROW ONLY)

SET found = (SELECT bhid FROM animals
WHERE recip_bhid=an1.bhid
AND activex = 'Y'
AND birth_date >= current_date - 280 days
AND found IS NULL
FETCH FIRST ROW ONLY);

RETURN CASE WHEN found IS NULL THEN 'No match'
ELSE CHAR(found) END;
END
@

VALUES purplecow();

If you want to take the optimizer out of the equation SQL Functions
rock. Need to be careful with DPF, but in serial they contribute their
fair share to our TPC-C results.

Cheers
Serge

Could I easily return the entire row found rather than its key? It would
dave 1 join. How do I use it in the middle of a long JOIN sequence?

from an1
.
.
.
left outer join values purplecow(an1.bhid)
as tx(bhid)
on tx.bhid is not null
left outer join animals an_calf
on an_calf.bhid=tx.bhid
.
.
.

is my best guess, but I'm sure that's not quite right

If you want to return the entire row you sue a TABLE function.
Simply change the function to:
RETURNS TABLE (pk INT, C1 INT, ...)

and the return to:
RETURN VALUES (......);
...
of course you will also have to extend the SET statements
SET (...) = (SELECT ....)

take a look here for examples:
http://www-128.ibm.com/developerwork...dm-0411rielau/

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Apr 13 '06 #6

This discussion thread is closed

Replies have been disabled for this discussion.