By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,756 Members | 1,721 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,756 IT Pros & Developers. It's quick & easy.

abysmal performance with scalar query

P: n/a
When I run the following query with the two sections commented out, the
response time is between 1 an 2 seconds; with the first indicated
section enabled, the response goes up to 15 seconds even though
t1.bh_disposal_code IS NOT NULL in only one row; with the second section
enabled, it goes up to 592 seconds, even though t1.mating is NULL in all
the rows chosen. Why should the (supposedly never executed) scalar
subqueries cost so much? It is as if the CASE statements were being
evaluated on every row in t1 rather than just the selected ones. Both of
these scalar subqueries are here because the are infrequent and the
corresponding join is difficult or impossible.

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100
THEN '11111111' end AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,

// section 1
case
when t1.bh_disposal_code is null then ''
when t1.bh_disposal_code=0 then ''
else (select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code)
END AS dam_stat,
// end section 1

rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,

// section 2
case when t1.mating='A' then
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid
AND t2a.datex =
(SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) < (days(t1.birth_date)-283+45)
AND days(datex) > (days(t1.birth_date)-283-45) )
)
end AS breed_date,
// end section 2

t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets
WHERE set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid
Nov 12 '05 #1
Share this Question
Share on Google+
5 Replies


P: n/a
Bob Stearns wrote:
When I run the following query with the two sections commented out, the
response time is between 1 an 2 seconds; with the first indicated
section enabled, the response goes up to 15 seconds even though
t1.bh_disposal_code IS NOT NULL in only one row; with the second section
enabled, it goes up to 592 seconds, even though t1.mating is NULL in all
the rows chosen. Why should the (supposedly never executed) scalar
subqueries cost so much? It is as if the CASE statements were being
evaluated on every row in t1 rather than just the selected ones. Both of
these scalar subqueries are here because the are infrequent and the
corresponding join is difficult or impossible.

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100
THEN '11111111' end AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,

// section 1
case
when t1.bh_disposal_code is null then ''
when t1.bh_disposal_code=0 then ''
else (select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code)
END AS dam_stat,
// end section 1

rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,

// section 2
case when t1.mating='A' then
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid
AND t2a.datex =
(SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) < (days(t1.birth_date)-283+45)
AND days(datex) > (days(t1.birth_date)-283-45) )
)
end AS breed_date,
// end section 2

t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets
WHERE set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid

Your observation is correct! DB2 loves to turn subqueries to joins.
It is pretty oblivious to CASE expression.
The easiest fix is to push/replicate the WHEN predicate into the WHERE
clause of the scalar subquery.
DB2 is smart enough to realize that the predicate is outside of the
context of the subquery and compute it before going all out for the
subquery.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #2

P: n/a
Serge Rielau wrote:
Bob Stearns wrote:
When I run the following query with the two sections commented out,
the response time is between 1 an 2 seconds; with the first indicated
section enabled, the response goes up to 15 seconds even though
t1.bh_disposal_code IS NOT NULL in only one row; with the second
section enabled, it goes up to 592 seconds, even though t1.mating is
NULL in all the rows chosen. Why should the (supposedly never
executed) scalar subqueries cost so much? It is as if the CASE
statements were being evaluated on every row in t1 rather than just
the selected ones. Both of these scalar subqueries are here because
the are infrequent and the corresponding join is difficult or impossible.

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100
THEN '11111111' end AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,

// section 1
case
when t1.bh_disposal_code is null then ''
when t1.bh_disposal_code=0 then ''
else (select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code)
END AS dam_stat,
// end section 1

rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,

// section 2
case when t1.mating='A' then
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid
AND t2a.datex =
(SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) < (days(t1.birth_date)-283+45)
AND days(datex) > (days(t1.birth_date)-283-45) )
)
end AS breed_date,
// end section 2

t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets
WHERE set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid


Your observation is correct! DB2 loves to turn subqueries to joins.
It is pretty oblivious to CASE expression.
The easiest fix is to push/replicate the WHEN predicate into the WHERE
clause of the scalar subquery.
DB2 is smart enough to realize that the predicate is outside of the
context of the subquery and compute it before going all out for the
subquery.

Cheers
Serge


Thanks for the quick reply (when do you sleep:-).

I changed to the syntax below and the time is now 39 seconds. That is a
great improvement over 592 seconds, but is much slower than the 1-2
seconds to do everything but these two fields. I get the impression that
the subselects are still being performed for each row of t1 rather than
the rows selected by the JOINS.

Should I be looking at temporary tables or dedicated functions?

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100 THEN '11111111' end
AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,
(select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code
and ( (t1.bh_disposal_code is not null)
AND
(t1.bh_disposal_code <> 0)
)
) AS dam_stat,
rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid
AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) <
(days(t1.birth_date)-283+45)
AND days(datex) >
(days(t1.birth_date)-283-45) )
AND t1.mating='A'
) AS breed_date,
t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets WHERE
set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid
go
Nov 12 '05 #3

P: n/a
Bob Stearns wrote:
Serge Rielau wrote:
Bob Stearns wrote:
When I run the following query with the two sections commented out,
the response time is between 1 an 2 seconds; with the first indicated
section enabled, the response goes up to 15 seconds even though
t1.bh_disposal_code IS NOT NULL in only one row; with the second
section enabled, it goes up to 592 seconds, even though t1.mating is
NULL in all the rows chosen. Why should the (supposedly never
executed) scalar subqueries cost so much? It is as if the CASE
statements were being evaluated on every row in t1 rather than just
the selected ones. Both of these scalar subqueries are here because
the are infrequent and the corresponding join is difficult or
impossible.

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100
THEN '11111111' end AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,

// section 1
case
when t1.bh_disposal_code is null then ''
when t1.bh_disposal_code=0 then ''
else (select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code)
END AS dam_stat,
// end section 1

rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,

// section 2
case when t1.mating='A' then
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid
AND t2a.datex =
(SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) < (days(t1.birth_date)-283+45)
AND days(datex) > (days(t1.birth_date)-283-45) )
)
end AS breed_date,
// end section 2

t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets
WHERE set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid

Your observation is correct! DB2 loves to turn subqueries to joins.
It is pretty oblivious to CASE expression.
The easiest fix is to push/replicate the WHEN predicate into the WHERE
clause of the scalar subquery.
DB2 is smart enough to realize that the predicate is outside of the
context of the subquery and compute it before going all out for the
subquery.

Cheers
Serge

Thanks for the quick reply (when do you sleep:-).

I changed to the syntax below and the time is now 39 seconds. That is a
great improvement over 592 seconds, but is much slower than the 1-2
seconds to do everything but these two fields. I get the impression that
the subselects are still being performed for each row of t1 rather than
the rows selected by the JOINS.

Should I be looking at temporary tables or dedicated functions?

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100 THEN '11111111' end AS
dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,
(select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code

-- small tweak here and ( (t1.bh_disposal_code is not null)
AND
(t1.bh_disposal_code <> 0)
) AND (t1.bh_disposal_code <> 0) is good enough because: (NULL <> 0) is
UNKNOWN != TRUE ) AS dam_stat,
rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid -- AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) <
(days(t1.birth_date)-283+45)
AND days(datex) >
(days(t1.birth_date)-283-45) ) AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE datex < t1.birth_date - 283 days + 45 days
AND datex > t1.birt_date - 283 days + 45 days
AND t1.mating = 'A' ) AND t1.mating='A'
) AS breed_date, -- Or what about only:
(SELECT MAX(datex) FROM is3.service_ai
WHERE datex < t1.birth_date - 283 days + 45 days
AND datex > t1.birt_date - 283 days + 45 days
AND t1.mating = 'A' ) AS breed_date
-- If no row is found the results will still be NULL...
t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets WHERE
set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid
go


Cheers
Serge

--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #4

P: n/a
Serge Rielau wrote:
Bob Stearns wrote:
Serge Rielau wrote:
Bob Stearns wrote:

When I run the following query with the two sections commented out,
the response time is between 1 an 2 seconds; with the first
indicated section enabled, the response goes up to 15 seconds even
though t1.bh_disposal_code IS NOT NULL in only one row; with the
second section enabled, it goes up to 592 seconds, even though
t1.mating is NULL in all the rows chosen. Why should the (supposedly
never executed) scalar subqueries cost so much? It is as if the CASE
statements were being evaluated on every row in t1 rather than just
the selected ones. Both of these scalar subqueries are here because
the are infrequent and the corresponding join is difficult or
impossible.

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100
THEN '11111111' end AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,

// section 1
case
when t1.bh_disposal_code is null then ''
when t1.bh_disposal_code=0 then ''
else (select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code)
END AS dam_stat,
// end section 1

rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,

// section 2
case when t1.mating='A' then
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid
AND t2a.datex =
(SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) < (days(t1.birth_date)-283+45)
AND days(datex) > (days(t1.birth_date)-283-45) )
)
end AS breed_date,
// end section 2

t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets
WHERE set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid

Your observation is correct! DB2 loves to turn subqueries to joins.
It is pretty oblivious to CASE expression.
The easiest fix is to push/replicate the WHEN predicate into the
WHERE clause of the scalar subquery.
DB2 is smart enough to realize that the predicate is outside of the
context of the subquery and compute it before going all out for the
subquery.

Cheers
Serge


Thanks for the quick reply (when do you sleep:-).

I changed to the syntax below and the time is now 39 seconds. That is
a great improvement over 592 seconds, but is much slower than the 1-2
seconds to do everything but these two fields. I get the impression
that the subselects are still being performed for each row of t1
rather than the rows selected by the JOINS.

Should I be looking at temporary tables or dedicated functions?

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100 THEN '11111111' end
AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,
(select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code


-- small tweak here
and ( (t1.bh_disposal_code is not null)
AND
(t1.bh_disposal_code <> 0)
)


AND (t1.bh_disposal_code <> 0) is good enough because: (NULL <> 0) is
UNKNOWN != TRUE
) AS dam_stat,
rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid


--
AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) <
(days(t1.birth_date)-283+45)
AND days(datex) >
(days(t1.birth_date)-283-45) )


AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE datex < t1.birth_date - 283 days + 45 days
AND datex > t1.birt_date - 283 days + 45 days
AND t1.mating = 'A' )
AND t1.mating='A'
) AS breed_date,


-- Or what about only:
(SELECT MAX(datex) FROM is3.service_ai
WHERE datex < t1.birth_date - 283 days + 45 days
AND datex > t1.birt_date - 283 days + 45 days
AND t1.mating = 'A' ) AS breed_date
-- If no row is found the results will still be NULL...
t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets WHERE
set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid
go

Cheers
Serge

Thanks again. Neither change resulted in a significant change. We were
looking in the wrong place. I, at least, expected the optimizer to be
better than it is. I reached this epiphany while dreaming about this
problem (talk about nightmares). By forcing the most restrictive WHERE
clause to be done first, the time was reduced to 1-2 seconds. I changed
the FROM phrase to:

FROM (SELECT *
FROM is3.animals
WHERE bhid IN (SELECT bhid
FROM is3.animal_sets
WHERE set_name='Junk' AND userid='jhough')
) t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1

Do parentheses force the order of JOINS to be done the parenthesized
way? And why does the optimizer not see this optimization in the
presence of the problematical sections, but does without them?
Nov 12 '05 #5

P: n/a
Bob Stearns wrote:
Serge Rielau wrote:
Bob Stearns wrote:
Serge Rielau wrote:

Bob Stearns wrote:

> When I run the following query with the two sections commented out,
> the response time is between 1 an 2 seconds; with the first
> indicated section enabled, the response goes up to 15 seconds even
> though t1.bh_disposal_code IS NOT NULL in only one row; with the
> second section enabled, it goes up to 592 seconds, even though
> t1.mating is NULL in all the rows chosen. Why should the
> (supposedly never executed) scalar subqueries cost so much? It is
> as if the CASE statements were being evaluated on every row in t1
> rather than just the selected ones. Both of these scalar subqueries
> are here because the are infrequent and the corresponding join is
> difficult or impossible.
>
> SELECT t2p.herd_id AS dam_tag,
> is3.nalf_prefix(t2.tattoo) AS dam_prefix,
> is3.nalf_id(t2.tattoo) AS dam_id,
> upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
> rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
> case when t2.breed_1='AN' AND t2.pct_1=100
> THEN '11111111' end AS dam_breed,
> char(t2.birth_date,USA) AS dam_birth_date,
>
> // section 1
> case
> when t1.bh_disposal_code is null then ''
> when t1.bh_disposal_code=0 then ''
> else (select disposal_code
> from is3.disposal_assoc_defn tx
> where assoc_code='LM'
> and tx.bh_disposal_code= t2.bh_disposal_code)
> END AS dam_stat,
> // end section 1
>
> rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
> case when t1.mating='A' then 'Y' end AS ai,
>
> // section 2
> case when t1.mating='A' then
> (SELECT t2a.datex FROM is3.service_ai t2a
> WHERE t2a.cow_bhid=t1.dam_bhid
> AND t2a.datex =
> (SELECT MAX(datex) FROM is3.service_ai
> WHERE days(datex) < (days(t1.birth_date)-283+45)
> AND days(datex) >
> (days(t1.birth_date)-283-45) )
> )
> end AS breed_date,
> // end section 2
>
> t1p.herd_id AS tag,
> t1.birth_date,
> t1b.birth_mgmt AS birth_grp,
> case when t1.sex='C' then 'H' else t1.sex end AS sex,
> translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
> t1.hps,
> t1c.color_desc AS color,
> t1b.birth_weight AS weight,
> t1.namex AS name,
> rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
> FROM is3.animals t1
> JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
> JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
> JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
> LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
> LEFT OUTER JOIN is3.animals_priv t1p
> ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
> LEFT OUTER JOIN is3.animals_priv t2p
> ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
> WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets
> WHERE set_name='Lm_ET' AND userid='jhough')
> ORDER BY t1.bhid


Your observation is correct! DB2 loves to turn subqueries to joins.
It is pretty oblivious to CASE expression.
The easiest fix is to push/replicate the WHEN predicate into the
WHERE clause of the scalar subquery.
DB2 is smart enough to realize that the predicate is outside of the
context of the subquery and compute it before going all out for the
subquery.

Cheers
Serge


Thanks for the quick reply (when do you sleep:-).

I changed to the syntax below and the time is now 39 seconds. That is
a great improvement over 592 seconds, but is much slower than the 1-2
seconds to do everything but these two fields. I get the impression
that the subselects are still being performed for each row of t1
rather than the rows selected by the JOINS.

Should I be looking at temporary tables or dedicated functions?

SELECT t2p.herd_id AS dam_tag,
is3.nalf_prefix(t2.tattoo) AS dam_prefix,
is3.nalf_id(t2.tattoo) AS dam_id,
upper(right(rtrim(t2.tattoo),1)) AS dam_yr_code,
rtrim(t2.prefix) || '-' || rtrim(t2.regnum) AS dam_regnum,
case when t2.breed_1='AN' AND t2.pct_1=100 THEN '11111111' end
AS dam_breed,
char(t2.birth_date,USA) AS dam_birth_date,
(select disposal_code
from is3.disposal_assoc_defn tx
where assoc_code='LM'
and tx.bh_disposal_code= t2.bh_disposal_code

-- small tweak here
and ( (t1.bh_disposal_code is not null)
AND
(t1.bh_disposal_code <> 0)
)

AND (t1.bh_disposal_code <> 0) is good enough because: (NULL <> 0) is
UNKNOWN != TRUE
) AS dam_stat,
rtrim(t3.prefix) || '-' || rtrim(t3.regnum) AS sire_regnum,
case when t1.mating='A' then 'Y' end AS ai,
(SELECT t2a.datex FROM is3.service_ai t2a
WHERE t2a.cow_bhid=t1.dam_bhid

--
AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE days(datex) <
(days(t1.birth_date)-283+45)
AND days(datex) >
(days(t1.birth_date)-283-45) )

AND t2a.datex = (SELECT MAX(datex) FROM is3.service_ai
WHERE datex < t1.birth_date - 283 days + 45 days
AND datex > t1.birt_date - 283 days + 45 days
AND t1.mating = 'A' )
AND t1.mating='A'
) AS breed_date,

-- Or what about only:
(SELECT MAX(datex) FROM is3.service_ai
WHERE datex < t1.birth_date - 283 days + 45 days
AND datex > t1.birt_date - 283 days + 45 days
AND t1.mating = 'A' ) AS breed_date
-- If no row is found the results will still be NULL...
t1p.herd_id AS tag,
t1.birth_date,
t1b.birth_mgmt AS birth_grp,
case when t1.sex='C' then 'H' else t1.sex end AS sex,
translate(t1b.twin_code,'SSTTTT','NSCBMT') AS typex,
t1.hps,
t1c.color_desc AS color,
t1b.birth_weight AS weight,
t1.namex AS name,
rtrim(t1.prefix) || '-' || rtrim(t1.regnum) AS regnum
FROM is3.animals t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1
WHERE t1.bhid IN (SELECT bhid FROM is3.animal_sets WHERE
set_name='Lm_ET' AND userid='jhough')
ORDER BY t1.bhid
go


Cheers
Serge

Thanks again. Neither change resulted in a significant change. We were
looking in the wrong place. I, at least, expected the optimizer to be
better than it is. I reached this epiphany while dreaming about this
problem (talk about nightmares). By forcing the most restrictive WHERE
clause to be done first, the time was reduced to 1-2 seconds. I changed
the FROM phrase to:

FROM (SELECT *
FROM is3.animals
WHERE bhid IN (SELECT bhid
FROM is3.animal_sets
WHERE set_name='Junk' AND userid='jhough')
) t1
JOIN is3.animals t2 ON t1.dam_bhid=t2.bhid
JOIN is3.animals t3 ON t1.sire_bhid=t3.bhid
JOIN is3.color_defn t1c ON t1.color=t1c.bh_color_code
LEFT OUTER JOIN is3.birth t1b ON t1.bhid=t1b.bhid
LEFT OUTER JOIN is3.animals_priv t1p
ON t1.bhid=t1p.bhid AND t1p.herd_owner_id=1
LEFT OUTER JOIN is3.animals_priv t2p
ON t2.bhid=t2p.bhid AND t2p.herd_owner_id=1

Do parentheses force the order of JOINS to be done the parenthesized
way? And why does the optimizer not see this optimization in the
presence of the problematical sections, but does without them?

You have three kinds of joins in te hmix here:
INNER, LEFT OUTER and LEFT OUTER with enforced inner of cardinality 1
that's your scalar subquery.
That si somewhat trickier than just having INNER and LEFT (without the
enforcement). Not knowing which version or fixpack you are on the
combinations I do know that theer have been improvement in that area in
recent V8 fixpacks.
Now one could point out that Sybase apparently has solved the problem
and shame on DB2's famed optimizer. BUT to the best of my knowledge
Sybase does NOT enforce scslar subqueries to be scalar (returning
_any_one_ row). So teh DBMS oversimplifies the problem.
To get Sybase semantics you would need to use a regular LEFT OUTER JOIN
insetad the scalar subquery and add a FETCH FIRST ROW ONLY clause to the
right. That would put the systems on more levl grounds.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #6

This discussion thread is closed

Replies have been disabled for this discussion.