By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
437,924 Members | 1,651 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 437,924 IT Pros & Developers. It's quick & easy.

Optimizer bad with sparse keys?

P: n/a
Hi all

We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

Now, the optimizer tried to enforce RI on the new relation. But instead
of using the index generated by the relation, it used a table scan!

Why? Here's IBM's answer (condensed):

----------------------------------------------------------------------
From what I have been seeing, this behavior is not a bug. The optimizer
is working as designed. The problem come from the fact that the new key
column does not have sufficent distinct values in the table. When
deciding if the index generated by the foreign key constraint
should be used or not, the optimizer calculates the filter factor of
that column. The filter factor is calculated by dividing the
column cardinality by number of rows in table (in your testcase,
1/1356534 which is close to 0). For an index to be considered useful by
DB2, this filter factor has be closer to 1. This is why optimizer
discarded that index as being usefull and chose a table scan. The best
way to circumvent the poor filter factor, would be to declare the table
as volatile (ALTER TABLE UMBNT.DOSSIERPOS VOLATILE CARDINALITY).
-------------------------------------------------------------------------------

I find this very unsatisfactory. During RI checking, and generally when
following relations, DB2 should know that it will never have to seek
NULL values in its indexes. Given detailed runstats, it also knows the
most frequent values and their cardinality for an index. So it should
know that the filter factor for this index, *excluding NULLs*, is
practically 1, not 0.

I don't think the scenario I have described is so rare. What's more, is
forces us to either drop the RI, or use the VOLATILE flag, which I'd
hate to do for a large table in a productive system.

Finally, it is a sort of behaviour that can break a running system just
because a key was added that no one uses yet anyway!

Am I alone in thinking that declaring this behaviour "as designed" is an
unsatisfactory stance?

Regards
Peter Arrenbrecht
Opus Software AG
Nov 12 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Peter,

I'm not sure I agree with the answer you got from support. Can you pass
along the PMR# and we take this conversation offline (email)?
Certainly VOLATILE is one big hammer that shouldn't be used lightly.

Cheers
Serge
Nov 12 '05 #2

P: n/a
AK
>
We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.


if the problem persists, I would create a separaty entity, for
instance, instead of having MANAGER_ID in EMPLOYEE table
CREATE TABLE EMPLOYEE(
EMPLOYEE_ID ... PRIMARY KEY,
IS_TEMPORARY ...,/* 0 - PERMANENT, 1 - TEMPORARY */
PERMANENT_EMPLOYEE_ID .../* ONLY POPULATED FOR TEMPORARY EMPLOYEES
REPLACING PERMANENT ONES */
.... /* other columns */
)

i would have a separate table for the self-join:

CREATE TABLE PERMANENT_EMPLOYEE(
EMPLOYEE_ID ... FOREIGN KEY REFERENCES EMPLOYEE(EMPLOYEE_ID),
PERMANENT_EMPLOYEE_ID ... FOREIGN KEY REFERENCES
EMPLOYEE(EMPLOYEE_ID),
PRIMARY KEY(EMPLOYEE_ID)
/* NO OTHER COLUMNS */
)

that would give the optimizer a better idea if the percentage of
temporary employees is small
Nov 12 '05 #3

P: n/a
Thanks for the idea. If all else fails, it's good to have at least one
more thing to try ;)

peo

AK wrote:
We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

if the problem persists, I would create a separaty entity, for
instance, instead of having MANAGER_ID in EMPLOYEE table
CREATE TABLE EMPLOYEE(
EMPLOYEE_ID ... PRIMARY KEY,
IS_TEMPORARY ...,/* 0 - PERMANENT, 1 - TEMPORARY */
PERMANENT_EMPLOYEE_ID .../* ONLY POPULATED FOR TEMPORARY EMPLOYEES
REPLACING PERMANENT ONES */
... /* other columns */
)

i would have a separate table for the self-join:

CREATE TABLE PERMANENT_EMPLOYEE(
EMPLOYEE_ID ... FOREIGN KEY REFERENCES EMPLOYEE(EMPLOYEE_ID),
PERMANENT_EMPLOYEE_ID ... FOREIGN KEY REFERENCES
EMPLOYEE(EMPLOYEE_ID),
PRIMARY KEY(EMPLOYEE_ID)
/* NO OTHER COLUMNS */
)

that would give the optimizer a better idea if the percentage of
temporary employees is small

Nov 12 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.