473,387 Members | 1,502 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Optimizer bad with sparse keys?

Hi all

We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

Now, the optimizer tried to enforce RI on the new relation. But instead
of using the index generated by the relation, it used a table scan!

Why? Here's IBM's answer (condensed):

----------------------------------------------------------------------
From what I have been seeing, this behavior is not a bug. The optimizer
is working as designed. The problem come from the fact that the new key
column does not have sufficent distinct values in the table. When
deciding if the index generated by the foreign key constraint
should be used or not, the optimizer calculates the filter factor of
that column. The filter factor is calculated by dividing the
column cardinality by number of rows in table (in your testcase,
1/1356534 which is close to 0). For an index to be considered useful by
DB2, this filter factor has be closer to 1. This is why optimizer
discarded that index as being usefull and chose a table scan. The best
way to circumvent the poor filter factor, would be to declare the table
as volatile (ALTER TABLE UMBNT.DOSSIERPOS VOLATILE CARDINALITY).
-------------------------------------------------------------------------------

I find this very unsatisfactory. During RI checking, and generally when
following relations, DB2 should know that it will never have to seek
NULL values in its indexes. Given detailed runstats, it also knows the
most frequent values and their cardinality for an index. So it should
know that the filter factor for this index, *excluding NULLs*, is
practically 1, not 0.

I don't think the scenario I have described is so rare. What's more, is
forces us to either drop the RI, or use the VOLATILE flag, which I'd
hate to do for a large table in a productive system.

Finally, it is a sort of behaviour that can break a running system just
because a key was added that no one uses yet anyway!

Am I alone in thinking that declaring this behaviour "as designed" is an
unsatisfactory stance?

Regards
Peter Arrenbrecht
Opus Software AG
Nov 12 '05 #1
3 2070
Peter,

I'm not sure I agree with the answer you got from support. Can you pass
along the PMR# and we take this conversation offline (email)?
Certainly VOLATILE is one big hammer that shouldn't be used lightly.

Cheers
Serge
Nov 12 '05 #2
AK
>
We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.


if the problem persists, I would create a separaty entity, for
instance, instead of having MANAGER_ID in EMPLOYEE table
CREATE TABLE EMPLOYEE(
EMPLOYEE_ID ... PRIMARY KEY,
IS_TEMPORARY ...,/* 0 - PERMANENT, 1 - TEMPORARY */
PERMANENT_EMPLOYEE_ID .../* ONLY POPULATED FOR TEMPORARY EMPLOYEES
REPLACING PERMANENT ONES */
.... /* other columns */
)

i would have a separate table for the self-join:

CREATE TABLE PERMANENT_EMPLOYEE(
EMPLOYEE_ID ... FOREIGN KEY REFERENCES EMPLOYEE(EMPLOYEE_ID),
PERMANENT_EMPLOYEE_ID ... FOREIGN KEY REFERENCES
EMPLOYEE(EMPLOYEE_ID),
PRIMARY KEY(EMPLOYEE_ID)
/* NO OTHER COLUMNS */
)

that would give the optimizer a better idea if the percentage of
temporary employees is small
Nov 12 '05 #3
Thanks for the idea. If all else fails, it's good to have at least one
more thing to try ;)

peo

AK wrote:
We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

if the problem persists, I would create a separaty entity, for
instance, instead of having MANAGER_ID in EMPLOYEE table
CREATE TABLE EMPLOYEE(
EMPLOYEE_ID ... PRIMARY KEY,
IS_TEMPORARY ...,/* 0 - PERMANENT, 1 - TEMPORARY */
PERMANENT_EMPLOYEE_ID .../* ONLY POPULATED FOR TEMPORARY EMPLOYEES
REPLACING PERMANENT ONES */
... /* other columns */
)

i would have a separate table for the self-join:

CREATE TABLE PERMANENT_EMPLOYEE(
EMPLOYEE_ID ... FOREIGN KEY REFERENCES EMPLOYEE(EMPLOYEE_ID),
PERMANENT_EMPLOYEE_ID ... FOREIGN KEY REFERENCES
EMPLOYEE(EMPLOYEE_ID),
PRIMARY KEY(EMPLOYEE_ID)
/* NO OTHER COLUMNS */
)

that would give the optimizer a better idea if the percentage of
temporary employees is small

Nov 12 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

34
by: Christopher Benson-Manica | last post by:
If an array is sparse, say something like var foo=; foo=4; foo='baz'; foo='moo'; is there a way to iterate through the entire array? --
14
by: Bupp Phillips | last post by:
I have a customer table that has the field CUSTOMER_ID as the primary key (cust_pkkey), the table has 102,834 records in it. The following select statement works fine: select * from customer...
5
by: Ross A. Finlayson | last post by:
Hi, I'm scratching together an Access database. The development box is Office 95, the deployment box Office 2003. So anyways I am griping about forms and global variables. Say for example...
8
by: wespvp | last post by:
I am using PostgreSQL 7.4.1 on RedHat 7.2. The query I am executing is something like (I replaced all the return values with 'count'): db=> explain select count(*) from messages m join (select...
4
by: frenk_mo | last post by:
In a cobol program I have an SQL instruction like following: EXEC SQL DECLARE CURSORE_1 CURSOR FOR SELECT CAMPO1, CAMPO2, CAMPO3, CAMPO4, CAMPO5, CAMPO6, CAMPO7,
5
by: adam.kleinbaum | last post by:
Hi there, I'm a novice C programmer working with a series of large (30,000 x 30,000) sparse matrices on a Linux system using the GCC compiler. To represent and store these matrices, I'd like to...
3
by: alan | last post by:
Hello world, I currently have implemented a sparse array needed by a class as a map<int, boost::shared_ptr<my_type. However, one of the derived classes needs to periodically "tighten" the...
2
by: blaine | last post by:
Hey everyone, Just a friendly question about an efficient way to do this. I have a graph with nodes and edges (networkx is am amazing library, check it out!). I also have a lookup table with...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.