Optimizer bad with sparse keys?

Peter Arrenbrecht

Hi all

We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

Now, the optimizer tried to enforce RI on the new relation. But instead
of using the index generated by the relation, it used a table scan!

Why? Here's IBM's answer (condensed):

----------------------------------------------------------------------
From what I have been seeing, this behavior is not a bug. The optimizer
is working as designed. The problem come from the fact that the new key
column does not have sufficent distinct values in the table. When
deciding if the index generated by the foreign key constraint
should be used or not, the optimizer calculates the filter factor of
that column. The filter factor is calculated by dividing the
column cardinality by number of rows in table (in your testcase,
1/1356534 which is close to 0). For an index to be considered useful by
DB2, this filter factor has be closer to 1. This is why optimizer
discarded that index as being usefull and chose a table scan. The best
way to circumvent the poor filter factor, would be to declare the table
as volatile (ALTER TABLE UMBNT.DOSSIERPOS VOLATILE CARDINALITY).
-------------------------------------------------------------------------------

I find this very unsatisfactory. During RI checking, and generally when
following relations, DB2 should know that it will never have to seek
NULL values in its indexes. Given detailed runstats, it also knows the
most frequent values and their cardinality for an index. So it should
know that the filter factor for this index, *excluding NULLs*, is
practically 1, not 0.

I don't think the scenario I have described is so rare. What's more, is
forces us to either drop the RI, or use the VOLATILE flag, which I'd
hate to do for a large table in a productive system.

Finally, it is a sort of behaviour that can break a running system just
because a key was added that no one uses yet anyway!

Am I alone in thinking that declaring this behaviour "as designed" is an
unsatisfactory stance?

Regards
Peter Arrenbrecht
Opus Software AG

Nov 12 '05 #1

Subscribe Post Reply

2070

Serge Rielau

Peter,

I'm not sure I agree with the answer you got from support. Can you pass
along the PMR# and we take this conversation offline (email)?
Certainly VOLATILE is one big hammer that shouldn't be used lightly.

Cheers
Serge

Nov 12 '05 #2

We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

if the problem persists, I would create a separaty entity, for
instance, instead of having MANAGER_ID in EMPLOYEE table
CREATE TABLE EMPLOYEE(
EMPLOYEE_ID ... PRIMARY KEY,
IS_TEMPORARY ...,/* 0 - PERMANENT, 1 - TEMPORARY */
PERMANENT_EMPLOYEE_ID .../* ONLY POPULATED FOR TEMPORARY EMPLOYEES
REPLACING PERMANENT ONES */
.... /* other columns */
)

i would have a separate table for the self-join:

CREATE TABLE PERMANENT_EMPLOYEE(
EMPLOYEE_ID ... FOREIGN KEY REFERENCES EMPLOYEE(EMPLOYEE_ID),
PERMANENT_EMPLOYEE_ID ... FOREIGN KEY REFERENCES
EMPLOYEE(EMPLOYEE_ID),
PRIMARY KEY(EMPLOYEE_ID)
/* NO OTHER COLUMNS */
)

that would give the optimizer a better idea if the percentage of
temporary employees is small

Nov 12 '05 #3

Peter Arrenbrecht

Thanks for the idea. If all else fails, it's good to have at least one
more thing to try ;)

peo

AK wrote:

We ran into a very annoying optimizer problem recently. We had added a
new key and self-join relation to a large production table. That key
will be filled very rarely and having just being added was practically
never filled in when the first user tried to delete a row from the table.

if the problem persists, I would create a separaty entity, for
instance, instead of having MANAGER_ID in EMPLOYEE table
CREATE TABLE EMPLOYEE(
EMPLOYEE_ID ... PRIMARY KEY,
IS_TEMPORARY ...,/* 0 - PERMANENT, 1 - TEMPORARY */
PERMANENT_EMPLOYEE_ID .../* ONLY POPULATED FOR TEMPORARY EMPLOYEES
REPLACING PERMANENT ONES */
... /* other columns */
)

i would have a separate table for the self-join:

CREATE TABLE PERMANENT_EMPLOYEE(
EMPLOYEE_ID ... FOREIGN KEY REFERENCES EMPLOYEE(EMPLOYEE_ID),
PERMANENT_EMPLOYEE_ID ... FOREIGN KEY REFERENCES
EMPLOYEE(EMPLOYEE_ID),
PRIMARY KEY(EMPLOYEE_ID)
/* NO OTHER COLUMNS */
)

that would give the optimizer a better idea if the percentage of
temporary employees is small

Nov 12 '05 #4

Similar topics

Iterating sparse arrays

by: Christopher Benson-Manica | last post by:

If an array is sparse, say something like var foo=; foo=4; foo='baz'; foo='moo'; is there a way to iterate through the entire array? --

Javascript

Optimizer picks an ineffient plan

by: Bupp Phillips | last post by:

I have a customer table that has the field CUSTOMER_ID as the primary key (cust_pkkey), the table has 102,834 records in it. The following select statement works fine: select * from customer...

PostgreSQL Database

Variables in standard modules, single record entry wiht foreign keys

by: Ross A. Finlayson | last post by:

Hi, I'm scratching together an Access database. The development box is Office 95, the deployment box Office 2003. So anyways I am griping about forms and global variables. Say for example...

Microsoft Access / VBA

Compound keys and foreign constraints

by: wespvp | last post by:

I am using PostgreSQL 7.4.1 on RedHat 7.2. The query I am executing is something like (I replaced all the return values with 'count'): db=> explain select count(*) from messages m join (select...

PostgreSQL Database

SQL Optimizer

by: frenk_mo | last post by:

In a cobol program I have an SQL instruction like following: EXEC SQL DECLARE CURSORE_1 CURSOR FOR SELECT CAMPO1, CAMPO2, CAMPO3, CAMPO4, CAMPO5, CAMPO6, CAMPO7,

DB2 Database

Sparse Matrix implemented as a doubly-linked list

by: adam.kleinbaum | last post by:

Hi there, I'm a novice C programmer working with a series of large (30,000 x 30,000) sparse matrices on a Linux system using the GCC compiler. To represent and store these matrices, I'd like to...

C / C++

Changing map keys (no reordering of keys)

by: alan | last post by:

Hello world, I currently have implemented a sparse array needed by a class as a map<int, boost::shared_ptr<my_type. However, one of the derived classes needs to periodically "tighten" the...

C / C++

Optimization: Picking random keys from a dictionary and mutatingvalues

by: blaine | last post by:

Hey everyone, Just a friendly question about an efficient way to do this. I have a graph with nodes and edges (networkx is am amazing library, check it out!). I also have a lookup table with...

Python

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing