Hi,
Consider the following query
select *
from t1
where c1 = ?
and c2 = ?
table t1 has a high number of rows where c1 is null, but rows where c1
is not null, the filter factor is relatively low. The table has
distributed statistics collected on table and key columns.
Now, if I have an index i1 on t1(c1), access path produced by db2exfmt
shows the correct filter factor for c1 = ? predicate, presumably
because optimizer knows the fact that predicate c1 = ? cannot be
satisfied by rows which have null values for c1.
However, if I have an index i2 on t1(c1,c2), the explain shows a very
high filter factor for the same predicate. I can tell that it was
derived from fullkeycard of index i2, which includes the null values.
Optimizer completely ignores the fact that predicate c1 = ? cannot
select null values.
The actual query is a little bit complex, involving join of table t1
with 5 other tables, but the incorrect filter factor calculations
causes optimizer to select wrong table as the outer table in case
index i2 exists.
Has any one encountered similar situation before? Any suggestions?
The DB2 manual says runstats command doesn't support collecting
distributed statistics on a column group, but DB2 catalog has
syscat.colgroup% tables, so I am confused, is it the runstats command
limitation or the optimizer limitation? If it is just runstats
limitation, any external ways to populate distributed column group
statistics?
TIA
P. Adhia