473,320 Members | 1,940 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Skip scans in DB2

AK
I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?
Nov 12 '05
59 6762

the lack of skip-scan be overcome,


Finally we concur. Skip scan is pretty low on the list of "cool things
we have that they don't" I doubt that you will ever see a bill board
that says 'Oracle has Skip Scan (TM)'

It is a reasonably useful thing if you are short on disk space in a
large DW and don't want to create additional indexes, or have a once in
a lifetime ad-hoc query that you don't want to index for (and the stars
align perfectly), but in the end disk is cheap(ish) and we tend to
recommend bit mapped indexes in these environments over b-trees anyhow.
YMMV.
Nov 12 '05 #51

"Mark A" <ma@switchboard.net> wrote in message
news:JZ******************@news.uswest.net...
DB2 and Oracle have lots of nice features to make up for people who don't
know how to design databases. Each time one of these features is added, the code gets bigger, the path length gets longer, the memory required gets
bigger, and everything gets a little bit slower (except for the one thing
the feature is trying to address).

Meanwhile people report that MySQL is several times faster than DB2 or
Oracle for simple queries.


The first ThinkPad I used is ThinkPad 770 - one cpu (forget the Hz), 64M
mem, 8G hard disk ...
DB2 UDB V5.x is faster than Version 7.x. I would always like to use
Version7.x, not Version5.x. This has nothing with the support service
withdraw.

I also remembered the first year when I worked for Informix. Some customer
complained that Online7.x run slower than Online version5.x. But I didn't
heare the complain when they run IDS V7.x on SMP box for the application.

Nov 12 '05 #52

"Mark A" <ma@switchboard.net> wrote in message
news:NS****************@news.uswest.net...

For the above application, assuming that one can't create a second index
(which seem like a ridiculous and artificial restriction to me), I would
simply reverse the order of the index columns (REGISTRATION#, STATE).

I can't imagine that a predicate with STATE = 'XX' would be used much as a
single predicate. But even if so, and one wanted to cluster by STATE
(possibly a good idea) then the second index could be created on the STATE
column (by itself). This would be a fairly small index since it only has 2
bytes.

With a little bit of intelligent design, the lack of skip-scan be overcome, in fact the proposed solution has far better performance than skip-scan.

=========
That is not so simple.
I've seen a lot of dw applications built composite index based on a couple
of columns (c1, c2, c3, c4, c5, ...). Most queries will involve the leading
column of this index, that is one reason why it is chosen as the leading
column. There is still some kind of query stmts only searching based on the
(c3, c4, c5). Because it is already an index scan (even index-only), in most
cases, the DBAs will not build another additional index on (c3, c4, c5), in
this case, skip-scan will be very useful, especially for the big index tree.

Another case is not about skip scan, it is about DESIGN -
For the fact table, let's say we have three dimensions, we build a composite
index on these three dimensions (d1, d2, d3). We also build three indexes on
each dimension.
It is quite often to see the query based on -
(d1, d2, d3), (d1, d2), (d2, d3), (d1, d3), (d1), (d2), (d3), ...
For the query - SELECT COUNT(*) FROM mytbl where d2=v2 and d3=v3. It might
be the cost of index only (composite index) is lower than that of index
anding. I will not argue with - dynamic bitmap or static bitmap. My point is
this is not because the customer defined the wrong order of the index
columns.
(BTW, for this case, MDC is very helpful.)

Nov 12 '05 #53
"Fan Ruo Xin" <fa*****@sbcglobal.net> wrote in message
news:zX******************@newssvr26.news.prodigy.c om...
That is not so simple.
I've seen a lot of dw applications built composite index based on a couple
of columns (c1, c2, c3, c4, c5, ...). Most queries will involve the leading column of this index, that is one reason why it is chosen as the leading
column. There is still some kind of query stmts only searching based on the (c3, c4, c5). Because it is already an index scan (even index-only), in most cases, the DBAs will not build another additional index on (c3, c4, c5), in this case, skip-scan will be very useful, especially for the big index tree.
Why not? Are the DBA's incompetent?

Besides, I don't think skip scan works past the first column (based on what
Mark Townsend said); however, I don't believe there has been any adequate
description of how it works.
Another case is not about skip scan, it is about DESIGN -
For the fact table, let's say we have three dimensions, we build a composite index on these three dimensions (d1, d2, d3). We also build three indexes on each dimension.
It is quite often to see the query based on -
(d1, d2, d3), (d1, d2), (d2, d3), (d1, d3), (d1), (d2), (d3), ...
For the query - SELECT COUNT(*) FROM mytbl where d2=v2 and d3=v3. It might
be the cost of index only (composite index) is lower than that of index
anding. I will not argue with - dynamic bitmap or static bitmap. My point is this is not because the customer defined the wrong order of the index
columns.
(BTW, for this case, MDC is very helpful.)

There is no free lunch. Skip-scan is not free, there is a cost (in
performance) that someone has to pay (usually everyone has to pay whether
they use skip-scan or not). Your entire analysis is based on the faulty
assumption (IMO) that it is free.
Nov 12 '05 #54
AK
> For the fact table, let's say we have three dimensions, we build a composite
index on these three dimensions (d1, d2, d3). We also build three indexes on
each dimension.
It is quite often to see the query based on -
(d1, d2, d3), (d1, d2), (d2, d3), (d1, d3), (d1), (d2), (d3), ...
For the query - SELECT COUNT(*) FROM mytbl where d2=v2 and d3=v3. It might
be the cost of index only (composite index) is lower than that of index
anding. I will not argue with - dynamic bitmap or static bitmap. My point is
this is not because the customer defined the wrong order of the index
columns.
(BTW, for this case, MDC is very helpful.)


this is a very good observation.
In similar situations like that, in Oracle we use bitmap indexes, in DB2 we use MDC
Nov 12 '05 #55
AK
> > (c3, c4, c5). Because it is already an index scan (even index-only), in
most
cases, the DBAs will not build another additional index on (c3, c4, c5),

in
this case, skip-scan will be very useful, especially for the big index

tree.


Why not? Are the DBA's incompetent?


I believe exprerienced DBAs try to both speed up queries using indexes
and keep number of indexes as low as possible. It's a compromise.
Usually we don't need the very best performance for one query no
matter what.
Usually we need acceptable performance for the whole system as well as
acceptable, not the very best, response time for every query.
Nov 12 '05 #56
"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...
(c3, c4, c5). Because it is already an index scan (even index-only),
in most
cases, the DBAs will not build another additional index on (c3, c4,
c5), in
this case, skip-scan will be very useful, especially for the big index

tree.


Why not? Are the DBA's incompetent?


I believe exprerienced DBAs try to both speed up queries using indexes
and keep number of indexes as low as possible. It's a compromise.
Usually we don't need the very best performance for one query no
matter what.
Usually we need acceptable performance for the whole system as well as
acceptable, not the very best, response time for every query.


The point is:

1. That the proper indexes will perform much better than skip-scan.
2. The cost of additional indexes is not free, but neither is the cost of
skip-scan free either. In order for it to work, there must be some extra
index maintenance going on, whether skip-scan is used or not.
Nov 12 '05 #57
> In order for it to work, there must be some extra
index maintenance going on, whether skip-scan is used or not.

Skip scan does not require any extra maintenance other than what is
already being done for b-trees anyhow (at least in Oracle - I'm still
looking for an indepth discussion of IBM indexes to see if they are
fundamentally different - there may be one little thing that Oracle does
with it's B-trees that IBM doesn't, which I'm still looking to confirm -
any pointers ?).

It uses range scans, and an algorithm that is largely independent of the
actual data values to determine where and when to start scanning, and
when to stop. The algorithm is not exactly rocket science, but is a
little bit clever, and may have a patent or two associated with it (I'm
not saying it has), so to be safe I'm not going to disclose it here.

However, Serge indicates that they (IBM) are aware of at least the
general principles, in terms of the analogy he used (searching a phone
book). And if you think about how you would physically search a phone
book, and you will get the basic idea. You do not need to know where
every new value of the alphabet starts in a phone book to use it effectively

Note that it also works for predicates on any level in the compound key
(i.e )a,b,c), search on c), but will only kick in if the stats show that
the relative cardinality of c is very high compared with the relative
cardinality of a and then b (i.e the skip is likely to be large and
therefore worthwhile).

For Fan's dimensional example, bit mapped join indexes would be a much,
much better solution in Oracle than a single b-tree index relying on
skip scan.
Nov 12 '05 #58
"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...
In order for it to work, there must be some extra
index maintenance going on, whether skip-scan is used or not.
Skip scan does not require any extra maintenance other than what is
already being done for b-trees anyhow (at least in Oracle - I'm still
looking for an indepth discussion of IBM indexes to see if they are
fundamentally different - there may be one little thing that Oracle does
with it's B-trees that IBM doesn't, which I'm still looking to confirm -
any pointers ?).

It uses range scans, and an algorithm that is largely independent of the
actual data values to determine where and when to start scanning, and
when to stop. The algorithm is not exactly rocket science, but is a
little bit clever, and may have a patent or two associated with it (I'm
not saying it has), so to be safe I'm not going to disclose it here.

However, Serge indicates that they (IBM) are aware of at least the
general principles, in terms of the analogy he used (searching a phone
book). And if you think about how you would physically search a phone
book, and you will get the basic idea. You do not need to know where
every new value of the alphabet starts in a phone book to use it

effectively
Note that it also works for predicates on any level in the compound key
(i.e )a,b,c), search on c), but will only kick in if the stats show that
the relative cardinality of c is very high compared with the relative
cardinality of a and then b (i.e the skip is likely to be large and
therefore worthwhile).

For Fan's dimensional example, bit mapped join indexes would be a much,
much better solution in Oracle than a single b-tree index relying on
skip scan.

Since IBM Redbrick and IBM Informix have skip-scans, I am sure that IBM
knows how it works. If it has a patent, or patent pending, the algorithm has
been made public with the patent office.

If skip scan is merely looking at the existing b-tree non-leaf pages (the
table of contents of the index leaf pages) then it is of little use for the
example given (although it could be of some use in other examples). That is
because there are over 50 states that would have to skip-scanned (starting
and stopping scans) and most non-leaf structures are not large enough to
make it beneficial that many times.

If there is no extra maintenance upfront on the indexes, then there is extra
work figuring out where to skip and scan.

As I have stated, the proper set-up of index structures BLOWS SKIP-SCAN AWAY
in terms of performance.
Nov 12 '05 #59

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...
For Fan's dimensional example, bit mapped join indexes would be a much,
much better solution in Oracle than a single b-tree index relying on
skip scan.

In ORACLE world, YES, for this case bitmap-join indexes would be better than
a single index only + skip scan.
By using MDC in DB2 UDB, for the block index tree, skip scan will not help
either in most cases (block index is far small than the RID B+ tree index).
I used this example is I want to say this is not the DESIGN problem.

For comparing ORACLE's static bitmap join technology and IBM DB2 UDB's MDC
technology:
I don't think they deal with exactly the same issue in some cases, but at
least they all try to save the space (compare traditional b+tree index).

Nov 12 '05 #60

This thread has been closed and replies have been disabled. Please start a new discussion.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.