473,836 Members | 1,554 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Skip scans in DB2

AK
I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?
Nov 12 '05
59 6884
Mark A wrote:
"Mark Townsend" <ma***********@ comcast.net> wrote in message
news:40******** ******@comcast. net...
Mark A wrote:
That is, regular DB2 indexes are not sub-sectioned the
way Mark Townsend described Oracle indexes.


Oracle uses a b-tree structure for it's regular indexes (as opposed to
it's bit mapped indexes). I believe this is the same technique used by
DB2 but actually have no idea.


DB2 also uses a b-tree for its indexes (since before Oracle was born). But
the index leaf pages are not stored in sub-sections the way the described it
in one of your previous posts.


I think I see the confusion. I said
"the index is divided into a number of sub-indexes, based on the number
of distinct values in the leading column (which are gotten from
statistics), and then each sub index is scanned looking for a certain
something"

I did not mean physically divided (and stored). The division is logical
and performed at run time by the optimizer (same way the optimizer
decides ranges for parallel queries). Instead of sub-indexes, perhaps I
should have said "sub index ranges".

Apologies.

Nov 12 '05 #41
Mark A wrote:

Yes, DB2 has used b-tree indexes since before Oracle became a product.


Well I guess the old adage about old dogs and new tricks is true then.
Sorry, couldn't resist :-)

Nov 12 '05 #42
"Mark Townsend" <ma***********@ comcast.net> wrote in message
news:40******** ******@comcast. net...

I think I see the confusion. I said
"the index is divided into a number of sub-indexes, based on the number
of distinct values in the leading column (which are gotten from
statistics), and then each sub index is scanned looking for a certain
something"

I did not mean physically divided (and stored). The division is logical
and performed at run time by the optimizer (same way the optimizer
decides ranges for parallel queries). Instead of sub-indexes, perhaps I
should have said "sub index ranges".

Apologies.

I still don't understand this. If the physical position (where there are
changes in the first column of the index) is not stored in the index or the
catalog, and maintained real-time, then the database would have to read the
all the values of the index at runtime. For DB2, once the entire index is
read, the rows can be filtered without any further work performed by DB2
(assuming that the predicate contains one of the columns in the index), so a
sub-scan would be redundant.

So that explanation does not make sense to me, and I suspect that there is
something more to it than this.

DB2 and Oracle have lots of nice features to make up for people who don't
know how to design databases. Each time one of these features is added, the
code gets bigger, the path length gets longer, the memory required gets
bigger, and everything gets a little bit slower (except for the one thing
the feature is trying to address).

Meanwhile people report that MySQL is several times faster than DB2 or
Oracle for simple queries.
Nov 12 '05 #43
THe only skip scan i know of in db2 is the
Early Out on fetch first and maybe sometimes for min() max() processing.

For MDC, you have block indexes per dimension (state or whatever) and the
composite (combined block index keys).
It's like using X and Y coordinates (if 2 dimensions are involved) to get
the list of blocks that contain the required data pages.
Big block read (extent) and block-based bufferpools should be able to kick
in.

Terminology :

A dimension block index will be automatically created for each of the
dimensions specified, and it will be used by the optimizer to quickly and
efficiently access data along each dimension.

The set of blocks that contain pages with data having a certain key value of
one of the dimension block indexes is called a slice.

A composite block index will also automatically be created, containing all
dimension key columns, and will be used to maintain the clustering of data
over insert and update activity.
The composite block index can also be used by the optimizer to efficiently
access data having particular dimension values.

Every unique combination of dimension values form a logical cell

PM
Nov 12 '05 #44

"Mark Townsend" <ma***********@ comcast.net> wrote in message
news:40******** ******@comcast. net...

I am still waiting to hear how Oracle gets all the state values without
reading the entire index.


I can't explain why Oracle doesn't need the state values without
actually explaining how this is done, and that's something I don't want
to do, for a number of reasons, least not being it's actually a pretty
clever algorithm. But basically, the index is divided into a number of
sub-indexes, based on the number of distinct values in the leading
column (which are gotten from statistics), and then each sub index is
scanned looking for a certain something that tells the DB it's not worth
scanning that sub-index anymore, at which stage the remainder of that
sub-index is skipped. And as long as it's costed properly, and the stars
align, then it works very well.


Just curious...
If I use AK's example again, but changed the following sql
SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfie ld'
to
SELECT COUNT(*) FROM POST_OFFICE WHERE STATE > 'AA' and CITY='Springfie ld'

Does ORACLE optimizer still choose to use index SKIP SCAN?

Nov 12 '05 #45

"Mark Townsend" <ma***********@ comcast.net> wrote in message
news:40******** ******@comcast. net...
Of course this is completely transparent in Oracle - the example AK gave
you was to explain what is happening at the logical level. It doesn't
imply that the SQL needs to be written that way, or that the database is
even re-writing the SQL. Here's a published Oracle example, available on
otn.oracle.com. Assume the following table with 5,024,000 rows.

REGISTERED_OWNE RS
(
STATE VARCHAR2(2) NOT NULL,
REGISTRATION# VARCHAR2(10) NOT NULL,
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
MAKE VARCHAR2(30),
MODEL VARCHAR2(15),
YEAR_MFG NUMBER,
CITY VARCHAR2(30),
ZIP NUMBER
)

with a single composite index on (STATE, REGISTRATION#)
Assume this query

SELECT first_name, last_name, zip
FROM REGISTERED_OWNE RS WHERE registration# = '4FBB428'

In Oracle8i (which doesnt have skip scan), the resultant FTS would take
4035 hundreths.

In Oracle9i, the optimizer will cost (and in this case use) a sub-tree
lookup depending on the low and high values for the REGISTRATION# column
at a given STATE node. Comparable time - 25 hundreths of a second.

If this query is only executed spasmodically, then creating and
maintaining a second index on REGISTRATION# may be less than optimal.


At beginning, this example did impress me. But when I think again - the
index tree of this table (5m records) is a small one. Then I would rather
think DB2 and INFORMIX's readahead technology did do better than ORACLE's.
Does that make sense?
Nov 12 '05 #46
"Fan Ruo Xin" <fa*****@sbcglo bal.net> wrote in message
news:Ys******** **********@news svr26.news.prod igy.com...

"Mark Townsend" <ma***********@ comcast.net> wrote in message
news:40******** ******@comcast. net...
Of course this is completely transparent in Oracle - the example AK gave
you was to explain what is happening at the logical level. It doesn't
imply that the SQL needs to be written that way, or that the database is
even re-writing the SQL. Here's a published Oracle example, available on
otn.oracle.com. Assume the following table with 5,024,000 rows.

REGISTERED_OWNE RS
(
STATE VARCHAR2(2) NOT NULL,
REGISTRATION# VARCHAR2(10) NOT NULL,
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
MAKE VARCHAR2(30),
MODEL VARCHAR2(15),
YEAR_MFG NUMBER,
CITY VARCHAR2(30),
ZIP NUMBER
)

with a single composite index on (STATE, REGISTRATION#)
Assume this query

SELECT first_name, last_name, zip
FROM REGISTERED_OWNE RS WHERE registration# = '4FBB428'

In Oracle8i (which doesnt have skip scan), the resultant FTS would take
4035 hundreths.

In Oracle9i, the optimizer will cost (and in this case use) a sub-tree
lookup depending on the low and high values for the REGISTRATION# column
at a given STATE node. Comparable time - 25 hundreths of a second.

If this query is only executed spasmodically, then creating and
maintaining a second index on REGISTRATION# may be less than optimal.


At beginning, this example did impress me. But when I think again - the
index tree of this table (5m records) is a small one. Then I would rather
think DB2 and INFORMIX's readahead technology did do better than ORACLE's.
Does that make sense?

For the above application, assuming that one can't create a second index
(which seem like a ridiculous and artificial restriction to me), I would
simply reverse the order of the index columns (REGISTRATION#, STATE).

I can't imagine that a predicate with STATE = 'XX' would be used much as a
single predicate. But even if so, and one wanted to cluster by STATE
(possibly a good idea) then the second index could be created on the STATE
column (by itself). This would be a fairly small index since it only has 2
bytes.

With a little bit of intelligent design, the lack of skip-scan be overcome,
in fact the proposed solution has far better performance than skip-scan.
Nov 12 '05 #47
> Then I would rather
think DB2 and INFORMIX's readahead technology did do better than ORACLE's.
Does that make sense?


Its horses for courses. This example is comparing a FTS vs a Skip Scan
index retrieval, but Oracle also does fast full index scans, so it's not
an either/or situation. Sometimes skip scan wins, sometime fast full
index scan wins. Skip scan tends to win when the cardinality of the
first column is low and the second is high (potentially the situation
where Mark A would recommend reversing the keys in the compound index).


Nov 12 '05 #48

"Mark A" <ma@switchboard .net> wrote in message
news:b_******** **********@news .uswest.net...
"Fan Ruo Xin" <fa*****@sbcglo bal.net> wrote in message
news:fD******** *********@newss vr26.news.prodi gy.com...
Skip-scan doesn't exist in DB2. I don't mean to argue with you. Just wondering why you are so sure about
this?

1. There is no access path description of skip-scan in the DB2 explain or

in any DB2 reference manual or IBM marketing materials. That is true.
But I did know some feature which Version 8 introduced. But either I enable
it or not, I can't find any changing from explain output.
Another point is - some thing such as db2 prefetch technology changed from
one version to another version. Can you always find the corresponding
documentation in the db2 reference manual/article?
2. DB2 has no way to do this without reading the whole index first (which
would be redundent). That is, regular DB2 indexes are not sub-sectioned the way Mark Townsend described Oracle indexes. I don't think so. I would rather believe the sub-section is a VIRTUAL sub
index tree, not a PHYSICAL sub index tree.
3. None of the IBM Toronto Lab people on htis forum have said that sub-scan exists in DB2 (although IBM Redbrick aparantly has it).
It seems to me
that having skip scan capability would impose a slight penalty on index maintenance, even if it were never used. I can't understand why this will increase index maintenance. This will
increase the CPU workload during searching the index tree. But the basic
idea is saving I/O.

In order to skip part of the index during a scan (skip-scan), and go to
where there is a new value for the first column in the index, DB2 would

have to know where the new value starts, and that position would have to be
maintained real-time, which incurs some resources, even if skip-scan is
never used. I thought it doesn't need to be so complex. The most important thing is - at
which point, stop scan, and just jump to the next key of the leading column.

Regular (non MDC) DB2 indexes do not have sub-sections based on the value of the first column of the index, which apparently Oracle has in some or all of its indexes. Maintaining the index in sub-sections has "some" performance
penalty when maintained in real-time. There may also a penalty if skip-scan cannot be used and a full sequential index scan is needed, since indexes
that have sub-sections may not utilize sequential prefetch as efficiently. You can think of a block index as a regular B+ tree index.
Because I think the sub-section is a virtual part. There is no changing
during I/U/D.

The best way (IMO) to avoid full sequential index scans (when an index is
used but the leading column of an index is not in the predicate) is to add
the proper indexes required, or reverse the order of the columns on existing indexes. This would perform much better than either a full sequential index scan or a skip-scan. But if the index is not particularly large and/or the
query that causes the full sequential index scan is not frequent or is not
critical, then I might not even worry about it. If db2 does support skip scan. then the optimizer can take care of the whole
thing. The index tree size is one of the important thing which db2 optimizer
will pay attention to.
As I said. MDC indexes "may" be different, as you explained above, but I am not really sure how MDC indexes work.

Nov 12 '05 #49
SELECT COUNT(*) FROM POST_OFFICE WHERE STATE > 'AA' and CITY='Springfie ld'

Does ORACLE optimizer still choose to use index SKIP SCAN?


Of the top of my head,my initial reaction is probably not (although, the
more I look at it, possibly). Why don't you download and try ?

http://otn.oracle.com/software/produ...10g/index.html

Nov 12 '05 #50

This thread has been closed and replies have been disabled. Please start a new discussion.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.