Skip scans in DB2 - DB2 Database

I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?

Nov 12 '05 #1

Subscribe Post Reply

6786

Mark A

"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...

I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?

Can you explain what the feature is?

Nov 12 '05 #2

Mark Townsend

AK wrote:

I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?

In Oracle, it means that an index can be used even if the access is not
by the first column indexed

I.e Table with a,b,c,d, with an index on a,b,c will use said index when
the where clause only mentions b. Saves having to build indexes on a,b,c
and b,c and c (and potentially also b and a,c).

Nov 12 '05 #3

Ian

Mark Townsend wrote:

AK wrote:
I tried to google "skip scan DB2" but came up with nothing. Does DB2
have the feature under a different name?

In Oracle, it means that an index can be used even if the access is not
by the first column indexed

I.e Table with a,b,c,d, with an index on a,b,c will use said index when
the where clause only mentions b. Saves having to build indexes on a,b,c
and b,c and c (and potentially also b and a,c).

As far as I know, DB2 UDB for LUW can not do this.

However, if you were to create 3 separate indexes on (a), (b) and (c), DB2
can certainly apply index ANDing in cases where you have predicates on
(a) or (b) or (c) or ((a) and (c)) or ((b) and (c)).

The real question is whether you have an appropriate index design for
your application. It's probably not a good idea to create indexes on (a),
(b), (c), (a,c), and (b,c)!

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 100,000 Newsgroups - 19 Different Servers! =-----

Nov 12 '05 #4

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:3qYZb.98325$uV3.572505@attbi_s51...

AK wrote:
I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?

In Oracle, it means that an index can be used even if the access is not
by the first column indexed

I.e Table with a,b,c,d, with an index on a,b,c will use said index when
the where clause only mentions b. Saves having to build indexes on a,b,c
and b,c and c (and potentially also b and a,c).

DB2 can use an index even if the first (or leading columns) of the index are
not in the predicate. However, the entire index will be read from top to
bottom, instead of using the b-tree. If the index is large enough, then it
might trigger prefetching of the index pages to make it faster.

Nov 12 '05 #5

Jeremy Rickard

Mark Townsend <ma***********@comcast.net> wrote in message news:<3qYZb.98325$uV3.572505@attbi_s51>...

AK wrote:
I tried to google "skip scan DB2" but came up with nothing.
Does DB2 have the feature under a different name?

In Oracle, it means that an index can be used even if the access is not
by the first column indexed

I.e Table with a,b,c,d, with an index on a,b,c will use said index when
the where clause only mentions b. Saves having to build indexes on a,b,c
and b,c and c (and potentially also b and a,c).

Mark,

Sounds like you must mean a "non-matching (full) index scan (with or)
without data access"? Not so marketing-friendly perhaps.

DB2 UDB (for LUW) has always supported this. I believe mainframe DB2
has always too. It's an option considered by the optimiser, not
hard-coded.

Using index-only scans when only index non-leading columns are
selected is fairly standard stuff once the optimiser takes into
account not just cardinalities but also I/O and CPU cost. So standard
in fact that people might imagine it's hard-coded, but it's not.
Whack up the transferrate of your index tablespace to a high enough
value, and you'll see DB2 then switches over to doing the table scan
instead.

The same sort of thing applies when you need to do data access to
retrieve unindexed columns, and have predicates referencing only index
non-leading columns. For want of a better index, the optimiser will
quite often choose a full index scan followed by data page fetching on
the RIDs, since when retrieving only a small proportion of rows it can
be cheaper than simply scanning a wider table.

Hope that makes sense.
Jeremy Rickard

Nov 12 '05 #6

Mark A

"Jeremy Rickard" <jr******@unisystems.biz> wrote in message
news:d3**************************@posting.google.c om...

Sounds like you must mean a "non-matching (full) index scan (with or)
without data access"? Not so marketing-friendly perhaps.

DB2 UDB (for LUW) has always supported this. I believe mainframe DB2
has always too. It's an option considered by the optimiser, not
hard-coded.

Using index-only scans when only index non-leading columns are
selected is fairly standard stuff once the optimiser takes into
account not just cardinalities but also I/O and CPU cost. So standard
in fact that people might imagine it's hard-coded, but it's not.
Whack up the transferrate of your index tablespace to a high enough
value, and you'll see DB2 then switches over to doing the table scan
instead.

The same sort of thing applies when you need to do data access to
retrieve unindexed columns, and have predicates referencing only index
non-leading columns. For want of a better index, the optimiser will
quite often choose a full index scan followed by data page fetching on
the RIDs, since when retrieving only a small proportion of rows it can
be cheaper than simply scanning a wider table.

Hope that makes sense.
Jeremy Rickard

Yes, DB2 for OS/390 has always supported non-matching index scans. It is
identified in the Explain with the Matching Columns value. Matching columns
of 0 means a non-matching index scan, and a value of 1 means 1 columns is
matching, etc for however many left most columns of a multi-column index are
matching (supplied in the predicate).

Why can't we get the number of matching columns in the DB2 for LUW Explain?

Nov 12 '05 #7

I apologize for not explaining what "skip scans" are.
Let's say there is an index
CREATE INDEX PO_STATE_CITY ON POST_OFFICE(STATE, SITY);
then a query
SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'
will scan the whole index, which is much better than scanning the
table.
However, in this particular situation another approach is much faster:
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AK' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AL' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='Ar' AND
CITY='Springfield') +

and so on.

The idea is simple: there is no need to start at the beginning of the
index, it's better to start at('AK','Springfield'). Once you are done
with this pair, it's faster to jump over to ('AL','Springfield') than
to continue scanning.

It this situation it is much faster to perform 50 index look-ups than
to scan the whole index.

I hope my explanations are clear this time.

Nov 12 '05 #8

Mark A

"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...

I apologize for not explaining what "skip scans" are.
Let's say there is an index
CREATE INDEX PO_STATE_CITY ON POST_OFFICE(STATE, SITY);
then a query
SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'
will scan the whole index, which is much better than scanning the
table.
However, in this particular situation another approach is much faster:
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AK' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AL' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='Ar' AND
CITY='Springfield') +

and so on.

The idea is simple: there is no need to start at the beginning of the
index, it's better to start at('AK','Springfield'). Once you are done
with this pair, it's faster to jump over to ('AL','Springfield') than
to continue scanning.

It this situation it is much faster to perform 50 index look-ups than
to scan the whole index.

I hope my explanations are clear this time.

No, I don't think DB2 does that, and I also don't think that it is faster
for the database to automatically do it that way unless you have a unusually
large table/index. I am assuming that you expect the database to do the
conversion from the original SQL to the one with 50 predicates (or 50
separate SQL statements).

Don't forget that a full index scan can use prefetch to improve performance
(which is not used when doing b-tree access). Prefetch is when DB2 loads the
necessary pages into the buffer pool before the are requested because DB2
knows they will be needed. Prefetch is also fast because the pages are
usually continuous on disk.

If you did have the unusual situation where it would be faster to do a skip
scan, the best solution is to create another index on city only. This would
be 50 times faster (in theory) than the skip scan solution.

Nov 12 '05 #9

Mark Townsend

If you did have the unusual situation where it would be faster to do a skip
scan, the best solution is to create another index on city only. This would
be 50 times faster (in theory) than the skip scan solution.

Thanks Mark - obviously, everybody at Oracle, and the OP, are complete
idiots, and we will withdraw this utterly useless capability
immediately, because everybody knows that if IBM doesn't have it, it
isn't worth having.

Nov 12 '05 #10

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

If you did have the unusual situation where it would be faster to do a skip scan, the best solution is to create another index on city only. This would be 50 times faster (in theory) than the skip scan solution.

Thanks Mark - obviously, everybody at Oracle, and the OP, are complete
idiots, and we will withdraw this utterly useless capability
immediately, because everybody knows that if IBM doesn't have it, it
isn't worth having.

I don't know exactly how Oracle indexes work, but are you saying that it
would be faster with the skip scan than creating an index on city? If so,
then Oracle indexes are much different than DB2. A skip scan would not be
faster (or even as fast) than a city index in DB2 assuming that the database
would have to construct the revised SQL with 50 predicates (in most cases).

It skip scan works faster in Oracle, then great. I never said anything about
Oracle. Overall, both DB2 and Oracle have comparable performance and based
on the TPC benchmarks I have seen, DB2 does not have to apologize about its
performance. Since DB2 is a relational database (which Oracle seems to be
getting away from), the exact way in which the optimizer determines the
access path (use of indexes, etc) does not need to be embedded within the
application SQL statement.

Nov 12 '05 #11

Mark Townsend

Mark A wrote:

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...
If you did have the unusual situation where it would be faster to do a
skip
scan, the best solution is to create another index on city only. This
would
be 50 times faster (in theory) than the skip scan solution.
Thanks Mark - obviously, everybody at Oracle, and the OP, are complete
idiots, and we will withdraw this utterly useless capability
immediately, because everybody knows that if IBM doesn't have it, it
isn't worth having.

I don't know exactly how Oracle indexes work, but are you saying that it
would be faster with the skip scan than creating an index on city?

No, it was clearly stated that skip scan saves the cost of having to
build additional indexes, and as such is an optimization. Of course a
seperate index on City would be faster. But in a DW, it could also be
expensive, especially if the queries that use them are irregular. Hence
the idea behind skip scan.
If so,
then Oracle indexes are much different than DB2. A skip scan would not be
faster (or even as fast) than a city index in DB2 assuming that the database
would have to construct the revised SQL with 50 predicates (in most cases).
Nobody needs to re-construct any revised SQL. The Oracle optimizer
simply says - hmm - index on State, City, we can skip scan (probe by
state, then scan for city), or do a FTS. Sometimes Skip Scan wins. If
there is a seperate index on City, the the optimizer will, of course,
(actually, may) use that instead.

It skip scan works faster in Oracle, then great. I never said anything about
Oracle. Overall, both DB2 and Oracle have comparable performance and based
on the TPC benchmarks I have seen, DB2 does not have to apologize about its
performance.
TPC-H will not demonstrate this feature. Nor do the TPC-H figures take
in account time and space required to build multiple indexes after data
load.

Since DB2 is a relational database (which Oracle seems to be getting away from),
Seems to who ?
the exact way in which the optimizer determines the
access path (use of indexes, etc) does not need to be embedded within the
application SQL statement.

You mean DB2 has an optimizer ? Lordy, lordy, what will they think of
next. You kids with your modern ideas.

Of course this is completely transparent in Oracle - the example AK gave
you was to explain what is happening at the logical level. It doesn't
imply that the SQL needs to be written that way, or that the database is
even re-writing the SQL. Here's a published Oracle example, available on
otn.oracle.com. Assume the following table with 5,024,000 rows.

REGISTERED_OWNERS
(
STATE VARCHAR2(2) NOT NULL,
REGISTRATION# VARCHAR2(10) NOT NULL,
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
MAKE VARCHAR2(30),
MODEL VARCHAR2(15),
YEAR_MFG NUMBER,
CITY VARCHAR2(30),
ZIP NUMBER
)

with a single composite index on (STATE, REGISTRATION#)
Assume this query

SELECT first_name, last_name, zip
FROM REGISTERED_OWNERS WHERE registration# = '4FBB428'

In Oracle8i (which doesnt have skip scan), the resultant FTS would take
4035 hundreths.

In Oracle9i, the optimizer will cost (and in this case use) a sub-tree
lookup depending on the low and high values for the REGISTRATION# column
at a given STATE node. Comparable time - 25 hundreths of a second.

If this query is only executed spasmodically, then creating and
maintaining a second index on REGISTRATION# may be less than optimal.

Try it yourself.

Nov 12 '05 #12

Larry Edelstein

Mark! You've finally come around!

Larry Edelstein

Mark Townsend wrote:

If you did have the unusual situation where it would be faster to do a
skip
scan, the best solution is to create another index on city only. This
would
be 50 times faster (in theory) than the skip scan solution.

Thanks Mark - obviously, everybody at Oracle, and the OP, are complete
idiots, and we will withdraw this utterly useless capability
immediately, because everybody knows that if IBM doesn't have it, it
isn't worth having.

Nov 12 '05 #13

Mark Townsend

Larry Edelstein wrote:

Mark! You've finally come around!

SArcAsm. The sAnctimonious Attitude Annoyed me. I guess I now understAnd
what the A stAnds for.

Nov 12 '05 #14

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

No, it was clearly stated that skip scan saves the cost of having to
build additional indexes, and as such is an optimization. Of course a
seperate index on City would be faster. But in a DW, it could also be
expensive, especially if the queries that use them are irregular. Hence
the idea behind skip scan.

I am very happy that skip scan works to improve performance on Oracle. DB2
does not work the same as Oracle, and I do not think that the feature would
be faster in DB2. This is because DB2 does very fast index scans and it
would have to read the entire index anyway to do the "probe" of the states.
If Oracle does not have to read the entire index to "probe" the states, then
it does extra maintenance of the index values real time (which DB2 does not
do). That extra real-time maintenance (if it is done by Oracle--I don't
know) would be a performance hit.

Also DB2 has very good tools to identify those cases (non-matching index
scans) where a index on city should be created for optimal performance.

Your statement about skip scan not be tested in TPC benchmarks is
misleading. The reason skip scan is not tested in the TPC benchmarks is only
because the people who run the Oracle benchmarks set up the indexes
correctly to begin with, so that there are no non-matching index scans.
There is nothing in the benchmark rules about indexes, they are set up
however the benchmark team desires to provide optimal performance. If the
Oracle team wanted to show off its vaunted skip scan, they could create some
non-optimal indexes definitions. Maybe you should suggest that to them.

Nov 12 '05 #15

Mark Townsend

> This is because DB2 does very fast index scans and it

would have to read the entire index anyway to do the "probe" of the states.

OK - so now I'm learning something. Are you saying that DB2 will read
the entire index to find the rows for just one (or more) states (what
Oracle would call an Fast Full Index Scan ?). I'm pretty sure this is
NOT what you mean.
If Oracle does not have to read the entire index to "probe" the states, then
it does extra maintenance of the index values real time (which DB2 does not
do). That extra real-time maintenance (if it is done by Oracle--I don't
know) would be a performance hit.
Hmm - this IS interesting. Oracle does maintain indexes in real-time,
albeit efficiently. Does IBM not ? I'll have to read up more on DB2
indexing.
Also DB2 has very good tools to identify those cases (non-matching index
scans) where a index on city should be created for optimal performance.
Of course, as does everybody, and I also assume the corollorary as well,
identification of indexes that are not providing any particular useful
benefit (which in a DW can actually be a bigger problem).

Your statement about skip scan not be tested in TPC benchmarks is
misleading. The reason skip scan is not tested in the TPC benchmarks is only
because the people who run the Oracle benchmarks set up the indexes
correctly to begin with, so that there are no non-matching index scans.
Right, because as I said, cost of storage is not a directly measured
factor in TPC-H (and I DID say directly, I'm aware it surfaces in the
overal cost and cost per query calculation). However, for real-world
VLDB customers, cost of storage often is, which is why skip-scan (and
bit mapped indexes, and data compression) are all very useful techniques.
There is nothing in the benchmark rules about indexes, they are set up
however the benchmark team desires to provide optimal performance. If the
Oracle team wanted to show off its vaunted skip scan, they could create some
non-optimal indexes definitions. Maybe you should suggest that to them.

See comment above.

Nov 12 '05 #16

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

This is because DB2 does very fast index scans and it
would have to read the entire index anyway to do the "probe" of the states.

OK - so now I'm learning something. Are you saying that DB2 will read
the entire index to find the rows for just one (or more) states (what
Oracle would call an Fast Full Index Scan ?). I'm pretty sure this is
NOT what you mean.

No. If you look at the revised query that AK posted with the 50 states
hardcoded (presumably the optimizer figures this out) DB2 would have to read
the entire index to come up all the possible unique values for the first
column of the index.. How does Oracle do it? If they maintain the list of
unique values of the first column in real-time, that is extra work that DB2
does not do. But I am not sure Oracle indexes are directly comparable to DB2
indexes anyway.
If Oracle does not have to read the entire index to "probe" the states, then it does extra maintenance of the index values real time (which DB2 does not do). That extra real-time maintenance (if it is done by Oracle--I don't
know) would be a performance hit.

Hmm - this IS interesting. Oracle does maintain indexes in real-time,
albeit efficiently. Does IBM not ? I'll have to read up more on DB2
indexing.

Very funny.Of course DB2 maintains indexes real-time. What DB2 does not
maintain in real-time is a list of unique values for the first column of a
multiple column index that would enable it re-write the query with the
predicate that has 50 values hardcoded.

Also DB2 has very good tools to identify those cases (non-matching index
scans) where a index on city should be created for optimal performance.

Of course, as does everybody, and I also assume the corollorary as well,
identification of indexes that are not providing any particular useful
benefit (which in a DW can actually be a bigger problem).

Like I said, DB2 can use the index even if the first column of the index is
not in the predicate. It just can't use the b-tree of the index. The only
way DB2 could do a skip scan (to the best of my understanding) is to read
the entire index anyway, so there would be no benefit in DB2. If it works
better in Oracle, good for you.

Your statement about skip scan not be tested in TPC benchmarks is
misleading. The reason skip scan is not tested in the TPC benchmarks is only because the people who run the Oracle benchmarks set up the indexes
correctly to begin with, so that there are no non-matching index scans.

Right, because as I said, cost of storage is not a directly measured
factor in TPC-H (and I DID say directly, I'm aware it surfaces in the
overal cost and cost per query calculation). However, for real-world
VLDB customers, cost of storage often is, which is why skip-scan (and
bit mapped indexes, and data compression) are all very useful techniques.

I believe you are wrong on that. The time to load the data and create the
indexes is part of the TPC benchmark. So having a lot of indexes does
penalize the benchmark load time (which is important to data warehouse
customers).

But having run TPC-H benchmarks myself, the number of required indexes to
optimally run the TPC-H queries is quite small, probably smaller than a real
world environment.

There is nothing in the benchmark rules about indexes, they are set up
however the benchmark team desires to provide optimal performance. If the Oracle team wanted to show off its vaunted skip scan, they could create some non-optimal indexes definitions. Maybe you should suggest that to them.

See comment above.

I saw it, but I don't believe it.

Nov 12 '05 #17

Mark Townsend

> How does Oracle do it? If they maintain the list of

unique values of the first column in real-time, that is extra work that DB2
does not do. But I am not sure Oracle indexes are directly comparable to DB2
indexes anyway.

All you need is the number of distinct values in the leading columns,
not the values themselves. This number then drives the number and ranges
of the sub-index scans.

Nov 12 '05 #18

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40************@comcast.net...

How does Oracle do it? If they maintain the list of
unique values of the first column in real-time, that is extra work that DB2 does not do. But I am not sure Oracle indexes are directly comparable to DB2 indexes anyway.

All you need is the number of distinct values in the leading columns,
not the values themselves. This number then drives the number and ranges
of the sub-index scans.

How do you get the number without reading the entire index? I don't think
DB2 can do it.

But according to AK (who started this thread) with the following index:

CREATE INDEX PO_STATE_CITY ON POST_OFFICE(STATE, CITY);

Oracle takes the following query:

SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'

and automatically converts it to:

(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AK' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AL' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='Ar' AND
CITY='Springfield') +
.... and so on.

So according to the above, Oracle has to come up with all the unique values
of state to convert the query. So how does Oracle do this?

What I said is that it would not help DB2 to do that because DB2 would have
to read the entire index to come up with the state values to convert the
query. But DB2 can do the original query by reading the entire index. Upon
which you initiated personal attacks against me.

Nov 12 '05 #19

Mark Townsend

Oracle takes the following query:

SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'

and automatically converts it to:

(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AK' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AL' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='Ar' AND
CITY='Springfield') +
... and so on.

So according to the above, Oracle has to come up with all the unique values
of state to convert the query. So how does Oracle do this?
It doesn't. As I mentioned, you do not have to write the query this why,
or even have the optimizer rewrite the query this way.
What I said is that it would not help DB2 to do that because DB2 would have
to read the entire index to come up with the state values to convert the
query. But DB2 can do the original query by reading the entire index. Upon
which you initiated personal attacks against me.

The statement I reacted to was as below. The implication I took from
this was twofold
a) The requirement for skip scan is 'unusual'
b) Creating another index is a better solution

However, I do not believe either of these to be correct statements, and
took them as indication from you that the feature (particularly as
provided by Oracle) was basically useless.

If that was not your intent, and you meant that in your experience in
DB2 environments this was an usual requirement, and that with DB2
creating another index was a more optimal solution, then I unreservedly
apologize.
If you did have the unusual situation where it would be faster to do a skip
scan, the best solution is to create another index on city only. This would
be 50 times faster (in theory) than the skip scan solution.

Nov 12 '05 #20

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:74f_b.42156$4o.59575@attbi_s52...

So according to the above, Oracle has to come up with all the unique values of state to convert the query. So how does Oracle do this?

It doesn't. As I mentioned, you do not have to write the query this why,
or even have the optimizer rewrite the query this way.

I never said that the user had to write the query this way. But AK rather
clearly said that Oracle re-writes the query that way to optimize it. I
responded to AK, not to you.

What I said is that it would not help DB2 to do that because DB2 would have to read the entire index to come up with the state values to convert the
query. But DB2 can do the original query by reading the entire index. Upon which you initiated personal attacks against me.

The statement I reacted to was as below. The implication I took from
this was twofold
a) The requirement for skip scan is 'unusual'
b) Creating another index is a better solution

However, I do not believe either of these to be correct statements, and
took them as indication from you that the feature (particularly as
provided by Oracle) was basically useless.

If that was not your intent, and you meant that in your experience in
DB2 environments this was an usual requirement, and that with DB2
creating another index was a more optimal solution, then I unreservedly
apologize.

I never said the "requirement" was unusual (if you mean that an SQL
statement query only supplies the second column of a two column index in the
predicate). What I said is that it would unusual in DB2 for the second
(optimized) query to perform better in DB2 unless the index was extremely
large, and that there was no way for DB2 to get all of the state values
without reading the entire index anyway. I never said how that applies to
Oracle since I never said anything about how Oracle would perform.

I will stick by my statement that if performance was critical, creating a
second index on city was a better solution for DB2. I never said what would
be better for Oracle (which is why I don't answer questions in the Oracle
newsgroup).

I am still waiting to hear how Oracle gets all the state values without
reading the entire index. I am not saying that it is impossible, but it
seems to me that Oracle would have to maintain (in real-time) the list of
unique values for state. You say that Oracle only needs the number of unique
values, but how does it even know that without reading the entire index?
Maybe Oracle can do this better than DB2, but DB2 would have to read the
entire index to know the number of unique values of state.

However, I don't how the skip scan work without knowing all the state values
(not just the number of unique values). I was relying on AK description of
how the skip scan works, and AK clearly says that Oracle knows the unique
values of state to rewrite (optimize) the query.

Nov 12 '05 #21

Mark Townsend

I am still waiting to hear how Oracle gets all the state values without
reading the entire index.

I can't explain why Oracle doesn't need the state values without
actually explaining how this is done, and that's something I don't want
to do, for a number of reasons, least not being it's actually a pretty
clever algorithm. But basically, the index is divided into a number of
sub-indexes, based on the number of distinct values in the leading
column (which are gotten from statistics), and then each sub index is
scanned looking for a certain something that tells the DB it's not worth
scanning that sub-index anymore, at which stage the remainder of that
sub-index is skipped. And as long as it's costed properly, and the stars
align, then it works very well.

Nov 12 '05 #22

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

I am still waiting to hear how Oracle gets all the state values without
reading the entire index.

I can't explain why Oracle doesn't need the state values without
actually explaining how this is done, and that's something I don't want
to do, for a number of reasons, least not being it's actually a pretty
clever algorithm. But basically, the index is divided into a number of
sub-indexes, based on the number of distinct values in the leading
column (which are gotten from statistics), and then each sub index is
scanned looking for a certain something that tells the DB it's not worth
scanning that sub-index anymore, at which stage the remainder of that
sub-index is skipped. And as long as it's costed properly, and the stars
align, then it works very well.

Since the statistics are not always up to date, I still don't see exactly
how this works. Overall, I suspect that DB2 indexes are more efficient to
maintain real-time, and Oracle indexes have some extra intelligence in them
that makes sub scans possible, but are slightly less efficient to maintain.
Whenever a new value is inserted in the Oracle index, it must create a new
section.

Because DB2 index pages are not split into sections and usually occupy
contiguous disk space, and because of prefetch functions that DB2 has, it
can read the entire index fairly quickly. It sounds like Oracle indexes are
split into sections based on the uniqueness of the first column. Of course
DB2 does have MDC indexes that may be similar, but I am not sure.

Overall, I think that Oracle and DB2 have equivalent performance. Because
the internal structure of the databases is so different, I think it is often
counter-productive to start comparing the internal workings of the databases
as if the features are interchangeable, because they are not.

Nov 12 '05 #23

Serge Rielau

Holy cow! Do we need a bucket of cold water here? ;-)
I didn't read through the hole thread here. Forgive me if I post repeat
info.

IBM Redbrick uses a feature called "index-skip-scan", not sure whether
that is the same.
The problem discussed here seems to be that in a full index scan,
prefetch or not, the index is expected to touch every leaf, even though
the index scan could skip obviously non applying subsection.
This is certainly no magic. One would do the same if one tried to look
up a name (trailing) in a phone book (index) and one don't remember the
town (leading column). You won't read the whole book. You will
"skip-scan" to each town, then look up the name and move to the next
town once you know you're past the desired name in that town.

I don't know whether DB2 does this or not. I also don't know whether the
Redbrick feature matches this description.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab

Nov 12 '05 #24

MCPHEAL

>Thanks Mark - obviously, everybody at Oracle, and the OP, are complete

idiots, and we will withdraw this utterly useless capability
immediately, because everybody knows that if IBM doesn't have it, it
isn't worth having.

Like sequences.

Give IBM time they'll catch up sooner or later.

Nov 12 '05 #25

MCPHEAL

>I think it is often

counter-productive to start comparing the internal workings of the databases
as if the features are interchangeable, because they are not.

Agreed - we are talking about the DB2 family here aren't we ?

Nov 12 '05 #26

Serge,

IBM Redbrick uses a feature called "index-skip-scan", not sure whether
that is the same.
The problem discussed here seems to be that in a full index scan,
prefetch or not, the index is expected to touch every leaf, even though
the index scan could skip obviously non applying subsection.
This is certainly no magic. One would do the same if one tried to look
up a name (trailing) in a phone book (index) and one don't remember the
town (leading column). You won't read the whole book. You will
"skip-scan" to each town, then look up the name and move to the next
town once you know you're past the desired name in that town.

that's exactly what I was speaking about.
Thanks!

Nov 12 '05 #27

Mark A

> Serge,

IBM Redbrick uses a feature called "index-skip-scan", not sure whether
that is the same.
The problem discussed here seems to be that in a full index scan,
prefetch or not, the index is expected to touch every leaf, even though
the index scan could skip obviously non applying subsection.
This is certainly no magic. One would do the same if one tried to look
up a name (trailing) in a phone book (index) and one don't remember the
town (leading column). You won't read the whole book. You will
"skip-scan" to each town, then look up the name and move to the next
town once you know you're past the desired name in that town.

that's exactly what I was speaking about.
Thanks!

That's in the Redbrick database product, not DB2.

Nov 12 '05 #28

Jeremy Rickard

ak************@yahoo.com (AK) wrote in message news:<46**************************@posting.google. com>...

I apologize for not explaining what "skip scans" are.
Let's say there is an index
CREATE INDEX PO_STATE_CITY ON POST_OFFICE(STATE, SITY);
then a query
SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'
will scan the whole index, which is much better than scanning the
table.
However, in this particular situation another approach is much faster:
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AK' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AL' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='Ar' AND
CITY='Springfield') +

and so on.

The idea is simple: there is no need to start at the beginning of the
index, it's better to start at('AK','Springfield'). Once you are done
with this pair, it's faster to jump over to ('AL','Springfield') than
to continue scanning.

It this situation it is much faster to perform 50 index look-ups than
to scan the whole index.

I hope my explanations are clear this time.

I've tried to read up on this on several web sites. None gave more
than basic examples, but this was an OTN description that seemed quite
helpful:

"With Oracle9i, a composite index can be used even if the leading
column(s) are not accessed by the query, via a technique called an
'index skip scan'. During a skip scan, the composite index is accessed
once for each distinct value of the leading column(s). For each
distinct value, the index is searched to find the query's target
values. The result is a scan that skips across the index structure.

Index skip scans provide two main benefits. First, index skip scans
can improve the performance of certain queries, since queries which
previously required table scans may now be able to take advantage of
an existing index. Second, index skip scans may allow an application
to meet its performance goals with fewer indexes..."

May have misunderstood this (hence my question at end) but it seems to
be saying is that Oracle would: (1) scan the index for matches on the
non-leading column(s), (2) sort the qualifying combinations of leading
column values, removing duplicates, then (3) re-scan the index for
those de-duplicated leading columns values (presumably doing required
data page lookups at this point?).

DB2 on the other hand might: (1) scan the index for matches on the
non-leading column(s), (2) sort the qualifying RIDs (if needed, i.e.
if index is non-clustering), (3) use list prefetch to access the data
pages sequentially.

I think there are cases where non-leading column sorting would be
cheaper than RID sorting, and vice versa. But how does the Oracle
algorithm avoid random i/o on data pages? It seems the skip-scan
algorithm would work best on a clustering index?
Jeremy Rickard

Nov 12 '05 #29

Mark A

"Jeremy Rickard" <jr******@unisystems.biz> wrote in message
news:d3*************************@posting.google.co m...

I've tried to read up on this on several web sites. None gave more
than basic examples, but this was an OTN description that seemed quite
helpful:

"With Oracle9i, a composite index can be used even if the leading
column(s) are not accessed by the query, via a technique called an
'index skip scan'. During a skip scan, the composite index is accessed
once for each distinct value of the leading column(s). For each
distinct value, the index is searched to find the query's target
values. The result is a scan that skips across the index structure.

Index skip scans provide two main benefits. First, index skip scans
can improve the performance of certain queries, since queries which
previously required table scans may now be able to take advantage of
an existing index. Second, index skip scans may allow an application
to meet its performance goals with fewer indexes..."

May have misunderstood this (hence my question at end) but it seems to
be saying is that Oracle would: (1) scan the index for matches on the
non-leading column(s), (2) sort the qualifying combinations of leading
column values, removing duplicates, then (3) re-scan the index for
those de-duplicated leading columns values (presumably doing required
data page lookups at this point?).

DB2 on the other hand might: (1) scan the index for matches on the
non-leading column(s), (2) sort the qualifying RIDs (if needed, i.e.
if index is non-clustering), (3) use list prefetch to access the data
pages sequentially.

I think there are cases where non-leading column sorting would be
cheaper than RID sorting, and vice versa. But how does the Oracle
algorithm avoid random i/o on data pages? It seems the skip-scan
algorithm would work best on a clustering index?
Jeremy Rickard

There may be a few misconceptions about DB2.

1. First, if the there is a multi-column index, and the first column of the
index is not in the predicate, then DB2 can still use the index, it just
reads the index sequentially from top to bottom (normally with sequential
prefetch), instead of using the b-tree. Some of the explanation (including
the one above) suggest that Oracle cannot do this, but I have no personal
knowledge of this.

2. I don't believe that DB2 keeps track of where breaks in the first key
values exist in the index. To skip the non-relevant data and go the next
index entry that has a new unique value for the first column, the database
would have to keep track of these breaks or have (what someone else
described as) index sub-sections for each new value of the first column.
There must be some extra overhead maintaining sub-indexes compared to the
way DB2 does it (not keeping track of where new first column values start in
the index). Having index sub-sections could also potentially slow down
prefetches of the entire index (the way in which DB2 would do it).

3. Skip scan appears to only work when the second column of a multi-column
index is in the predicate (if I understand it correctly). If only the 3rd
column is supplied, then it will not work since their are only sub-indexes
on the first column. DB2 can scan the index regardless of how many leading
columns are missing in the predicate.

4. It is always much faster (even compared to skip scan) to have another
index on the second column, or to reverse the order of the columns on the
index (city, state) instead of (state, city). This is especially true as the
cardinality of the first column increases.

Nov 12 '05 #30

Fan Ruo Xin

"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...

I apologize for not explaining what "skip scans" are.
Let's say there is an index
CREATE INDEX PO_STATE_CITY ON POST_OFFICE(STATE, SITY);
then a query
SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'
will scan the whole index, which is much better than scanning the
table.
However, in this particular situation another approach is much faster:
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AK' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='AL' AND
CITY='Springfield') +
(SELECT COUNT(*) FROM POST_OFFICE WHERE STATE='Ar' AND
CITY='Springfield') +

and so on.

The idea is simple: there is no need to start at the beginning of the
index, it's better to start at('AK','Springfield'). Once you are done
with this pair, it's faster to jump over to ('AL','Springfield') than
to continue scanning.

It this situation it is much faster to perform 50 index look-ups than
to scan the whole index. =====
It is interesting ...
In fact, none of us know if db2 will use full index tree scan or index skip
scan. Or db2 optimizer did compute the cost of the two ways we discussed
here and choose the final one.

I hope my explanations are clear this time.

Nov 12 '05 #31

> =====

It is interesting ...
In fact, none of us know if db2 will use full index tree scan or index skip
scan. Or db2 optimizer did compute the cost of the two ways we discussed
here and choose the final one.

I've never seen a skip scan on DB2. I was looking at execution plans
and real execution costs

Nov 12 '05 #32

Mark A

"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...

=====
It is interesting ...
In fact, none of us know if db2 will use full index tree scan or index skip scan. Or db2 optimizer did compute the cost of the two ways we discussed
here and choose the final one.

I've never seen a skip scan on DB2. I was looking at execution plans
and real execution costs

Skip-scan doesn't exist in DB2. DB2 either reads the b-tree of the index or
the entire index. Not sure about indexes that define MDC. It seems to me
that having skip scan capability would impose a slight penalty on index
maintenance, even if it were never used.

Nov 12 '05 #33

Fan Ruo Xin

"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...

=====
It is interesting ...
In fact, none of us know if db2 will use full index tree scan or index skip scan. Or db2 optimizer did compute the cost of the two ways we discussed
here and choose the final one.

I've never seen a skip scan on DB2. I was looking at execution plans
and real execution costs

I thought Mark A might be correct. DB2 will use scan the whole index tree.
But I would like to save my word until we do enough research on V8.
Especially we can collect distribution statistics for specific columns in
Version8, like the following:
RUNSTATS ON TABLE mytbl ON KEY COLUMNS.

Any way, I thought it is a GOOD question. You may propose it to the IDUG on
May 9-13, if you plan to go there.

Nov 12 '05 #34

Fan Ruo Xin

"Mark A" <ma@switchboard.net> wrote in message
news:c1*****************@news.uswest.net...

"AK" <ak************@yahoo.com> wrote in message
news:46**************************@posting.google.c om...
=====
It is interesting ...
In fact, none of us know if db2 will use full index tree scan or index skip scan. Or db2 optimizer did compute the cost of the two ways we discussed here and choose the final one.
I've never seen a skip scan on DB2. I was looking at execution plans
and real execution costs

Skip-scan doesn't exist in DB2.

I don't mean to argue with you. Just wondering why you are so sure about
this?
DB2 either reads the b-tree of the index or
the entire index. Not sure about indexes that define MDC. Blocking index is also a B+ tree, other than store the RowID, for each index
entry, the index value followed by Block id(s). And there is some problem
with Index Only.
Almost forgot, when building a MDC table, db2 always build the composite
block index on all the dimensions and separate block index on each
dimension.
It seems to me
that having skip scan capability would impose a slight penalty on index
maintenance, even if it were never used. I can't understand why this will increase index maintenance. This will
increase the CPU workload during searching the index tree. But the basic
idea is saving I/O.

Nov 12 '05 #35

Mark A

"Fan Ruo Xin" <fa*****@sbcglobal.net> wrote in message
news:fD*****************@newssvr26.news.prodigy.co m...

Skip-scan doesn't exist in DB2.

I don't mean to argue with you. Just wondering why you are so sure about
this?

1. There is no access path description of skip-scan in the DB2 explain or in
any DB2 reference manual or IBM marketing materials.
2. DB2 has no way to do this without reading the whole index first (which
would be redundent). That is, regular DB2 indexes are not sub-sectioned the
way Mark Townsend described Oracle indexes.
3. None of the IBM Toronto Lab people on htis forum have said that sub-scan
exists in DB2 (although IBM Redbrick aparantly has it).

It seems to me
that having skip scan capability would impose a slight penalty on index
maintenance, even if it were never used.

I can't understand why this will increase index maintenance. This will
increase the CPU workload during searching the index tree. But the basic
idea is saving I/O.

In order to skip part of the index during a scan (skip-scan), and go to
where there is a new value for the first column in the index, DB2 would have
to know where the new value starts, and that position would have to be
maintained real-time, which incurs some resources, even if skip-scan is
never used.

Regular (non MDC) DB2 indexes do not have sub-sections based on the value of
the first column of the index, which apparently Oracle has in some or all of
its indexes. Maintaining the index in sub-sections has "some" performance
penalty when maintained in real-time. There may also a penalty if skip-scan
cannot be used and a full sequential index scan is needed, since indexes
that have sub-sections may not utilize sequential prefetch as efficiently.

The best way (IMO) to avoid full sequential index scans (when an index is
used but the leading column of an index is not in the predicate) is to add
the proper indexes required, or reverse the order of the columns on existing
indexes. This would perform much better than either a full sequential index
scan or a skip-scan. But if the index is not particularly large and/or the
query that causes the full sequential index scan is not frequent or is not
critical, then I might not even worry about it.

As I said. MDC indexes "may" be different, as you explained above, but I am
not really sure how MDC indexes work.

Nov 12 '05 #36

Mark Townsend

Mark A wrote:

That is, regular DB2 indexes are not sub-sectioned the
way Mark Townsend described Oracle indexes.

Oracle uses a b-tree structure for it's regular indexes (as opposed to
it's bit mapped indexes). I believe this is the same technique used by
DB2 but actually have no idea.

To be sure, interested parties can check out the description at
http://download-west.oracle.com/docs...a.htm#CNCPT811
See the section entitled 'Figure 5-7 Internal Structure of a B-tree Index'

I have yet to hear any complaints from customers regarding any
performance penalty of this type of structure.

Note that Oracle also does full index scans through the leaf blocks as
well (albeit in parallel, rather than sequential). Skip scan is only
executed if the costs show it adds a benefit.

I'm interested in the assertion that DB2 indexes are fundamentally
different - is there a primer on DB2 indexes somewhere ?

Last but not least, IBM Informix XPS also implements a type of skip-scan
but it sounds completely different once again - see
http://www-106.ibm.com/developerwork...weininger.html

Nov 12 '05 #37

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

Mark A wrote:
That is, regular DB2 indexes are not sub-sectioned the
way Mark Townsend described Oracle indexes.
Oracle uses a b-tree structure for it's regular indexes (as opposed to
it's bit mapped indexes). I believe this is the same technique used by
DB2 but actually have no idea.

DB2 also uses a b-tree for its indexes (since before Oracle was born). But
the index leaf pages are not stored in sub-sections the way the described it
in one of your previous posts.
To be sure, interested parties can check out the description at
http://download-west.oracle.com/docs...a.htm#CNCPT811 See the section entitled 'Figure 5-7 Internal Structure of a B-tree Index'

I have yet to hear any complaints from customers regarding any
performance penalty of this type of structure.
The performance hit is slight, but there is no free lunch, and it occurs on
evey insert or update of the index key. But why incur any performance hit if
the sub-scan is not used much, and if a proper index structure (that has the
leading column in the predicate) is much faster than either a sub-scan or a
sequential index scan?
Note that Oracle also does full index scans through the leaf blocks as
well (albeit in parallel, rather than sequential). Skip scan is only
executed if the costs show it adds a benefit.
DB2 can do parallel I/O on the leaf page scan.
I'm interested in the assertion that DB2 indexes are fundamentally
different - is there a primer on DB2 indexes somewhere ?
Regular DB2 indexes are not stored in sub-sections as you stated that Oracle
indexes are (in a previous post). I am not sure about MDC indexes, which are
fairly new.
Last but not least, IBM Informix XPS also implements a type of skip-scan
but it sounds completely different once again - see
http://www-106.ibm.com/developerwork...weininger.html

Nov 12 '05 #38

Mark Townsend

Mark Townsend wrote:

Mark A wrote:
That is, regular DB2 indexes are not sub-sectioned the
way Mark Townsend described Oracle indexes.

To be sure, interested parties can check out the description at
http://download-west.oracle.com/docs...a.htm#CNCPT811

See the section entitled 'Figure 5-7 Internal Structure of a B-tree Index'
I'm interested in the assertion that DB2 indexes are fundamentally
different - is there a primer on DB2 indexes somewhere ?

Hmm -
http://www-306.ibm.com/cgi-bin/db2ww...n=c0005300.htm

Looks like a B-tree to me. Snap ?

Nov 12 '05 #39

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:ODW_b.56032$4o.75446@attbi_s52...

Looks like a B-tree to me. Snap ?

Yes, DB2 has used b-tree indexes since before Oracle became a product.

Nov 12 '05 #40

Mark Townsend

Mark A wrote:

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...
Mark A wrote:
That is, regular DB2 indexes are not sub-sectioned the
way Mark Townsend described Oracle indexes.

Oracle uses a b-tree structure for it's regular indexes (as opposed to
it's bit mapped indexes). I believe this is the same technique used by
DB2 but actually have no idea.

DB2 also uses a b-tree for its indexes (since before Oracle was born). But
the index leaf pages are not stored in sub-sections the way the described it
in one of your previous posts.

I think I see the confusion. I said
"the index is divided into a number of sub-indexes, based on the number
of distinct values in the leading column (which are gotten from
statistics), and then each sub index is scanned looking for a certain
something"

I did not mean physically divided (and stored). The division is logical
and performed at run time by the optimizer (same way the optimizer
decides ranges for parallel queries). Instead of sub-indexes, perhaps I
should have said "sub index ranges".

Apologies.

Nov 12 '05 #41

Mark Townsend

Mark A wrote:

Yes, DB2 has used b-tree indexes since before Oracle became a product.

Well I guess the old adage about old dogs and new tricks is true then.
Sorry, couldn't resist :-)

Nov 12 '05 #42

Mark A

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

I think I see the confusion. I said
"the index is divided into a number of sub-indexes, based on the number
of distinct values in the leading column (which are gotten from
statistics), and then each sub index is scanned looking for a certain
something"

I did not mean physically divided (and stored). The division is logical
and performed at run time by the optimizer (same way the optimizer
decides ranges for parallel queries). Instead of sub-indexes, perhaps I
should have said "sub index ranges".

Apologies.

I still don't understand this. If the physical position (where there are
changes in the first column of the index) is not stored in the index or the
catalog, and maintained real-time, then the database would have to read the
all the values of the index at runtime. For DB2, once the entire index is
read, the rows can be filtered without any further work performed by DB2
(assuming that the predicate contains one of the columns in the index), so a
sub-scan would be redundant.

So that explanation does not make sense to me, and I suspect that there is
something more to it than this.

DB2 and Oracle have lots of nice features to make up for people who don't
know how to design databases. Each time one of these features is added, the
code gets bigger, the path length gets longer, the memory required gets
bigger, and everything gets a little bit slower (except for the one thing
the feature is trying to address).

Meanwhile people report that MySQL is several times faster than DB2 or
Oracle for simple queries.

Nov 12 '05 #43

PM $pm3iinc-nospam$

THe only skip scan i know of in db2 is the
Early Out on fetch first and maybe sometimes for min() max() processing.

For MDC, you have block indexes per dimension (state or whatever) and the
composite (combined block index keys).
It's like using X and Y coordinates (if 2 dimensions are involved) to get
the list of blocks that contain the required data pages.
Big block read (extent) and block-based bufferpools should be able to kick
in.

Terminology :

A dimension block index will be automatically created for each of the
dimensions specified, and it will be used by the optimizer to quickly and
efficiently access data along each dimension.

The set of blocks that contain pages with data having a certain key value of
one of the dimension block indexes is called a slice.

A composite block index will also automatically be created, containing all
dimension key columns, and will be used to maintain the clustering of data
over insert and update activity.
The composite block index can also be used by the optimizer to efficiently
access data having particular dimension values.

Every unique combination of dimension values form a logical cell

PM

Nov 12 '05 #44

Fan Ruo Xin

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

I am still waiting to hear how Oracle gets all the state values without
reading the entire index.

I can't explain why Oracle doesn't need the state values without
actually explaining how this is done, and that's something I don't want
to do, for a number of reasons, least not being it's actually a pretty
clever algorithm. But basically, the index is divided into a number of
sub-indexes, based on the number of distinct values in the leading
column (which are gotten from statistics), and then each sub index is
scanned looking for a certain something that tells the DB it's not worth
scanning that sub-index anymore, at which stage the remainder of that
sub-index is skipped. And as long as it's costed properly, and the stars
align, then it works very well.

Just curious...
If I use AK's example again, but changed the following sql
SELECT COUNT(*) FROM POST_OFFICE WHERE CITY='Springfield'
to
SELECT COUNT(*) FROM POST_OFFICE WHERE STATE > 'AA' and CITY='Springfield'

Does ORACLE optimizer still choose to use index SKIP SCAN?

Nov 12 '05 #45

Fan Ruo Xin

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...

Of course this is completely transparent in Oracle - the example AK gave
you was to explain what is happening at the logical level. It doesn't
imply that the SQL needs to be written that way, or that the database is
even re-writing the SQL. Here's a published Oracle example, available on
otn.oracle.com. Assume the following table with 5,024,000 rows.

REGISTERED_OWNERS
(
STATE VARCHAR2(2) NOT NULL,
REGISTRATION# VARCHAR2(10) NOT NULL,
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
MAKE VARCHAR2(30),
MODEL VARCHAR2(15),
YEAR_MFG NUMBER,
CITY VARCHAR2(30),
ZIP NUMBER
)

with a single composite index on (STATE, REGISTRATION#)
Assume this query

SELECT first_name, last_name, zip
FROM REGISTERED_OWNERS WHERE registration# = '4FBB428'

In Oracle8i (which doesnt have skip scan), the resultant FTS would take
4035 hundreths.

In Oracle9i, the optimizer will cost (and in this case use) a sub-tree
lookup depending on the low and high values for the REGISTRATION# column
at a given STATE node. Comparable time - 25 hundreths of a second.

If this query is only executed spasmodically, then creating and
maintaining a second index on REGISTRATION# may be less than optimal.

At beginning, this example did impress me. But when I think again - the
index tree of this table (5m records) is a small one. Then I would rather
think DB2 and INFORMIX's readahead technology did do better than ORACLE's.
Does that make sense?

Nov 12 '05 #46

Mark A

"Fan Ruo Xin" <fa*****@sbcglobal.net> wrote in message
news:Ys******************@newssvr26.news.prodigy.c om...

"Mark Townsend" <ma***********@comcast.net> wrote in message
news:40**************@comcast.net...
Of course this is completely transparent in Oracle - the example AK gave
you was to explain what is happening at the logical level. It doesn't
imply that the SQL needs to be written that way, or that the database is
even re-writing the SQL. Here's a published Oracle example, available on
otn.oracle.com. Assume the following table with 5,024,000 rows.

REGISTERED_OWNERS
(
STATE VARCHAR2(2) NOT NULL,
REGISTRATION# VARCHAR2(10) NOT NULL,
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
MAKE VARCHAR2(30),
MODEL VARCHAR2(15),
YEAR_MFG NUMBER,
CITY VARCHAR2(30),
ZIP NUMBER
)

with a single composite index on (STATE, REGISTRATION#)
Assume this query

SELECT first_name, last_name, zip
FROM REGISTERED_OWNERS WHERE registration# = '4FBB428'

In Oracle8i (which doesnt have skip scan), the resultant FTS would take
4035 hundreths.

In Oracle9i, the optimizer will cost (and in this case use) a sub-tree
lookup depending on the low and high values for the REGISTRATION# column
at a given STATE node. Comparable time - 25 hundreths of a second.

If this query is only executed spasmodically, then creating and
maintaining a second index on REGISTRATION# may be less than optimal.

At beginning, this example did impress me. But when I think again - the
index tree of this table (5m records) is a small one. Then I would rather
think DB2 and INFORMIX's readahead technology did do better than ORACLE's.
Does that make sense?

For the above application, assuming that one can't create a second index
(which seem like a ridiculous and artificial restriction to me), I would
simply reverse the order of the index columns (REGISTRATION#, STATE).

I can't imagine that a predicate with STATE = 'XX' would be used much as a
single predicate. But even if so, and one wanted to cluster by STATE
(possibly a good idea) then the second index could be created on the STATE
column (by itself). This would be a fairly small index since it only has 2
bytes.

With a little bit of intelligent design, the lack of skip-scan be overcome,
in fact the proposed solution has far better performance than skip-scan.

Nov 12 '05 #47

Mark Townsend

> Then I would rather

think DB2 and INFORMIX's readahead technology did do better than ORACLE's.
Does that make sense?

Its horses for courses. This example is comparing a FTS vs a Skip Scan
index retrieval, but Oracle also does fast full index scans, so it's not
an either/or situation. Sometimes skip scan wins, sometime fast full
index scan wins. Skip scan tends to win when the cardinality of the
first column is low and the second is high (potentially the situation
where Mark A would recommend reversing the keys in the compound index).

Nov 12 '05 #48

Fan Ruo Xin

"Mark A" <ma@switchboard.net> wrote in message
news:b_******************@news.uswest.net...

"Fan Ruo Xin" <fa*****@sbcglobal.net> wrote in message
news:fD*****************@newssvr26.news.prodigy.co m...
Skip-scan doesn't exist in DB2. I don't mean to argue with you. Just wondering why you are so sure about
this?

1. There is no access path description of skip-scan in the DB2 explain or

in any DB2 reference manual or IBM marketing materials. That is true.
But I did know some feature which Version 8 introduced. But either I enable
it or not, I can't find any changing from explain output.
Another point is - some thing such as db2 prefetch technology changed from
one version to another version. Can you always find the corresponding
documentation in the db2 reference manual/article?
2. DB2 has no way to do this without reading the whole index first (which
would be redundent). That is, regular DB2 indexes are not sub-sectioned the way Mark Townsend described Oracle indexes. I don't think so. I would rather believe the sub-section is a VIRTUAL sub
index tree, not a PHYSICAL sub index tree.
3. None of the IBM Toronto Lab people on htis forum have said that sub-scan exists in DB2 (although IBM Redbrick aparantly has it).
It seems to me
that having skip scan capability would impose a slight penalty on index maintenance, even if it were never used. I can't understand why this will increase index maintenance. This will
increase the CPU workload during searching the index tree. But the basic
idea is saving I/O.

In order to skip part of the index during a scan (skip-scan), and go to
where there is a new value for the first column in the index, DB2 would

have to know where the new value starts, and that position would have to be
maintained real-time, which incurs some resources, even if skip-scan is
never used. I thought it doesn't need to be so complex. The most important thing is - at
which point, stop scan, and just jump to the next key of the leading column.

Regular (non MDC) DB2 indexes do not have sub-sections based on the value of the first column of the index, which apparently Oracle has in some or all of its indexes. Maintaining the index in sub-sections has "some" performance
penalty when maintained in real-time. There may also a penalty if skip-scan cannot be used and a full sequential index scan is needed, since indexes
that have sub-sections may not utilize sequential prefetch as efficiently. You can think of a block index as a regular B+ tree index.
Because I think the sub-section is a virtual part. There is no changing
during I/U/D.

The best way (IMO) to avoid full sequential index scans (when an index is
used but the leading column of an index is not in the predicate) is to add
the proper indexes required, or reverse the order of the columns on existing indexes. This would perform much better than either a full sequential index scan or a skip-scan. But if the index is not particularly large and/or the
query that causes the full sequential index scan is not frequent or is not
critical, then I might not even worry about it. If db2 does support skip scan. then the optimizer can take care of the whole
thing. The index tree size is one of the important thing which db2 optimizer
will pay attention to.
As I said. MDC indexes "may" be different, as you explained above, but I am not really sure how MDC indexes work.

Nov 12 '05 #49

Mark Townsend

SELECT COUNT(*) FROM POST_OFFICE WHERE STATE > 'AA' and CITY='Springfield'

Does ORACLE optimizer still choose to use index SKIP SCAN?

Of the top of my head,my initial reaction is probably not (although, the
more I look at it, possibly). Why don't you download and try ?

http://otn.oracle.com/software/produ...10g/index.html

Nov 12 '05 #50