By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,852 Members | 994 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,852 IT Pros & Developers. It's quick & easy.

Direct Access Without Index ?

P: n/a
This is my first time here, so there may be earlier relevant threads of
which I am unaware. Though my experience with DB2 is not extensive,
such as it is was under OS/390 or equ. My main experience is IMS DB,
which leads to my question.

In IMS, there is an HDAM access method which can find a record without
using an index as such. At initial database load, it first formats the
entire space allocation into blocks of the given size. Then the
respective input records are stored according to their key value, that
is in a specific block (number) derived from that key by a "randomizer"
module that is aware of the total block-count. Thus, a later GU (get
unique) on that key will via the same randomizer again select that block
and find the record with the matching key.

Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?

Thanks,

Walter Rue
Nov 12 '05 #1
Share this Question
Share on Google+
9 Replies


P: n/a

"WalterR" <ws***@verizon.net> wrote in message
news:UWxUc.9474$k63.8620@trndny03...
This is my first time here, so there may be earlier relevant threads of
which I am unaware. Though my experience with DB2 is not extensive,
such as it is was under OS/390 or equ. My main experience is IMS DB,
which leads to my question.

In IMS, there is an HDAM access method which can find a record without
using an index as such. At initial database load, it first formats the
entire space allocation into blocks of the given size. Then the
respective input records are stored according to their key value, that
is in a specific block (number) derived from that key by a "randomizer"
module that is aware of the total block-count. Thus, a later GU (get
unique) on that key will via the same randomizer again select that block
and find the record with the matching key.

Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?

It sounds like you're not very happy about the prospect of using indexes and
I'm really not sure why. I think indexes are a good thing in DB2, not
something to avoid.

One of the best features of DB2 is the optimizer, which relieves you of the
tedious work of having to tell the program how to find your data. Instead of
having to tell your program how to navigate down complex hierarchies to find
your data, you just tell DB2 what you want and it figures out the best way
to do it, "automagically". (Actually, some very clever people with Ph.Ds
figured out how to make it *look* easy to people like us :-)

Many of the optimizer's options for accessing your data involve accessing
the data more quickly by using one or more indexes. Indexes can dramatically
decrease the time it takes to find the row or rows that you want. For
example, if you want to find the lowest or highest value in a column, DB2
can do this with a read of a single index page, assuming the column in
question is indexed. Naturally, other index methods are a little less
efficient but they still go very fast.

However, DB2 never guarantees that it will use an index as an access path
and you can't *force* it to use an index as an access path. In some cases,
DB2 will not use an index, even if it is available, if, in the optimizer's
estimation, it would be faster NOT to use the index. The most common example
of this is if the table is very small; in that case, DB2 normally reads all
the rows of the table because it knows it can read every row and discard the
ones that don't satisfy the query faster than it can read only the desired
rows via an index. So, in general, you can trust DB2's optimizer to find the
row in the quickest way possible, without having to tell it how to do so.
The fastest way will, in some cases, be via sequential prefetch, which is a
way that DB2 can access a large table without using indexes but still get
the data very quickly.

Of course, you still need to oversee DB2 to some extent. You will need to do
regular reorgs of the data, refresh the statistics used by the optimizer
periodically, and rebind packages/plans that might be affected by major
changes to the data. You also have to be vigilant about programming: badly
written SQL can sabotage your performance very easily. But if you do all of
that, you should have a system that is relatively trouble-free.

As to the specific issue of an HDAM-like access method in DB2, I am not
aware of one. Mind you, I haven't kept up with mainframe DB2 for the last
few years and may have missed out on an announcement that such a feature was
added. But I am not aware of such a thing on any of the DB2 platforms and
would be surprised if such a feature were added, since it doesn't seem to
jibe with the philosophy behind relational databases.

I've often heard it said that DB2 on its best day will never match IMS on
its best day, due to the fact that IMS uses pointers while DB2 does not. As
far as I know that remains true. (You could consult benchmarks at the IBM
website to verify that.) However, if you consider all the benefits of
relational databases over hierarchical ones, I think you'll find that
whatever performance hit you take is more than offset by the ease of using a
relational database. To consider just one of those benefits, have you ever
written an ad hoc query against an IMS database?

Back in my IMS days - which is over 15 years ago now - even the most trivial
IMS program potentially required DBDGENs, PSBGENs, PCBGENs, ACBGENs and
various other jobs. With DB2, I can go to SPUFI (on OS/390) or a Command
Window or Control Center (on Windows/Linux/Unix) and run queries without any
of that sort of preparation. That alone makes DB2 very valuable, in my view.

Rhino
Nov 12 '05 #2

P: n/a
>
Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?

Range-clustered tables introduced on DB2 V8 Fixpak 4(or Version 8.1.4)
may be equivalent of HDAM.
Nov 12 '05 #3

P: n/a
WalterR <ws***@verizon.net> wrote in message news:<UWxUc.9474$k63.8620@trndny03>...
This is my first time here, so there may be earlier relevant threads of
which I am unaware. Though my experience with DB2 is not extensive,
such as it is was under OS/390 or equ. My main experience is IMS DB,
which leads to my question.

In IMS, there is an HDAM access method which can find a record without
using an index as such. At initial database load, it first formats the
entire space allocation into blocks of the given size. Then the
respective input records are stored according to their key value, that
is in a specific block (number) derived from that key by a "randomizer"
module that is aware of the total block-count. Thus, a later GU (get
unique) on that key will via the same randomizer again select that block
and find the record with the matching key.

Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?

Thanks,

Walter Rue


No, DB2 does not have that feature. I believe that some other
"relational" databases have it, but of course, that feature is not
part of the relational architecture.

Since some other RDMS's have the feature, then there is always a
chance that DB2 will get it. However, because of the use of buffer
pools and cheap memory, the need for it much less than in the IMS
days.

DB2 does use random hashing in a partitioning environment to determine
what partition a particular row is on, but not to find the actual row
on disk.
Nov 12 '05 #4

P: n/a
HDAM is IMS's "potentially fastest" access method. Not using an index
and having its potential I/O and processing costs is the reason for
this. Achieving it is a delicate balance of many factors and requires a
DBA with multiple years of experience to balance them out to obtain it.
Unfortunately, this performance is achieved only for an application that
requires access to a single database record (not including the potential
effects of a sequential randomizer).

Accessing large numbers of records, in multiple sequences, with HDAM, is
usually so poor that the common technique is to read the needed data
from all records in the database, convert it to a flat file, and sort
it. Using a clustering index allows one sequence to access the database
directly. Applications needing the data in another sequence are more
complex to write because large quantities of data cannot be extracted
with random I/O in reasonable amounts of time.

HDAM has many inherent "bad" consequences of its own. The most obvious
is that ALL needed space must be preallocated before loading data. Space
cannot be added at a later time because this changes the randomized
locations of individual records. Synonym chains, especially if the
chained record is located in another Control Interval (CI), frequently
degrade performance to no better than using an index. I've seen this
occur even when a small fraction of the records are chained.

Give me a couple of months to analyze an application's data needs and
access requirements and I'll design you an HDAM database that should
easily outperform DB2 for online access. Some of the downside to this is:
1. I'll need to do a LOT of monitoring to prevent performance degredation.
2. I'll need application developers and maintainers with 5+ years of
DL/I coding experience to retrieve this data without loosing the
performance I've built into the database.
3. If I EVER need to adjust the randomizer parameters; I'll have to shut
down the database and use an additional purchased product to reload it
into an expanded space in a reasonable amount of time. The same
purchased product or some very sophisticated assembler code and a couple
of application programs will be needed to load the data in a reasonable
amount of time.
4. Changes to the application's data requirements over time (almost
guaranteed in today's business environment) will require developing new
data access paths with matching recoding of the programs.
Rhino's comments hit the nail on the head. It's much easier to design a
relational database to hold the data and let the database manager figure
out how to access it. I believe (without checking the current
performance stats) that a well designed and tuned DB2 system should be
able to match or outperform DL/I indexed databases with a much smaller
workload on the DBA and application developers. HDAM has the potential
to outperform DB2 but I believe that, in most cases, the long term costs
of doing so are higher than the performance cost of using DB2.
Phil Sherman

Rhino wrote:
"WalterR" <ws***@verizon.net> wrote in message
news:UWxUc.9474$k63.8620@trndny03...
This is my first time here, so there may be earlier relevant threads of
which I am unaware. Though my experience with DB2 is not extensive,
such as it is was under OS/390 or equ. My main experience is IMS DB,
which leads to my question.

In IMS, there is an HDAM access method which can find a record without
using an index as such. At initial database load, it first formats the
entire space allocation into blocks of the given size. Then the
respective input records are stored according to their key value, that
is in a specific block (number) derived from that key by a "randomizer"
module that is aware of the total block-count. Thus, a later GU (get
unique) on that key will via the same randomizer again select that block
and find the record with the matching key.

Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?


It sounds like you're not very happy about the prospect of using indexes and
I'm really not sure why. I think indexes are a good thing in DB2, not
something to avoid.

One of the best features of DB2 is the optimizer, which relieves you of the
tedious work of having to tell the program how to find your data. Instead of
having to tell your program how to navigate down complex hierarchies to find
your data, you just tell DB2 what you want and it figures out the best way
to do it, "automagically". (Actually, some very clever people with Ph.Ds
figured out how to make it *look* easy to people like us :-)

Many of the optimizer's options for accessing your data involve accessing
the data more quickly by using one or more indexes. Indexes can dramatically
decrease the time it takes to find the row or rows that you want. For
example, if you want to find the lowest or highest value in a column, DB2
can do this with a read of a single index page, assuming the column in
question is indexed. Naturally, other index methods are a little less
efficient but they still go very fast.

However, DB2 never guarantees that it will use an index as an access path
and you can't *force* it to use an index as an access path. In some cases,
DB2 will not use an index, even if it is available, if, in the optimizer's
estimation, it would be faster NOT to use the index. The most common example
of this is if the table is very small; in that case, DB2 normally reads all
the rows of the table because it knows it can read every row and discard the
ones that don't satisfy the query faster than it can read only the desired
rows via an index. So, in general, you can trust DB2's optimizer to find the
row in the quickest way possible, without having to tell it how to do so.
The fastest way will, in some cases, be via sequential prefetch, which is a
way that DB2 can access a large table without using indexes but still get
the data very quickly.

Of course, you still need to oversee DB2 to some extent. You will need to do
regular reorgs of the data, refresh the statistics used by the optimizer
periodically, and rebind packages/plans that might be affected by major
changes to the data. You also have to be vigilant about programming: badly
written SQL can sabotage your performance very easily. But if you do all of
that, you should have a system that is relatively trouble-free.

As to the specific issue of an HDAM-like access method in DB2, I am not
aware of one. Mind you, I haven't kept up with mainframe DB2 for the last
few years and may have missed out on an announcement that such a feature was
added. But I am not aware of such a thing on any of the DB2 platforms and
would be surprised if such a feature were added, since it doesn't seem to
jibe with the philosophy behind relational databases.

I've often heard it said that DB2 on its best day will never match IMS on
its best day, due to the fact that IMS uses pointers while DB2 does not. As
far as I know that remains true. (You could consult benchmarks at the IBM
website to verify that.) However, if you consider all the benefits of
relational databases over hierarchical ones, I think you'll find that
whatever performance hit you take is more than offset by the ease of using a
relational database. To consider just one of those benefits, have you ever
written an ad hoc query against an IMS database?

Back in my IMS days - which is over 15 years ago now - even the most trivial
IMS program potentially required DBDGENs, PSBGENs, PCBGENs, ACBGENs and
various other jobs. With DB2, I can go to SPUFI (on OS/390) or a Command
Window or Control Center (on Windows/Linux/Unix) and run queries without any
of that sort of preparation. That alone makes DB2 very valuable, in my view.

Rhino


Nov 12 '05 #5

P: n/a
Tokunaga T. wrote:
Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?


Range-clustered tables introduced on DB2 V8 Fixpak 4(or Version 8.1.4)
may be equivalent of HDAM.

Indeed, indeed. The option in CREATE TABLE is
ORGANIZE BY KEY SEQUENCE.
It's essentially a straight mapping of an n-dimensional array onto disk.
Same as what I deduce HDAM seems to require these tables need to
allocate space up front (with an option to overflow into a standard heap
organized table). CREATE TABLE may take longer than usual ;-)

On another word once you have a row several DBMS out there allow you to
refer to the row by a ROWID which contains the physical location.
Very bad habit, but it creeps in just like GOTO into procedural languages.
DB2 UDB for LUW has so far been holding out, but both DB2 z/OS and IDS
have the "feature". Takes no rocket scientist to predict where that is
going.

Cheers
Serge
Nov 12 '05 #6

P: n/a
Thanks, everyone! I feel like I have just taken a short course.

I'm doing a data model currently normalized to about six tables, with
the "root" having an arbitrarily unique key and with the others, when
they occur, matching it. I was thinking that it should be an
entry-sequence number (binary), which prompted the HDAM concept.
However, the DBA would make the final decision, of course. From the
concensus here, the best practice would still use indexes (I'm not sure
what DB2 version we are running or if range-clustering even is an option).

Walter Rue
WalterR wrote:
This is my first time here, so there may be earlier relevant threads of
which I am unaware. Though my experience with DB2 is not extensive,
such as it is was under OS/390 or equ. My main experience is IMS DB,
which leads to my question.

In IMS, there is an HDAM access method which can find a record without
using an index as such. At initial database load, it first formats the
entire space allocation into blocks of the given size. Then the
respective input records are stored according to their key value, that
is in a specific block (number) derived from that key by a "randomizer"
module that is aware of the total block-count. Thus, a later GU (get
unique) on that key will via the same randomizer again select that block
and find the record with the matching key.

Does DB2 have anything equivalent, or will it in the future? Can a
table somehow be allocated so the designated (unique) key is
direct-accessed without using an index?

Thanks,

Walter Rue

Nov 12 '05 #7

P: n/a
Range clustering is heavy on teh memory. You woudl only do it if the
table is either very full or small enough that you don't mind the overhead.
They are good for look-up tables.
Like, say a table with stats on the States of the US.
Things that can be numbered and don't change a lot in terms of their
cardinality.

Cheers
Serge
Nov 12 '05 #8

P: n/a
WalterR <ws***@verizon.net> wrote in message news:<6y0Vc.47507$US4.33862@trndny01>...
Thanks, everyone! I feel like I have just taken a short course.

I'm doing a data model currently normalized to about six tables, with
the "root" having an arbitrarily unique key and with the others, when
they occur, matching it. I was thinking that it should be an
entry-sequence number (binary), which prompted the HDAM concept.
However, the DBA would make the final decision, of course. From the
concensus here, the best practice would still use indexes (I'm not sure
what DB2 version we are running or if range-clustering even is an option).

Walter Rue

If the unique key is defined as such in DB2 (a Primary Key or a Unique
Contrainst) DB2 will automatically create a unique index if one does
not already exist (some DBA's like to create the index manually and
then alter the table to add the the contrainsts).
Nov 12 '05 #9

P: n/a
Serge Rielau <sr*****@ca.eye-bee-em.com> wrote in message news:<bx*************@news04.bloor.is.net.cable.ro gers.com>...

On another word once you have a row several DBMS out there allow you to
refer to the row by a ROWID which contains the physical location.
Very bad habit, but it creeps in just like GOTO into procedural languages.
DB2 UDB for LUW has so far been holding out, but both DB2 z/OS and IDS
have the "feature".
Takes no rocket scientist to predict where that is going.
down the hidyhole called "XML Database"?
<G>

Cheers
Serge

Nov 12 '05 #10

This discussion thread is closed

Replies have been disabled for this discussion.