
DB2 physical layout - comments needed

DB2 500 GB database, Wintel, heavily loaded OLTP (5M+ transactions a day;
all transactions are extremely small, all selects are controlled
(no ad hoc), and 99% of all selects are very small (no table
scans; index scans are very limited in size)). Write performance
is generally more important than read performance, and read performance
of heavy queries (the ones with table scans) is not important at all.

Question: how to spread data across physical disks to achieve
maximum performance?

Proposed layout: logs on separate mirror, system on separate
mirror, tempspace on separate mirror (?), all other disks
are 'data disks' organized as mirrors - see below. Tables are
divided into 'small' ones and 'big' ones. All small ones sit in
one tablespace that has associated bufferpool equal to tablespace
size (that should ensure 99.9999...% hit ratio). Big (historical)
ones have separate tablespaces for data and indexes, with
separate bufferpools. All tablespaces (both big and small)
are spread over all 'data disks' with each tablespace having
one container of equal size on each disk. Important: as
load is 99.9% random access as opposed to sequential access,
to reduce contention at disk level it is proposed to have
PREFETCHSIZE exactly equal to EXTENTSIZE (!). As a result of this
configuration, each read operation is expected to affect exactly
one disk without touching neighbours; due to equal spreading
of tablespaces across all disks, load on every disk will
be statistically balanced.

Any comments (especially with explanations of what seems to be wrong)
will be appreciated.
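
For concreteness, a minimal sketch of how one of the 'big' data tablespaces
might be defined under this proposal (disk paths, sizes, and the bufferpool
name are hypothetical; DB2 v8 DMS syntax assumed):

    -- one equally sized container per 'data disk' (mirror pair)
    CREATE BUFFERPOOL bp_bigdata SIZE 50000 PAGESIZE 4 K;

    CREATE TABLESPACE ts_bigdata PAGESIZE 4 K
      MANAGED BY DATABASE
      USING (FILE 'E:\db2\ts_bigdata_c1' 10 G,
             FILE 'F:\db2\ts_bigdata_c2' 10 G,
             FILE 'G:\db2\ts_bigdata_c3' 10 G)
      EXTENTSIZE 32
      -- PREFETCHSIZE equal to EXTENTSIZE, so one prefetch request covers
      -- one extent, i.e. a single container / disk
      PREFETCHSIZE 32
      BUFFERPOOL bp_bigdata;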
Nov 12 '05 #1
12 Replies


"Nobody" <ip*******@yahoo.com> wrote in message
news:e5**************************@posting.google.c om...
> [original post quoted in full; see #1 above]


Given your application description, I would not expect much (if any)
prefetch activity to be happening. So I am not sure that I would worry about
making PREFETCHSIZE exactly equal to EXTENTSIZE. Prefetch is usually limited
to tablespace or index space scans.

Given the amount of buffer pool space for the "small" tables, you don't need
to worry much about data placement on the disk. But I would try to put
indexes on different disks than tables.
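
If it helps: assuming DMS tablespaces, DB2 lets you direct a table's index
pages to a tablespace different from its data at CREATE TABLE time. A small
hedged sketch, with made-up names:

    -- data pages go to ts_orders_data, index pages (including the PK index)
    -- go to ts_orders_ix; the two tablespaces would sit on different disks/arrays
    CREATE TABLE orders (
        order_id  INTEGER       NOT NULL PRIMARY KEY,
        cust_id   INTEGER       NOT NULL,
        amount    DECIMAL(10,2)
    ) IN ts_orders_data INDEX IN ts_orders_ix;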
Nov 12 '05 #2

"Mark A" <ma@switchboard.net> wrote in message news:<Gx*****************@news.uswest.net>...
Thanks for the answer, Mark.
> Given your application description, I would not expect much (if any)
> prefetch activity to be happening. So I am not sure that I would worry about
> making PREFETCHSIZE exactly equal to EXTENTSIZE. Prefetch is usually limited
> to tablespace or index space scans.

That was exactly the reasoning behind this decision.

> Given the amount of buffer pool space for the "small" tables, you don't need
> to worry much about data placement on the disk. But I would try to put
> indexes on different disks than tables.

That was another design decision we needed to make: either to
allocate disks for specific tables/indexes, or to spread all
the tables/indexes over all disks. We decided in favor of the
latter because it provides much better balancing. Or are you
proposing to have 2 sets of disks - one for all indexes (with
all indexes spread over all disks of this set), and one
for data (with all data spread over all disks of this set)?
If yes - why do you think it is better than our model with
only one set of disks for both indexes/data?
Nov 12 '05 #3

> > Given the amount of buffer pool space for the "small" tables, you don't need
> > to worry much about data placement on the disk. But I would try to put
> > indexes on different disks than tables.
>
> That was another design decision we needed to make: either to
> allocate disks for specific tables/indexes, or to spread all
> the tables/indexes over all disks. We decided in favor of the
> latter because it provides much better balancing. Or are you
> proposing to have 2 sets of disks - one for all indexes (with
> all indexes spread over all disks of this set), and one
> for data (with all data spread over all disks of this set)?
> If yes - why do you think it is better than our model with
> only one set of disks for both indexes/data?


You don't need to have all indexes on one set of disks and tables on another
set. But if you could ensure that for a given table, the table data was on a
separate disk from the indexes for that table, that would be fine. But I
would not overkill this in situations where the bufferpool is as large as the
data.
Nov 12 '05 #4

AK
> mirror, tempspace on separate mirror (?), all other disks
I was wondering if there is any need for mirroring temporary data.
Maybe I'm missing something.
Nov 12 '05 #5

ak************@yahoo.com (AK) wrote in message news:<46**************************@posting.google. com>...
> > mirror, tempspace on separate mirror (?), all other disks
>
> I was wondering if there is any need for mirroring temporary data.

To prevent downtime in case of single HDD failure.
Nov 12 '05 #6

"Mark A" <ma@switchboard.net> wrote in message news:<5J**************@news.uswest.net>...
> [earlier exchange quoted; see #2-#4 above]
>
> You don't need to have all indexes on one set of disks and tables on another
> set. But if you could ensure that for a given table, the table data was on a
> separate disk from the indexes for that table, that would be fine. But I
> would not overkill this in situations where the bufferpool is as large as the
> data.

You're right - this shouldn't matter for 'small' tables. But what about
'large' tables - do you think it's better to separate indexes and data
(given that access is 99.9% random anyway)? If yes, how would you organize
the disks - two separate disks for each 'large' table (one for data, one for
indexes), or just two sets of disks (one with data for all the tables,
another with indexes for all the tables)? And the most important question - why?

Thx in advance
Nov 12 '05 #7

> You're right - this shouldn't matter for 'small' tables. But what about
> 'large' tables - do you think it's better to separate indexes and data
> (given that access is 99.9% random anyway)? If yes, how would you organize
> the disks - two separate disks for each 'large' table (one for data, one for
> indexes), or just two sets of disks (one with data for all the tables,
> another with indexes for all the tables)? And the most important question - why?
>
> Thx in advance


I am not sure if I understand the options above, but here is what I originally
meant to say:

For a given table, the tablespace for the table and the tablespace for the
indexes should be on different disks (or different arrays). Please note that
the above applies to a PARTICULAR table. That does not mean that all tables
should be on one disk, and all indexes on another disk.

For example, 2 tables A and B, each with 1 index AX and BX:

Disk 1 - tablespace for A, tablespace for BX
Disk 2 - tablespace for B, tablespace for AX

The reason for this is that for an OLTP system, the index data and
corresponding table data are accessed almost simultaneously, so it could
theoretically help to have them on separate disks for a given table. Whether
or not the difference is enough to worry about depends on many factors.

But for your large tables, I would have multiple containers on different
disks for each tablespace. Even if you don't have table space scans in your
application, this will help speed up utilities that access the entire table.
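
A hedged sketch of that A/B arrangement, with hypothetical drive letters
(E: and F: standing for two different mirror pairs) and made-up sizes; the
big-table tablespaces could additionally list several containers each:

    -- array E: holds A's data and B's index; array F: holds B's data and A's index
    CREATE TABLESPACE ts_a_data MANAGED BY DATABASE
        USING (FILE 'E:\db2\a_data' 20 G);
    CREATE TABLESPACE ts_b_ix MANAGED BY DATABASE
        USING (FILE 'E:\db2\b_ix' 5 G);

    CREATE TABLESPACE ts_b_data MANAGED BY DATABASE
        USING (FILE 'F:\db2\b_data' 20 G);
    CREATE TABLESPACE ts_a_ix MANAGED BY DATABASE
        USING (FILE 'F:\db2\a_ix' 5 G);

    -- bind each table's data and indexes to the matching pair of tablespaces
    CREATE TABLE a (id INTEGER NOT NULL PRIMARY KEY) IN ts_a_data INDEX IN ts_a_ix;
    CREATE TABLE b (id INTEGER NOT NULL PRIMARY KEY) IN ts_b_data INDEX IN ts_b_ix;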
Nov 12 '05 #8

First off, and most importantly: Use hardware RAID-1 (based on Ultra-320 LVD
SCSI technology) rather than software mirroring, and make sure your RAID
hardware has a battery backed write cache (we got 7x DB2 performance
improvement merely by upgrading the RAID card to one with a battery).
Also follow the advice in the DB2 documentation under "Optimizing table
space performance when data is on RAID devices" (this also addresses your
EXTENTSIZE and PREFETCHSIZE questions).

Next: ensure you have adequate RAM. Measure bufferpool hit cache rate, and
add RAM (whilst increasing buffer pool sizes) until you've reduced total I/O
to the minimum. BTW, be careful of increasing buffer pools too high for the
available RAM ==> measure system I/O and swapping.

Finally: spread the disk load over all disks. Enable disk performance
counters if you haven't already done so (enter the diskperf command at a
command prompt to see the current state; enable with diskperf -Y). Then
measure disk I/O, and move tablespaces around until you have a nicely
balanced system. "Statistically balanced" (as you put it) is a meaningless
concept. The only way to know if you have balanced access is to measure it.
I have yet to see an a-priori analysis that bore any resemblance to measured
reality.
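
For reference, a sketch of the measurements being suggested here (Windows
commands plus the DB2 snapshot monitor; SAMPLE stands in for the real
database name):

    REM enable the Windows physical-disk counters (diskperf alone shows the current state)
    diskperf -Y

    REM turn on buffer pool monitoring, then take snapshots; the hit ratio is
    REM 1 - (pool physical reads / pool logical reads), and the tablespace
    REM snapshot shows per-tablespace I/O for balancing
    db2 update monitor switches using bufferpool on
    db2 get snapshot for bufferpools on SAMPLE
    db2 get snapshot for tablespaces on SAMPLE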

I'm still a believer in the old "1 table to a tablespace" rule, and have
found that increasing the number of bufferpools normally proportionally
decreases total throughput, hence I generally keep the number of bufferpools
to a minimum. Given your transaction load, I would probably not enable
parallel query. Spreading tablespaces over multiple volumes is important for
large tablespaces, but unless your database design consists of fewer large
tables than physical disks, it can generally increase overhead rather than
reduce it. At 500 GB of data, and 73 GB as a standard SCSI disk size (meaning
you have 7 RAID-1 pairs), you probably don't have such a configuration.

You claim that access is 99.9% random rather than "sequential". This is
highly unusual: normally access is localized to specific "hot spots" and is
far from uniformly random. I strongly recommend measuring your system
to verify that access is truly random. If, as I suspect, it isn't, you can
tune for the effective access pattern.

"Nobody" <ip*******@yahoo.com> wrote in message
news:e5**************************@posting.google.c om...
> [original post quoted in full; see #1 above]

Nov 12 '05 #9

> For a given table, the tablespace for the table and the tablespace for the
> indexes should be on different disks (or different arrays). Please note that
> the above applies to a PARTICULAR table. That does not mean that all tables
> should be on one disk, and all indexes on another disk.
>
> For example, 2 tables A and B, each with 1 index AX and BX:
>
> Disk 1 - tablespace for A, tablespace for BX
> Disk 2 - tablespace for B, tablespace for AX

I see. Thanks.

> The reason for this is that for an OLTP system, the index data and
> corresponding table data are accessed almost simultaneously, so it could
> theoretically help to have them on separate disks for a given table.

Because otherwise a normal request for A will require accessing
A and AX, which would mean moving the HDD head back and forth, which
is not a good thing, right? But on the other hand, IO is supposed
to be random, so head re-positioning will likely happen anyway.
Plus, if we have the following configuration ('our' model):

Disk 1 - A-1 (1st container for A data), AX-1, B-1, BX-1, C-1, CX-1
Disk 2 - A-2, AX-2, B-2, BX-2, C-2, CX-2
Disk 3 - A-3, AX-3, B-3, BX-3, C-3, CX-3
Disk 4 - A-4, AX-4, B-4, BX-4, C-4, CX-4
Disk 5 - A-5, AX-5, B-5, BX-5, C-5, CX-5

then random IO for index and data will likely (80% chance) go
to different disks. And an A-B join (or any other
join) is going to have the same 80% chance of any 2 requests
going to different disks.

I don't want to say that your model is not going to work -
it will, but our model seems (to us) to be more easily manageable and
expandable, plus it is balanced 'by design' (all the disks have
statistically equal load, so no single disk is going to be
overloaded), and we do like it, but it is not usually
recommended for high-performance systems, so we are trying to find
any potential problems with it. And the question is - do you see any
problems with 'our' model?

Thanks in advance

Nov 12 '05 #10

"Mark Yudkin" <my***********************@nospam.org> wrote in message news:<c0*********@ngspool-d02.news.aol.com>...
Thanks for the answer, Mark.
> First off, and most importantly: Use hardware RAID-1 (based on Ultra-320 LVD
> SCSI technology) rather than software mirroring, and make sure your RAID
> hardware has a battery backed write cache (we got 7x DB2 performance
> improvement merely by upgrading the RAID card to one with a battery).

Both already done (and will remain the same in the future).

> Also follow the advice in the DB2 documentation under "Optimizing table
> space performance when data is on RAID devices" (this also addresses your
> EXTENTSIZE and PREFETCHSIZE questions).

Unfortunately, this guide refers to a mixed OLTP/DSS load; in addition,
its considerations don't apply to a bunch of RAID-1s, which is
essentially JBOD from the DB2 point of view.

> Next: ensure you have adequate RAM. Measure bufferpool hit cache rate, and
> add RAM (whilst increasing buffer pool sizes) until you've reduced total I/O
> to the minimum. BTW, be careful of increasing buffer pools too high for the
> available RAM ==> measure system I/O and swapping.
>
> Finally: spread the disk load over all disks. Enable disk performance
> counters if you haven't already done so (enter the diskperf command at a
> command prompt to see the current state; enable with diskperf -Y). Then
> measure disk I/O, and move tablespaces around until you have a nicely
> balanced system.

Given the size of the database, experiments with moving tablespaces around
will take months - we don't have that much time (and after deployment
it is not an option at all - the system is operated 24/7 with no more than
0.01% scheduled downtime allowed). In addition, the original manual balance
will become worse and worse each time a new space requirement arises
(and it will arise, like in any real-life system); that is the main reason
we would prefer not to deal with manual balancing.

What we could and will do is test
2 (maximum 3) configurations (like 'all-tables-over-all-disks',
'1-table-per-disk', '1-table-data-plus-other-table-index' etc.);
unfortunately, we are time-limited, so we need to select the configurations
to test carefully.

> "Statistically balanced" (as you put it) is a meaningless
> concept. The only way to know if you have balanced access is to measure it.
> I have yet to see an a-priori analysis that bore any resemblance to measured
> reality.

It is done this way now - and it is statistically balanced: just
as was predicted by the a-priori analysis. When we designed the current
database, we didn't have any option other than spreading all the tablespaces
over all the disks (due to the lack of HDDs); now we can get enough
disks but have yet to see any advantage of the disk-per-table model.

> I'm still a believer in the old "1 table to a tablespace" rule, and have
> found that increasing the number of bufferpools normally proportionally
> decreases total throughput,

Just FYI - separating bufferpools is the thing that keeps the current
database afloat (serving purposes ranging from achieving a
99.9999% hit ratio for small and frequently accessed tables to
avoiding bufferpool contamination by the huge requests that accidentally
do happen).

> hence I generally keep the number of bufferpools
> to a minimum.

Sure; the definition of 'minimum' can differ though.

> Given your transaction load, I would probably not enable
> parallel query.

Out of the question.

> Spreading tablespaces over multiple volumes is important for
> large tablespaces,

This argument is definitely valid for a DSS kind of load
with lots of sequential reads, but I haven't found any
proof of its applicability to 'pure' OLTP loads yet.

> but unless your database design consists of fewer large
> tables than physical disks, it can generally increase overhead rather than
> reduce it. At 500 GB of data, and 73 GB as a standard SCSI disk size (meaning
> you have 7 RAID-1 pairs), you probably don't have such a configuration.

Again, not clear whether it applies to 'pure' OLTP loads.

> You claim that access is 99.9% random rather than "sequential". This is
> highly unusual: normally access is localized to specific "hot spots" and is
> far from uniformly random.

Yes, it is not uniform, but it is random in the sense that it is not
sequential (i.e. almost every IO requires an HDD seek), therefore things
like prefetch wouldn't work.

> I strongly recommend measuring your system
> to verify that access is truly random. If, as I suspect, it isn't, you can
> tune for the effective access pattern.

Nov 12 '05 #11

> > I'm still a believer in the old "1 table to a tablespace" rule, and have
> > found that increasing the number of bufferpools normally proportionally
> > decreases total throughput,
>
> Just FYI - separating bufferpools is the thing that keeps the current
> database afloat (serving purposes ranging from achieving a
> 99.9999% hit ratio for small and frequently accessed tables to
> avoiding bufferpool contamination by the huge requests that accidentally
> do happen).
>
> > hence I generally keep the number of bufferpools
> > to a minimum.
>
> Sure; the definition of 'minimum' can differ though.

2 bufferpools is fine.
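
For what it's worth, a hedged sketch of a two-pool setup along those lines
(names and sizes are invented; the 'small' pool is sized to hold every page
of the small-tables tablespace):

    -- roughly 1 GB of 4 KB pages, enough to cache the whole 'small' tablespace
    CREATE BUFFERPOOL bp_small SIZE 250000 PAGESIZE 4 K;
    -- an ordinary pool shared by the large historical data and index tablespaces
    CREATE BUFFERPOOL bp_big SIZE 100000 PAGESIZE 4 K;

    ALTER TABLESPACE ts_small_tables BUFFERPOOL bp_small;
    ALTER TABLESPACE ts_big_data BUFFERPOOL bp_big;
    ALTER TABLESPACE ts_big_ix BUFFERPOOL bp_big;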
Nov 12 '05 #12

"Nobody" <ip*******@yahoo.com> wrote in message
news:e5**************************@posting.google.c om...
> > Also follow the advice in the DB2 documentation under "Optimizing table
> > space performance when data is on RAID devices" (this also addresses your
> > EXTENTSIZE and PREFETCHSIZE questions).
>
> Unfortunately, this guide refers to a mixed OLTP/DSS load; in addition,
> its considerations don't apply to a bunch of RAID-1s, which is
> essentially JBOD from the DB2 point of view.


Huh? Here's a copy of the chapter I referenced:
Optimizing table space performance when data is on RAID devices
This section describes how to optimize performance when data is placed on
Redundant Array of Independent Disks (RAID) devices.

Procedure

You should do the following for each table space that uses a RAID device:

a. Define a single container for the table space (using the RAID device).
b. Make the EXTENTSIZE of the table space equal to, or a multiple of, the
   RAID stripe size.
c. Ensure that the PREFETCHSIZE of the table space is:
   - the RAID stripe size multiplied by the number of RAID parallel
     devices (or a whole multiple of this product), and
   - a multiple of the EXTENTSIZE.
d. Use the DB2_PARALLEL_IO registry variable to enable parallel I/O for
   the table space.
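
To make steps a-d concrete, a hedged example (not part of the quoted manual
text): assume 4 KB pages and a RAID array with a 64 KB stripe (16 pages)
across 4 data disks; the name and path are made up:

    CREATE TABLESPACE ts_raid PAGESIZE 4 K
      MANAGED BY DATABASE
      -- a single container on the RAID device
      USING (FILE 'R:\db2\ts_raid_c1' 50 G)
      -- EXTENTSIZE: 16 pages x 4 KB = one 64 KB stripe
      EXTENTSIZE 16
      -- PREFETCHSIZE: stripe (16 pages) x 4 parallel devices = 64 pages
      PREFETCHSIZE 64;

    db2set DB2_PARALLEL_IO=*
    db2stop
    db2start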
DB2_PARALLEL_IO

When reading data from, or writing data to table space containers, DB2 may
use parallel I/O if the number of containers in the database is greater than
1. However, there are situations when it would be beneficial to have
parallel I/O enabled for single container table spaces. For example, if the
container is created on a single RAID device that is composed of more than
one physical disk, you may want to issue parallel read and write calls.

To force parallel I/O for a table space that has a single container, you can
use the DB2_PARALLEL_IO registry variable. This variable can be set to "*"
(asterisk), meaning every table space, or it can be set to a list of table
space IDs separated by commas. For example:

db2set DB2_PARALLEL_IO=*          {turn parallel I/O on for all table spaces}
db2set DB2_PARALLEL_IO=1,2,4,8    {turn parallel I/O on for table spaces 1, 2, 4, and 8}
After setting the registry variable, DB2 must be stopped (db2stop), and then
restarted (db2start), for the changes to take effect.

DB2_PARALLEL_IO also affects table spaces with more than one container
defined. If you do not set the registry variable, the I/O parallelism is
equal to the number of containers in the table space. If you set the
registry variable, the I/O parallelism is equal to the result of prefetch
size divided by extent size. You might want to set the registry variable if
the individual containers in the table space are striped across multiple
physical disks.

For example, a table space has two containers and the prefetch size is four
times the extent size. If the registry variable is not set, a prefetch
request for this table space will be broken into two requests (each request
will be for two extents). Provided that the prefetchers are available to do
work, two prefetchers can be working on these requests in parallel. In the
case where the registry variable is set, a prefetch request for this table
space will be broken into four requests (one extent per request) with a
possibility of four prefetchers servicing the requests in parallel.

In this example, if each of the two containers had a single disk dedicated
to it, setting the registry variable for this table space might result in
contention on those disks since two prefetchers will be accessing each of
the two disks at once. However, if each of the two containers was striped
across multiple disks, setting the registry variable would potentially allow
access to four different disks at once.

DB2_USE_PAGE_CONTAINER_TAG

By default, DB2 uses the first extent of each DMS container (file or device)
to store a container tag. The container tag is DB2's metadata for the
container. In earlier versions of DB2, the first page was used for the
container tag, instead of the first extent, and as a result less space in
the container was used to store the tag. (In earlier versions of DB2, the
DB2_STRIPED_CONTAINERS registry variable was used to create table spaces
with an extent sized tag. However, because this is now the default behavior,
this registry variable no longer has any effect.)

When the DB2_USE_PAGE_CONTAINER_TAG registry variable is set to ON, any new
DMS containers created will be created with a one-page tag, instead of a
one-extent tag (the default). There will be no impact to existing containers
that were created before the registry variable was set.

Setting this registry variable to ON is not recommended unless you have very
tight space constraints, or you require behavior consistent with pre-Version
8 databases.

Setting this registry variable to ON can have a negative impact on I/O
performance if RAID devices are used for table space containers. When using
RAID devices for table space containers, it is suggested that the table
space be created with an extent size that is equal to, or a multiple of, the
RAID stripe size. However, if this registry variable is set to ON, a
one-page container tag will be used and the extents will not line up with
the RAID stripes. As a result it may be necessary during an I/O request to
access more physical disks than would be optimal. Users are thus strongly
advised against setting this registry variable.

To create containers with one-page container tags, set this registry
variable to ON, and then stop and restart the instance:

db2set DB2_USE_PAGE_CONTAINER_TAG=ON
db2stop
db2start
To stop creating containers with one-page container tags, reset this
registry variable, and then stop and restart the instance.

db2set DB2_USE_PAGE_CONTAINER_TAG=
db2stop
db2start
The Control Center, the LIST TABLESPACE CONTAINERS command, and the GET
SNAPSHOT FOR TABLESPACES command do not show whether a container has been
created with a page or extent sized tag. They use the label "file" or
"device", depending on how the container was created. To verify whether a
container was created with a page- or extent-size tag, you can use the /DTSF
option of DB2DART to dump table space and container information, and then
look at the type field for the container in question. The query container
APIs (sqlbftcq and sqlbtcq), can be used to create a simple application that
will display the type.

----

Regarding "manual" balancing etc. I assumed that you would be doing
full-scale testing prior to going live, and thus would be able to perform
standard tuning. I was not aware that you would be going live with little
more than some theoretical considerations and no measurements of reality.
Hopefully you will at least do some full-scale testing on backup / restore /
recovery to ensure you meet your 24x7 at 99.99% availability.

As for prefetching, one needs to consider the question of access patterns
rather than randomness of the queries. Prefetching can bring advantages when
access patterns are correlated (not sequential) with the clustering index.
If you can define a clustering index that leads to "hot spots", then
prefetching can be very useful to preload the bufferpool(s).
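
As a hedged illustration of that last point (the table, column, and index
names are invented): if most activity hits the newest rows of a big
historical table, clustering it on the timestamp keeps the 'hot spot'
physically contiguous, which is what lets prefetch usefully preload the
bufferpool:

    -- CLUSTER asks DB2 to keep the table's pages roughly in tx_time order;
    -- REORG ... INDEX rebuilds the table in that order when clustering degrades
    CREATE INDEX ix_hist_time ON hist_tx (tx_time) CLUSTER;
    REORG TABLE hist_tx INDEX ix_hist_time;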


Nov 12 '05 #13
