Bytes | Developer Community
Is there a rule for determining number of Page Cleaners?

I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts
of a database on each drive. The main index is on one SCSI drive all to
itself. The main data are on the other three SCSI drives. Small relations
are on one EIDE drive, and the logfiles are on the other EIDE drive. When
running the task, below, the rest of the machine is not doing much.

I do not remember where I saw it, but somewhere I got the idea that the
number of page cleaners might well be one more than the number of hard
drives, so I set up my database with 7 page cleaners (and 7 page fetchers).

Now a big (for me) task gets into a state where, on Linux, all seven page
cleaners are in the D state. It is all working, but slow. vmstat reveals
that the IO is going nowhere near the maximum rate that the IO channels
would support: about 6 Megabytes per second, even though the IO system can
easily do 36 Megabytes per second (e.g., when doing a database reorg). The
SCSI drives are 10,000rpm Ultra/320 SCSI drives and the SCSI controller is
on a PCI-X bus all its own. The problem pretty clearly is the time the
drives spend seeking. Now perhaps I could increase the size of the buffer
pools, move stuff around on the drives, etc. In fact, the big hog is the
logging.

But what I would like to know is: is there any point in using more page
cleaners than I do now? Is there a better rule than the one I indicated above?

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 15:40:01 up 17 days, 5:00, 6 users, load average: 4.10, 4.16, 4.12

Nov 12 '05 #1
Jean-David Beyer wrote:
I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts
of a database on each drive. The main index is on one SCSI drive all to
itself. The main data are on the other three SCSI drives. Small relations
are on one EIDE drive, and the logfiles are on the other EIDE drive. When
running the task, below, the rest of the machine is not doing much. [...] In fact, the big hog is the
logging.


You should place the log on the fastest drive you have.
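For anyone following along, relocating the active logs in DB2 is a one-parameter change; a rough sketch (the database name MYDB and the path are placeholders, and NEWLOGPATH takes effect at the next database activation):

```shell
# Point DB2's active logs at a directory on the fastest drive.
# "MYDB" and "/fast_scsi_disk/db2logs" are placeholders; the new
# path takes effect the next time the database is activated.
db2 connect to MYDB
db2 update db cfg for MYDB using NEWLOGPATH /fast_scsi_disk/db2logs
db2 terminate
```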

--
Knut Stolze
Information Integration
IBM Germany / University of Jena
Nov 12 '05 #2
Knut Stolze wrote:
Jean-David Beyer wrote:

I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts
of a database on each drive. The main index is on one SCSI drive all to
itself. The main data are on the other three SCSI drives. Small relations
are on one EIDE drive, and the logfiles are on the other EIDE drive. When
running the task, below, the rest of the machine is not doing much.


[...]
In fact, the big hog is the
logging.

You should place the log on the fastest drive you have.

I am sure you are right. But that does not answer the question on the
proper ratio of page cleaners to drive spindles.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 21:25:00 up 17 days, 10:45, 3 users, load average: 4.33, 4.20, 4.11

Nov 12 '05 #3
Jean-David Beyer wrote:
Knut Stolze wrote:
Jean-David Beyer wrote:

I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts
of a database on each drive. The main index is on one SCSI drive all to
itself. The main data are on the other three SCSI drives. Small relations
are on one EIDE drive, and the logfiles are on the other EIDE drive. When
running the task, below, the rest of the machine is not doing much.

[...]
In fact, the big hog is the
logging.


You should place the log on the fastest drive you have.

I am sure you are right. But that does not answer the question on the
proper ratio of page cleaners to drive spindles.

Can't answer that either (maybe Matt can jump in). But if you say the
log is the issue then the page cleaners won't help I guess.
BTW, would the configuration assistant be of any help?
Cheers
Serge
Nov 12 '05 #4
Serge Rielau wrote:
Jean-David Beyer wrote:
Knut Stolze wrote:
Jean-David Beyer wrote:
I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts
of a database on each drive. The main index is on one SCSI drive all to
itself. The main data are on the other three SCSI drives. Small relations
are on one EIDE drive, and the logfiles are on the other EIDE drive. When
running the task, below, the rest of the machine is not doing much.

[...]

In fact, the big hog is the
logging.


You should place the log on the fastest drive you have.
I am sure you are right. But that does not answer the question on the
proper ratio of page cleaners to drive spindles.

Can't answer that either (maybe Matt can jump in). But if you say the
log is the issue then the page cleaners won't help I guess.


I guess so, too, but if all 7 are in D-state, they are a bottleneck. Now
if they are all in D-state because they are waiting for disk seeks, would
throughput improve with more of them?
BTW, would the configuration assistant be of any help?


I do not know what the configuration assistant is. I tried to run it, and
it wants:

$ db2ca
sh: line 1: /opt/IBMJava2-131/jre/bin/java: No such file or directory
DB2JAVIT : RC = 127

This is true enough. I would have thought a Java around a year old should
be old enough to use with the current product. Guess not.

trillian:jdbeyer[~]$ ls -l /opt/
total 48
drwxr-xr-x 3 root root 4096 Jan 1 15:06 IBM
drwxr-xr-x 8 root root 4096 Mar 1 2004 IBMJava2-141 <---<<<
drwxr-xr-x 2 root root 4096 Jan 1 13:30 IBM_PDF_DOCS
On my old machine, I found the Java stuff ran so slowly that it was
useless, so I either use C++ programs or the command line interpreter.
That was on a machine with 512Meg RAM and two Ultra/2 10,000rpm SCSI hard
drives and two 550MHz Pentium IIIs.

This machine has 4GBytes RAM, 2 Hyperthreaded 3.06GHz Xeons with 1
Megabyte L3 cache, four Ultra/320 10,000rpm SCSI hard drives. But I
haven't tried that stuff. I find the documentation almost impossible to
use with DB2 V8.1.?. Before I used DB2 V6.1.? and the documentation was
pretty clear.
--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 21:50:00 up 17 days, 11:10, 3 users, load average: 4.31, 4.12, 4.09

Nov 12 '05 #5

"Jean-David Beyer" <jd*****@exit109.com> wrote in message
news:10*************@corp.supernews.com...
Serge Rielau wrote:
Jean-David Beyer wrote:
Knut Stolze wrote:
Jean-David Beyer wrote:

> I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts
> of a database on each drive. The main index is on one SCSI drive all to
> itself. The main data are on the other three SCSI drives. Small relations
> are on one EIDE drive, and the logfiles are on the other EIDE drive. When
> running the task, below, the rest of the machine is not doing much.

[...]

> In fact, the big hog is the logging.

You should place the log on the fastest drive you have.

I am sure you are right. But that does not answer the question on the
proper ratio of page cleaners to drive spindles.
Can't answer that either (maybe Matt can jump in). But if you say the
log is the issue then the page cleaners won't help I guess.


<jumping in ; re-stating system details for context>

I didn't see what DB2 version you are using, so I'm going to give a general
answer with a bit of historical context, assuming v7 and v8 UDB.
I do not remember where I saw it, but somewhere I got the idea that
the number of page cleaners might well be one more than the number
of hard drives, so I setup my database with 7 page cleaners (and 7
page fetchers).
IIRC, this was the recommendation provided in the v7 performance tuning
guide.

Prior to v8, all page cleaner I/O was synchronous -- meaning that while an
I/O (write) was in progress, no other cleaning-related activity (aging
pages, figuring out what page to write next, etc) was being done by that
cleaner process. Thus, the recommendation was to have one cleaner per
physical disk (spindle), in order to keep all the drives busy with cleaning
activity.

In v8, we started using AIO (asynchronous I/O), which allows each cleaner
process to fire off a batch of I/Os to the OS, and then go off and do
housekeeping work (aging pages and determining the next batch of pages to
clean) while the current batch of I/Os are in progress. This is a much more
efficient use of CPU (less context-switching and pre-emption), and scales
much better (especially on large systems with many arrays of disks.)

In v8, the rule of thumb is to use one cleaner per CPU.
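For reference, both counts are database configuration parameters; applying the one-cleaner-per-CPU rule on a 4-way machine would look roughly like this (the database name MYDB is a placeholder):

```shell
# NUM_IOCLEANERS controls the page cleaners; NUM_IOSERVERS controls
# the prefetchers ("page fetchers"). "MYDB" is a placeholder name.
db2 connect to MYDB
db2 get db cfg for MYDB | grep -iE 'IOCLEANERS|IOSERVERS'
db2 update db cfg for MYDB using NUM_IOCLEANERS 4
db2 update db cfg for MYDB using NUM_IOSERVERS 4
db2 terminate
```

The change takes effect the next time the database is activated.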
Now a big (for me) task gets into a state where, on Linux, all
seven page cleaners are in the D state. It is all working, but slow.
vmstat reveals that the IO is going nowhere near the maximum rate that
the IO channels would support (about 6 Megabytes per second, even
though the IO system can easily do 36 Megabytes per second (e.g.,
when doing a database reorg); the SCSI drives are 10,000rpm
Ultra/320 SCSI drives and the SCSI controller is on a PCI-X bus
all its own.). The problem pretty clearly is the time the drives spend
seeking. Now perhaps I could increase the size of the buffer pools,
move stuff around on the drives, etc. In fact, the big hog is the
logging.


Note that the max throughput of your devices is based on sequential access.
Page cleaner access is usually quite random, due to the way pages are hashed
into slots in the bufferpool and the aging characteristics of pages based on
the workload. I'd say a 16% throughput rate is "pretty good" for a random
write workload.

[ Aside: To get a rough estimate of what throughput to expect on a random
workload, take the track-to-track seek time, and divide it by the average
seek time. This represents the access time ratio between a sequential
workload (which will seek to an adjacent track) and a random workload (which
will seek halfway across the disk, on average.) Apply this ratio to your
rated [sequential] throughput and you will have a good estimate of what
throughput to expect for a random workload. Scale as necessary based on
read/write mix and sequential/random properties thereof. This usually works
out to be somewhere between 1/10 (10%) and 1/7 (14%). ]

Also, many of the memory structures (lists) used internally by the
bufferpool are based on the number of cleaners. One of the things we've
seen migrating workloads from v7 to v8 is that reducing the number of
cleaners can actually improve performance. Part of this is due to reduced
CPU contention (having 7 cleaners on a 4-way in v8 means that they will be
competing for CPU; whereas using 4 will promote fair sharing), and the async
nature of the cleaners in v8 allows the OS to make better decisions with
scheduling I/Os, as each cleaner will batch off N I/Os at the same time.

Hopefully this answers the poster's questions.

--
Matt Emmerton
IBM Toronto Software Lab
Nov 12 '05 #6
Matt Emmerton wrote:
[snip]
<jumping in ; re-stating system details for context>

I didn't see what DB2 version you are using, so I'm going to give a
general answer with a bit of historical context, assuming v7 and v8
UDB.
I think I said, but if not, I am running DB2 UDB V8.1.7 at the moment on a
machine with two 3.06 GHz hyperthreaded Intel Xeon processors -- the ones
with 1 MByte L3 caches (Linux thinks there are four processors). I have 4
GBytes RAM. In the database of interest, most of the relations and their
indices are DMS.
I do not remember where I saw it, but somewhere I got the idea that
the number of page cleaners might well be one more than the number of
hard drives, so I setup my database with 7 page cleaners (and 7 page
fetchers).
IIRC, this was the recommendation provided in the v7 performance tuning
guide.

Prior to v8, all page cleaner I/O was synchronous -- meaning that while
an I/O (write) was in progress, no other cleaning-related activity
(aging pages, figuring out what page to write next, etc) was being done
by that cleaner process. Thus, the recommendation was to have one
cleaner per physical disk (spindle), in order to keep all the drives
busy with cleaning activity.

In v8, we started using AIO (asynchronous I/O), which allows each
cleaner process to fire off a batch of I/Os to the OS, and then go off
and do housekeeping work (aging pages and determining the next batch of
pages to clean) while the current batch of I/Os are in progress. This
is a much more efficient use of CPU (less context-switching and
pre-emption), and scales much better (especially on large systems with
many arrays of disks.)

In v8, the rule of thumb is to use one cleaner per CPU.

So I might get better performance by reducing the number of page cleaners
(and probably page fetchers, though they are not a bottleneck) from the 7
I have now to 4 each, right?
Now a big (for me) task gets into a state where, on Linux, all seven
page cleaners are in the D state. It is all working, but slow. vmstat
reveals that the IO is going nowhere near the maximum rate that the
IO channels would support (about 6 Megabytes per second, even though
the IO system can easily do 36 Megabytes per second (e.g., when doing
a database reorg); the SCSI drives are 10,000rpm Ultra/320 SCSI
drives and the SCSI controller is on a PCI-X bus all its own.). The
problem pretty clearly is the time the drives spend seeking. Now
perhaps I could increase the size of the buffer pools, move stuff
around on the drives, etc. In fact, the big hog is the logging.

Note that the max throughput of your devices is based on sequential
access. Page cleaner access is usually quite random, due to the way
pages are hashed into slots in the bufferpool and the aging
characteristics of pages based on the workload. I'd say a 16%
throughput rate is "pretty good" for a random write workload.

[ Aside: To get a rough estimate of what throughput to expect on a
random workload, take the track-to-track seek time, and divide it by
the average seek time. This represents the access time ratio between a
sequential workload (which will seek to an adjacent track) and a random
workload (which will seek halfway across the disk, on average.) Apply
this ratio to your rated [sequential] throughput and you will have a
good estimate of what throughput to expect for a random workload. Scale
as necessary based on read/write mix and sequential/random properties
thereof. This usually works out to be somewhere between 1/10 (10%) and
1/7 (14%). ]


According to Maxtor for the SCSI drives, seek times are

track-to-track: 0.3ms
average seek: 4.5ms
full stroke: 11. ms

So that comes to 0.3/4.5 = 0.0667 or about 7%.

Now the disk controllers can do 320MBytes/sec advertised maximum, though
I do not recall seeing much over 55MBytes/sec in actual use over an
extended (> 1 second) period of time. When running a REORGANIZE DATABASE
on a DB2 database, I see 35 MBytes/second for extended (many seconds)
periods. But when just populating a database with INSERTs, it runs 6 to 12
MBytes/sec.

The low number is when setiathome is running at nice level 19, the high
number, when setiathome is not running. I attribute this to the fact that
DB2 must wait for IOs in this case, and while it is waiting, setiathome
dirties up the caches in the processors so they run at RAM speed instead of
cache speed.

So without setiathome, according to this calculation, things should run at
about 21 MBytes/sec, not 12. That is less than a 50% error, so perhaps
that is as good as rules of thumb are likely to be.
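The arithmetic above can be checked in a few lines (the seek figures are the Maxtor numbers quoted earlier; 320 MBytes/sec is the controller's rated sequential maximum):

```python
# Rule-of-thumb estimate of random-I/O throughput from seek specs,
# per the aside quoted above: scale the sequential rate by the ratio
# of track-to-track seek time to average seek time.
def random_throughput_estimate(track_to_track_ms, avg_seek_ms, seq_mb_per_s):
    return (track_to_track_ms / avg_seek_ms) * seq_mb_per_s

ratio = 0.3 / 4.5  # Maxtor figures quoted above
print(f"ratio: {ratio:.3f}")  # about 0.067, i.e. ~7%
print(f"estimate: {random_throughput_estimate(0.3, 4.5, 320):.1f} MB/s")  # ~21.3
```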
Also, many of the memory structures (lists) used internally by the
bufferpool are based on the number of cleaners. One of the things
we've seen migrating workloads from v7 to v8 is that reducing the
number of cleaners can actually improve performance. Part of this is
due to reduced CPU contention (having 7 cleaners on a 4-way in v8 means
that they will be competing for CPU; whereas using 4 will promote fair
sharing), and the async nature of the cleaners in v8 allows the OS to
make better decisions with scheduling I/Os, as each cleaner will batch
off N I/Os at the same time.

Hopefully this answers the poster's questions.

That answers a lot of them. Thank you.

It seems, therefore, that I should reduce the number of page cleaners (and
probably page fetchers) from 7 to 4. I should also take the part of the
database on one of the SCSI drives and put it onto an EIDE drive, and move
the log files to a SCSI drive. That is more difficult, so I will probably
do it later.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 09:05:00 up 17 days, 22:25, 3 users, load average: 4.21, 4.21, 4.19

Nov 12 '05 #7

"Jean-David Beyer" <jd*****@exit109.com> wrote in message
news:10*************@corp.supernews.com...
Matt Emmerton wrote:
[snip]
<jumping in ; re-stating system details for context>

I didn't see what DB2 version you are using, so I'm going to give a
general answer with a bit of historical context, assuming v7 and v8
UDB.

I think I said, but if not, I am running DB2 UDB V8.1.7 at the moment on a
machine with two 3.06 GHz hyperthreaded Intel Xeon processors -- the ones
with 1 MByte L3 caches (Linux thinks there are four processors). I have 4
GBytes RAM. In the database of interest, most of the relations and their
indices are DMS.
I do not remember where I saw it, but somewhere I got the idea that
the number of page cleaners might well be one more than the number of
hard drives, so I setup my database with 7 page cleaners (and 7 page
fetchers).


IIRC, this was the recommendation provided in the v7 performance tuning
guide.

Prior to v8, all page cleaner I/O was synchronous -- meaning that while
an I/O (write) was in progress, no other cleaning-related activity
(aging pages, figuring out what page to write next, etc) was being done
by that cleaner process. Thus, the recommendation was to have one
cleaner per physical disk (spindle), in order to keep all the drives
busy with cleaning activity.

In v8, we started using AIO (asynchronous I/O), which allows each
cleaner process to fire off a batch of I/Os to the OS, and then go off
and do housekeeping work (aging pages and determining the next batch of
pages to clean) while the current batch of I/Os are in progress. This
is a much more efficient use of CPU (less context-switching and
pre-emption), and scales much better (especially on large systems with
many arrays of disks.)

In v8, the rule of thumb is to use one cleaner per CPU.

So I might get better performance by reducing the number of page cleaners
(and probably page fetchers, though they are not a bottleneck) from the 7
I have now to 4 each, right?


Correct. Anywhere between 2 and 4 cleaners (accounting for HTT) would be a
reasonable starting point on your system.
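One rough way to judge whether a given cleaner count is keeping up is the buffer pool snapshot; this is a sketch, and the exact monitor element names should be verified against the documentation for your version (MYDB is a placeholder):

```shell
# Turn on buffer pool monitoring, run the workload, then inspect
# the write counters. A high share of asynchronous (cleaner-driven)
# writes relative to total writes, and few "dirty page steal"
# triggers, suggest the cleaners are keeping up.
db2 update monitor switches using bufferpool on
# ... run the workload ...
db2 get snapshot for bufferpools on MYDB | \
  grep -iE 'data writes|cleaner triggers'
```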
Now a big (for me) task gets into a state where, on Linux, all seven
page cleaners are in the D state. It is all working, but slow. vmstat
reveals that the IO is going nowhere near the maximum rate that the
IO channels would support (about 6 Megabytes per second, even though
the IO system can easily do 36 Megabytes per second (e.g., when doing
a database reorg); the SCSI drives are 10,000rpm Ultra/320 SCSI
drives and the SCSI controller is on a PCI-X bus all its own.). The
problem pretty clearly is the time the drives spend seeking. Now
perhaps I could increase the size of the buffer pools, move stuff
around on the drives, etc. In fact, the big hog is the logging.


Note that the max throughput of your devices is based on sequential
access. Page cleaner access is usually quite random, due to the way
pages are hashed into slots in the bufferpool and the aging
characteristics of pages based on the workload. I'd say a 16%
throughput rate is "pretty good" for a random write workload.

[ Aside: To get a rough estimate of what throughput to expect on a
random workload, take the track-to-track seek time, and divide it by
the average seek time. This represents the access time ratio between a
sequential workload (which will seek to an adjacent track) and a random
workload (which will seek halfway across the disk, on average.) Apply
this ratio to your rated [sequential] throughput and you will have a
good estimate of what throughput to expect for a random workload. Scale
as necessary based on read/write mix and sequential/random properties
thereof. This usually works out to be somewhere between 1/10 (10%) and
1/7 (14%). ]


According to Maxtor for the SCSI drives, seek times are

track-to-track: 0.3ms
average seek: 4.5ms
full stroke: 11. ms

So that comes to 0.3/4.5 = 0.0667 or about 7%.

Now the disk controllers can do 320MBytes/sec advertised maximum, though
I do not recall seeing much over 55MBytes/sec in actual use over an
extended (> 1 second) period of time. When running a REORGANIZE DATABASE
on a DB2 database, I see 35 MBytes/second for extended (many seconds)
periods. But when just populating a database with INSERTs, it runs 6 to 12
MBytes/sec.

The low number is when setiathome is running at nice level 19, the high
number, when setiathome is not running. I attribute this to the fact that
DB2 must wait for IOs in this case, and while it is waiting, setiathome
dirties up the caches in the processors so they run at RAM speed instead of
cache speed.

So without setiathome, according to this calculation, things should run at
about 21 MBytes/sec, not 12. That is less than a 50% error, so perhaps
that is as good as rules of thumb are likely to be.


You also need to factor in the overhead of whatever read activity you are
doing as well -- so 12MB/sec may not be all that unreasonable.

The reason you see 35MB/sec while doing a reorganize is because DB2 does a
lot of sequential reads (which are quick) and the data that is written out
by the page cleaners is a lot less random than during a "normal" workload.
Also, many of the memory structures (lists) used internally by the
bufferpool are based on the number of cleaners. One of the things
we've seen migrating workloads from v7 to v8 is that reducing the
number of cleaners can actually improve performance. Part of this is
due to reduced CPU contention (having 7 cleaners on a 4-way in v8 means
that they will be competing for CPU; whereas using 4 will promote fair
sharing), and the async nature of the cleaners in v8 allows the OS to
make better decisions with scheduling I/Os, as each cleaner will batch
off N I/Os at the same time.

Hopefully this answers the poster's questions.

That answers a lot of them. Thank you.

It seems, therefore, that I should reduce the number of page cleaners (and
probably page fetchers) from 7 to 4. I should also take the part of the
database on one of the SCSI drives and put it onto an EIDE drive, and move
the log files to a SCSI drive. That is more difficult, so I will probably
do it later.


Agreed.

--
Matt Emmerton
IBM Toronto Software Lab
Nov 12 '05 #8
