
Logging speed.

When initially populating a database, it runs slower than I would have
supposed. Checking here and there, especially with iostat, reveals that
most of the time is spent writing the logfiles. In other words, I am not
compute limited, but I am not IO limited either (in terms of bytes/second
to the drives). The system is in IO-WAIT state most of the time, so I
assume the process is really seek limited.

It is true that my method of populating the database leaves a lot to be
desired. The input is many hundreds of spreadsheet pages with hundreds of
columns and several thousand rows each. I do not want relations with
hundreds of columns, so to normalize them, I read a row at a time and do a
lot of inserts of subsets of the columns into different relations.

When all is done, everything is fine.

The large tables in the database, and the indices, are in DMS tablespaces on
10,000 rpm Ultra/320 LVD SCSI hard drives. The logfiles go on a 7,200 rpm
100MHz EIDE hard drive with an 8 megabyte cache in the drive. Putting the
logs onto one of the SCSI drives might help slightly, but I doubt it,
because the drive would have to seek between the data partition and the log
partition, and that would probably lose any savings from the increased
rotational speed.

Any suggestions? Fortunately, this is not an urgent problem, but it would
be nice to understand DB2 well enough to assure myself that this is about
as fast as I should want.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 16:10:00 up 1 day, 7:45, 3 users, load average: 1.32, 1.14, 1.04

Nov 12 '05 #1
In article <10*************@corp.supernews.com>, Jean-David Beyer
(jd*****@exit109.com) says...
Any suggestions? Fortunately, this is not an urgent problem, but it would
be nice to understand DB2 well enough to assure myself that this is about
as fast as I should want.


You might need to change the number of log pages in the log buffer. I
tested this some months ago, just to find out how fast inserts would go
on my machine, and they were about 5 times faster after changing it.
You can change it with 'db2 update db cfg for <dbname> using LOGBUFSZ
<newvalue>'; on WXP the default is 8 (4 KB) pages.
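
For reference, a minimal sketch of the commands involved (the database name
'mydb' and the value 256 are only placeholders, not recommendations):

   db2 update db cfg for mydb using LOGBUFSZ 256
   db2 get db cfg for mydb | grep -i logbufsz    # confirm the new value
   # the new buffer size takes effect once all applications disconnect and
   # the database is activated again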
Nov 12 '05 #2
Ian
Jean-David Beyer wrote:

Any suggestions? Fortunately, this is not an urgent problem, but it
would be nice to understand DB2 well enough to assure myself that this
is about as fast as I should want.


There are lots of things you can do to affect this, such as:

1) Alter your transaction (i.e. turn autocommit off and commit in batches);
   see the sketch after this list
2) LOGBUFSZ (as suggested by Gert)
3) Consider LOGFILSIZ and LOGPRIMARY
4) Depending on your bufferpools, having the logs on one of the faster
   (SCSI) disks may not be as much of a penalty as you think. Hopefully
   the data and index writes are asynchronous.
5) Consider writing to flat files and then using LOAD.
Good luck
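
As a rough illustration of items 1 and 3 from the command line processor
(the database name, script name, and values are only placeholders, assuming
the inserts can be driven from an SQL script):

   # item 1: run the inserts with autocommit off and commit in batches
   db2 connect to mydb
   db2 +c -tvf insert_batch.sql    # +c turns autocommit off for this invocation
   db2 commit

   # item 3: larger and/or more primary log files
   # (takes effect the next time the database is activated)
   db2 update db cfg for mydb using LOGFILSIZ 4000 LOGPRIMARY 8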
Nov 12 '05 #3
For an initial populate from an external source, you could try making
the tables NOT LOGGED INITIALLY. This eliminates logging of the inserts.

You'll have to back up the tablespaces after the load and before allowing
updates. It's unlikely that you would need the logs, because an early
failure would probably require a data reload from the external sources
anyway. This requires setting the NOT LOGGED attribute for all of the
tables and loading all of them in a single unit of work (no commits).
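
A minimal CLP sketch of the idea, assuming the tables were created with the
NOT LOGGED INITIALLY attribute (table, database, and path names here are
placeholders):

   db2 connect to mydb
   db2 +c "ALTER TABLE myschema.big_table ACTIVATE NOT LOGGED INITIALLY"
   # ... run the inserts for that table here, also with +c, so everything
   # stays in one unit of work with no intermediate commits ...
   db2 commit
   db2 terminate
   # take a backup before normal (logged) updates are allowed again
   db2 backup database mydb to /some/backup/dir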

Phil Sherman
Nov 12 '05 #4
Gert van der Kooij wrote:

You might need to change the number of log pages in the log buffer. I
tested it some months ago just to find out how fast inserts would go
on my machine, and it was about 5 times faster after changing it.
With 'db2 update db cfg for <dbname> using LOGBUFSZ <newvalue>' you
can change it; on WXP the default is 8.

On RHEL 3 ES, the default is also 8. I raised it to 32, dropped and
re-created the database, and started populating it. It seems about 12%
faster, based on the fact that chunks of work used to take slightly over 8
seconds and now take about 6.6 seconds. Now the bottleneck seems to be the
index that is on a hard drive all its own. It might pay to split the index
onto two hard drives, but I am not sure how. I already have 6 hard drives
on this machine, and I am concerned about seek time, since I am by no means
saturating the IO channels (Ultra/320 LVD SCSI controller with just the
four SCSI drives on it), and I was writing around 8 MBytes/sec most of the
time. If I do a reorg of a database that really needs it, I can see over
33 MBytes/sec for extended periods and 50 MBytes/sec peaks.

Here is the output of iostat. I removed the reads, since it was not doing
any. And I removed some other redundant data as well to shorten the lines.

Device: w/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/hda10 1733.24 6954.02 8.02 3.19 0.19 0.18 31.86
/dev/hdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
/dev/sda1 26.59 106.37 8.00 0.03 0.10 0.10 0.28
/dev/sdb1 9.42 37.67 8.00 0.03 0.29 0.29 0.28
/dev/sdc1 28.53 114.13 8.00 0.17 0.58 0.58 1.66
/dev/sdd2 590.86 2364.54 8.00 96.57 16.26 2.34 138.50

hda10 has the logfiles on it.
hdb1 has unimportant small (under 10K rows) tables and indices on it.
sda1, sdb1, sdc1 have the important (large, busy) data on them.
sdd2 has the important (large, busy) indices on it.
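
For reference, per-device output like the table above is what the sysstat
iostat produces in extended mode; the invocation was presumably something
like the following (the interval is just an example):

   iostat -d -k -x 30    # extended per-device statistics in kB, every 30 seconds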

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 18:20:00 up 1 day, 9:55, 5 users, load average: 10.25, 10.12, 9.23

Nov 12 '05 #5
Ian wrote:
There are lots of things you can do to affect this, such as:

1) Alter your transaction (i.e. autocommit off)
2) LOGBUFSZ (as suggested by Gert)


Tried this: I raised LOGBUFSZ from the default (8) to 32 and got about a
12% improvement.

3) Consider LOGFILSIZ and LOGPRIMARY

I have not set LOGFILSIZ, so I guess the default is 4 MBytes. LOGPRIMARY is
4 right now.

4) Depending on your bufferpools, having the logs on one of the faster
(SCSI) disks may not be as much of a penalty as you think. Hopefully the
data and index writes are asynchronous.

This machine has 4096 Meg RAM, and I could double it if someone sent me
$1200 (I do not expect anyone to do that), so I can have fairly large
buffer pools. I wondered about making the buffer pool for the indices
large enough to hold the entire index, so that after a startup phase it
would need to do no IO for SELECTs. But while populating the database, it
seems to want to keep the disks up to date (which makes sense, after all),
so it writes all the time even though it could probably afford to wait.
And when I do a COMMIT, it cannot afford to delay the log write very much,
and rightly so.
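
If it ever seems worth trying a dedicated pool for the indices, a rough
sketch would be something like this (the pool name, the size, and the
tablespace name INDEX_TS are made up; SIZE is in pages of the given page
size):

   db2 connect to mydb
   db2 "CREATE BUFFERPOOL bp_index SIZE 250000 PAGESIZE 4 K"   # roughly 1 GB of 4 KB pages
   db2 "ALTER TABLESPACE INDEX_TS BUFFERPOOL bp_index"
   db2 terminate
   # depending on the DB2 release, the new pool may not be allocated until
   # the database is deactivated and activated again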

Also, how much faster are the 10,000 rpm SCSI disks than the 7,200 rpm EIDE
drives? Somewhat, I would suppose. If I had the money, I might put another
Ultra/320 10,000 rpm hard drive in there on the other SCSI controller, but
for a hobby database, six hard drives is already a bit ridiculous. I have
fiddled with the extent size (in the range of 32 to 64 pages) and that sort
of thing, but it does not seem to make much difference.

One thing I notice is that I have set it to run one more page cleaner than
I have hard drives; i.e., 7 page cleaners. When populating the database,
all of them get into D state in Linux (basically waiting for IO). It does
not seem that increasing this number would help much, since more of them
would just contend for the same drives. Is this right?
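
The number of page cleaners is the NUM_IOCLEANERS database configuration
parameter, so the current setting can be checked and changed with, for
example ('mydb' is a placeholder for the database name):

   db2 get db cfg for mydb | grep -i num_iocleaners
   db2 update db cfg for mydb using NUM_IOCLEANERS 7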
5) Consider writing to flat files and then using LOAD.
I am afraid that may be the best way. But the system is now running about
12x faster (from a user's standpoint) because of a much faster machine,
DMS instead of SMS storage for most tables, four hard drives that are not
being fought over by other things, and so on, so it is actually fast
enough. I just want to find easy ways to tune it. If this were a serious
problem, if I had to do it often, or if I had a paying customer, I would
probably go the route you suggest of writing flat files that look like the
relations I want and using LOAD. But none of these conditions obtains, and
I do not want to bother, at least for now.
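
For completeness, the LOAD route would look roughly like this, with one
delimited flat file per target relation written out by the normalizing
program (the file, schema, and table names here are hypothetical):

   db2 connect to mydb
   db2 "LOAD FROM customers.del OF DEL INSERT INTO myschema.customers"
   db2 terminate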
Good luck

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 18:45:00 up 1 day, 10:20, 5 users, load average: 1.02, 1.31, 3.73

Nov 12 '05 #6
Ensure you're using a SCSI card with an adequate battery-backed write cache.
Put the logs on SCSI disks behind the caching controller, along with
everything else. Remove all of the IDE drives from the system. If you still
have problems, upgrade the 10K drives that cause the delays to 15K drives,
and/or split the logs over multiple drives.

"Jean-David Beyer" <jd*****@exit109.com> wrote in message
news:10*************@corp.supernews.com...
When initially populating a database, it runs slower that I would have
supposed. Checking here and there, especially with iostat, reveals that
most of the time is spent writting the logfiles. In other words, I am not
compute limited, but I am not IO limited either (in terms of bytes/second
to the drives). The system is in IO-WAIT state most of the time, so I
assume the process is really seek limited.

It is true that my method of populating the database leaves a lot to be
desired. The input is many hundreds of spreadsheet pages with hundreds of
columns and several thousand rows each. I do not want relations with
hundreds of columns, so to normalize them, I read a row at a time and do a
lot of inserts of subsets of the columns into different relations.

When all is done, everything is fine.

The large tables in the database, and the indices are DMS on 10,000rpm
Ultra/320 LVD SCSI hard drives. The logfiles go on a 7,200rpm 100MHz EIDE
hard drive with 8 megabyte cache in the drive. Possibly putting that onto
one of the SCSI drives might help slightly, but I doubt it because it
would have to seek from the data partition to the log partition and that
would probably lose me any savings from the increased rotational speed.

Any suggestions? Fortunately, this is not an urgent problem, but it would
be nice to understand DB2 well enough to assure myself that this is about
as fast as I should want.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 16:10:00 up 1 day, 7:45, 3 users, load average: 1.32, 1.14, 1.04

Nov 12 '05 #7
Mark Yudkin wrote:
Ensure you're using a SCSI card with an adequate battery-backed write cache.
Put the logs on SCSI disks behind the caching controller, along with
everything else. Remove all of the IDE drives from the system. If you still
have problems, upgrade the 10K drives that cause the delays to 15K drives,
and/or split the logs over multiple drives.


... or better yet, use a battery-backed RAM disk as the log device ...

Jan M. Nelken
Nov 12 '05 #8
While we're there, check your scsi driver parms.

stuff like :
maxtags
maximumSGlist
numberofrequests

PM
Nov 12 '05 #9
He isn't using SCSI for the logs - he's using a 7200 rpm EIDE drive for his
log files. As a result he's spending all his time waiting for disk rotation
to occur.

"PM (pm3iinc-nospam) CGO" <PM (pm3iinc-nospam)@cgocable.ca> wrote in message
news:5i****************@charlie.risq.qc.ca...
While we're there, check your scsi driver parms.

stuff like :
maxtags
maximumSGlist
numberofrequests

PM

Nov 12 '05 #10
