
Recommended FS

Hi

I'm upgrading the DB server hardware and also the Linux OS.

My questions are:

1. What is the preferred FS to go with? ext3, ReiserFS, JFS, XFS? (speed,
efficiency)
2. What is the most important part of the hardware? Fast HD, a lot of memory,
or maybe a strong CPU?

Thanks in Advance

--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #1
55 Replies


Ben-Nes Michael wrote:
Hi

I'm upgrading the DB server hardware and also the Linux OS.

My questions are:

1. What is the preferred FS to go with? ext3, ReiserFS, JFS, XFS? (speed,
efficiency)

That's flamebait. People never agree, because their experiences differ. Besides,
it depends on what kind of database you are dealing with.

Best bet is to benchmark your own app. ReiserFS, XFS and JFS are all good. Ext3
requires selecting the proper journaling mode; with that done it is almost equally
good. You decide what works best for you.
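
As a starting point for that kind of benchmark, here is a minimal Python sketch (not part of the original post). It times sequential 8 kB writes followed by a single fsync in each candidate directory. The mount points /mnt/ext3, /mnt/xfs and /mnt/jfs are hypothetical, and a real comparison should replay your own application's workload rather than rely on a synthetic test like this one.

```python
import os
import tempfile
import time

PAGE_SIZE = 8192  # PostgreSQL's default block size


def time_sequential_write(directory, pages=2048):
    """Write `pages` 8 kB blocks to a scratch file in `directory`,
    fsync once at the end, and return the throughput in MB/s."""
    block = b"\0" * PAGE_SIZE
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(pages):
            os.write(fd, block)
        os.fsync(fd)  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return (pages * PAGE_SIZE) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Hypothetical mount points, one per candidate file system.
    for mount in ("/mnt/ext3", "/mnt/xfs", "/mnt/jfs"):
        if os.path.isdir(mount):
            print(mount, round(time_sequential_write(mount), 1), "MB/s")
```

Run it several times per file system and compare the spread, not a single number; caching and background activity can easily dominate one run.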
2. What is the most important part of the hardware? Fast HD, a lot of memory,
or maybe a strong CPU?

A fast HD with a good RAID controller. Subject to budget, SCSI disks are a better
buy than IDE. So is hardware SCSI RAID.

Shridhar

Nov 12 '05 #2



On Mon, 20 Oct 2003, Shridhar Daithankar wrote:

A fast HD with a good RAID controller. Subject to budget, SCSI disks are a better
buy than IDE. So is hardware SCSI RAID.

I hate asking this again. But WHY?

What SCSI disks gain is spinning at 15000RPM and larger buffers. They
lose on space, and have a slower bus. I would like to see some proof. Sorry.

IDE hard disk, 40GB, 7200RPM = 133MB/s = 50UKP
SCSI hard disk, 36GB, 10000RPM = 160MB/s = 110UKP

Is that extra 27MB/s worth the price of another IDE disk? And while you can get
bigger, faster SCSI disks, prices go through the roof. It's no longer RAID
but RAED (Redundant Array of Expensive Disks).

My advice, not that I've got any proof, is that the money is better
spent on a good disk controller and many disks than on each individual disk.

In short, if you have money to burn then by all means get SCSI, but
most people are better off spending

$200 disk controller              $200 disk controller
$100 40GB disks          than     $200 40GB disk

Prices only approx.
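
The price comparison above can be worked through as cost per gigabyte. The Python below is only illustrative arithmetic on the approximate prices quoted in the post; it says nothing about reliability or behaviour under load.

```python
# Approximate prices (UK pounds) and capacities (GB) quoted in the post.
disks = {
    "IDE 7200RPM": {"price_ukp": 50, "capacity_gb": 40},
    "SCSI 10000RPM": {"price_ukp": 110, "capacity_gb": 36},
}


def pounds_per_gb(disk):
    """Purchase price divided by capacity."""
    return disk["price_ukp"] / disk["capacity_gb"]


ide_cost = pounds_per_gb(disks["IDE 7200RPM"])
scsi_cost = pounds_per_gb(disks["SCSI 10000RPM"])

print(f"IDE:  {ide_cost:.2f} UKP/GB")
print(f"SCSI: {scsi_cost:.2f} UKP/GB")
print(f"SCSI costs {scsi_cost / ide_cost:.1f}x as much per GB")
```

On these figures the SCSI disk costs roughly two and a half times as much per gigabyte, which is the gap the rest of the thread has to justify.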

Peter Childs


Nov 12 '05 #3

Peter Childs wrote:

On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
A fast HD with a good RAID controller. Subject to budget, SCSI disks are a better
buy than IDE. So is hardware SCSI RAID.

I hate asking this again. But WHY?

OK. I have handled only a few SCSI disks, so take this with a grain of
salt.

1. A SCSI bus shares bandwidth between disks much better than an IDE channel does.
Put two IDE disks on the same channel and two SCSI disks on the same bus, and see
which combo performs better.
2. <Unconfirmed> SCSI disks are individually tested, while IDE disks are only
sampled. That makes a big difference in reliability. I know that for some people
IDE disks do not crash at all, but the majority think SCSI disks are more reliable.
3. SCSI disks have tagged command queuing and similar features, which make them
better at handling load.

Technically, if you don't know the load, SCSI is the better choice. If
you know your load very well and it is predictable, IDE might be a choice.

I would personally prefer an IDE disk array with a hardware RAID controller, because
I can put it in my home machine, unlike SCSI. But every developer I have asked
around here says that IDE performance starts dropping once you hit real-world load.

Shridhar

Nov 12 '05 #4

Peter Childs wrote:

On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
A fast HD with a good RAID controller. Subject to budget, SCSI disks are a better
buy than IDE. So is hardware SCSI RAID.

I hate asking this again. But WHY?

The duty cycle of SCSI drives is 100%. The duty cycle of IDE drives is
around 30-40%. That is why SCSI drives are used in mail and news servers,
where disk access is more or less continuous. IDE drives usually degrade
or fail faster under such load.

From experience, I have noticed IDE drives that initially performed
at 30MB/s drop to around 10MB/s after a year or so.

What SCSI disks gain is spinning at 15000RPM and larger buffers. They
lose on space, and have a slower bus. I would like to see some proof. Sorry.

IDE hard disk, 40GB, 7200RPM = 133MB/s = 50UKP
SCSI hard disk, 36GB, 10000RPM = 160MB/s = 110UKP

On new servers doing a software RAID1 sync between two disks, I find the
following sustained transfer rates:

SuperMicro 6013P-i, ATA 133, 80GB IDE, 7200rpm: 39000 kB/s
SuperMicro 6013P-8, SCSI 320, 72GB SCSI, 10000rpm: 65000 kB/s

The IDE drives are on separate buses. The SCSI drives are on the same bus.

I think that the 320MB/s SCSI buses are a bit faster than the 133MB/s ATA
buses.
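
To put those measurements next to the bus ratings, the following Python sketch computes how much of each bus the RAID1 sync actually used. It assumes 1 MB = 1000 kB, and it assumes that on the shared SCSI bus both the read from the source disk and the write to the mirror cross the same bus, so that traffic is counted twice. Both assumptions are added here for illustration and are not the poster's.

```python
def bus_utilisation(drive_kb_per_s, drives_on_bus, bus_mb_per_s):
    """Fraction of the rated bus bandwidth used by the drives sharing it."""
    traffic_mb_per_s = drive_kb_per_s * drives_on_bus / 1000
    return traffic_mb_per_s / bus_mb_per_s


# IDE: each drive sits on its own ATA/133 channel.
ide_util = bus_utilisation(39000, drives_on_bus=1, bus_mb_per_s=133)
# SCSI: both drives share one 320 MB/s bus.
scsi_util = bus_utilisation(65000, drives_on_bus=2, bus_mb_per_s=320)

print(f"ATA bus used:  {ide_util:.0%}")
print(f"SCSI bus used: {scsi_util:.0%}")
```

Under these assumptions neither bus is more than half full, which suggests the drives themselves, not the bus ratings, set the sustained rates in this test.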

Is that extra 27MB/s worth the price of another IDE disk? And while you can get
bigger, faster SCSI disks, prices go through the roof. It's no longer RAID
but RAED (Redundant Array of Expensive Disks).

My advice, not that I've got any proof, is that the money is better
spent on a good disk controller and many disks than on each individual disk.

In short, if you have money to burn then by all means get SCSI, but
most people are better off spending


I suppose that's your choice. Another way of looking at things is to
consider what the server is worth to your business and factor that into
how much you should spend on equipment.

E.g. if £10,000/year can be attributed to the server, then perhaps a
cheap PC will do. If £1 million of your business relies on the server,
then perhaps you should look into investing more in it.
Regards,
Nick.
--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nov 12 '05 #5



On Mon, 20 Oct 2003, Nick Burrett wrote:
I suppose that's your choice. Another way of looking at things is to
consider what the server is worth to your business and factor that into
how much you should spend on equipment.

E.g. if £10,000/year can be attributed to the server, then perhaps a
cheap PC will do. If £1 million of your business relies on the server,
then perhaps you should look into investing more in it.

At last, someone who has the real answers that I thought ought to be
true all along. It's a shame nobody can give some hard and fast numbers
that I can get the budget people to understand!

Peter Childs


Nov 12 '05 #6

I'm not an HD specialist, but I know SCSI can handle load much better than IDE.

I read a benchmark lately (I don't really remember where) comparing SATA
against U160. The results showed that SATA gives better performance at the start,

but later on SCSI takes the lead, with the HD CPU load at 30% while SATA is at
100% load for the same task.

So it's fairly obvious to me: if it's a server that serves lots of files and
the HD will work for lots of users, I'll go for SCSI.
For a workstation or backup server I'll go for IDE.

But still, the greatest question is which FS to put on it?

I heard ReiserFS can handle small files very quickly.
--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------


Nov 12 '05 #7

Quoth mi**@canaan.co.il ("Ben-Nes Michael"):
I'm not an HD specialist, but I know SCSI can handle load much better than IDE.

I read a benchmark lately (I don't really remember where) comparing SATA
against U160. The results showed that SATA gives better performance at the start,
but later on SCSI takes the lead, with the HD CPU load at 30% while SATA is at
100% load for the same task.

So it's fairly obvious to me: if it's a server that serves lots of files and
the HD will work for lots of users, I'll go for SCSI.
For a workstation or backup server I'll go for IDE.

But still, the greatest question is which FS to put on it?

I heard ReiserFS can handle small files very quickly.


ReiserFS was designed to cope with having huge hordes of tiny files.
PostgreSQL doesn't create files in that pattern; it only creates
fairly large files, and that tends to be the pathological case where
ReiserFS works somewhat badly.

When I ran some transaction-heavy benchmarks between ext3, XFS, and
JFS, I found JFS to be pretty consistently faster. I didn't bother
trying reiserfs because:
a) It has a history of being slower for big files;
b) I have had some cases of losing data to it, diminishing my trust
of it.
--
output = ("cbbrowne" "@" "ntlug.org")
http://www.ntlug.org/~cbbrowne/unix.html
"sic transit discus mundi"
-- From the System Administrator's Guide, by Lars Wirzenius
Nov 12 '05 #8

On Mon, 20 Oct 2003 11:07:20 +0100
Nick Burrett <ni**@dsvr.net> wrote:

From experience, I have noticed IDE drives that initially performed
at 30MB/s drop to around 10MB/s after a year or so.


Yes. This is very true. A good test I like to use to show IDE falling apart
is to start up one client and show it going very fast. Then start up 20
and see what happens :)
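
A rough version of that one-client-versus-twenty test can be sketched in Python as below. This is not the poster's test, only an illustration: each "client" is a thread issuing random 8 kB reads against one file. The file should be much larger than RAM, otherwise the OS cache answers the reads and the disk is never exercised; the path you pass in is up to you.

```python
import os
import random
import tempfile
import threading
import time

BLOCK = 8192  # read size in bytes


def random_read_throughput(path, clients, reads_per_client=200):
    """Return aggregate random 8 kB reads per second for `clients` threads."""
    blocks = os.path.getsize(path) // BLOCK
    if blocks == 0:
        raise ValueError("file must be at least one block long")

    def worker():
        # Each client uses its own unbuffered file handle.
        with open(path, "rb", buffering=0) as f:
            for _ in range(reads_per_client):
                f.seek(random.randrange(blocks) * BLOCK)
                f.read(BLOCK)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return clients * reads_per_client / elapsed
```

Calling it with `clients=1` and then `clients=20` on the same large file gives two numbers to compare; a disk that copes well with concurrent load should not see its aggregate rate collapse in the second run.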

Also, you can easily have many, many more SCSI devices (and external
SCSI devices) than IDE devices. More platters / disks == faster IO.


IDE hard disk, 40GB, 7200RPM = 133MB/s = 50UKP
SCSI hard disk, 36GB, 10000RPM = 160MB/s = 110UKP


If you don't mind refurbished disks that still have a warranty, check out
eBay. On Friday I won a lot of 10 18GB disks for $96 + $27
insured shipping. But yeah, new SCSI is quite expensive, though it can be
worth it. IMHO SCSI is to be used in a RAID, not alone. No single disk
can saturate the bandwidth offered (true of both IDE and SCSI).
--
Jeff Trout <je**@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Nov 12 '05 #9

Ben-Nes Michael wrote:
But still, the greatest question is which FS to put on it?

I heard ReiserFS can handle small files very quickly.


Switching from ext3 to reiserfs for our name servers reduced the time
taken to load 110,000 zones from 45 minutes to 5 minutes.

However for a database, I don't think you can really factor this type of
stuff into the equation. The performance benefits you get from
different filesystem types are going to be small compared to the
modifications that you can make to your database structure, queries and
applications. The actual algorithms used in processing the data will be
much slower than the time taken to fetch the data off disk.

--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nov 12 '05 #10

----- Original Message -----
From: "Nick Burrett" <ni**@dsvr.net>
To: "Ben-Nes Michael" <mi**@canaan.co.il>
Cc: "Peter Childs" <bl*********@blueyonder.co.uk>; "Shridhar Daithankar"
<sh*****************@persistent.co.in>; "postgresql"
<pg***********@postgresql.org>
Sent: Monday, October 20, 2003 2:08 PM
Subject: Re: [GENERAL] Recomended FS

Ben-Nes Michael wrote:
But still, the greatest question is which FS to put on it?

I heard ReiserFS can handle small files very quickly.

Switching from ext3 to reiserfs for our name servers reduced the time
taken to load 110,000 zones from 45 minutes to 5 minutes.

However for a database, I don't think you can really factor this type of
stuff into the equation. The performance benefits you get from
different filesystem types are going to be small compared to the
modifications that you can make to your database structure, queries and
applications. The actual algorithms used in processing the data will be
much slower than the time taken to fetch the data off disk.


So you say the FS has no real speed impact on the DB?

In my pg data folder I have 2367 files, some big, some small.
--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk


Nov 12 '05 #11

Ben-Nes Michael wrote:

So you say the FS has no real speed impact on the DB?

In my pg data folder I have 2367 files, some big, some small.


I'm saying: don't expect your DB performance to come on in leaps and bounds
just because you changed to a different filesystem format. If you've
got speed problems then it might help to look elsewhere first.

--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nov 12 '05 #12

----- Original Message -----
From: "Nick Burrett" <ni**@dsvr.net>
To: "Ben-Nes Michael" <mi**@canaan.co.il>
Cc: "postgresql" <pg***********@postgresql.org>
Sent: Monday, October 20, 2003 2:54 PM
Subject: Re: [GENERAL] Recomended FS

So you say the FS has no real speed impact on the DB?

In my pg data folder I have 2367 files, some big, some small.


I'm saying: don't expect your DB performance to come on in leaps and bounds
just because you changed to a different filesystem format. If you've
got speed problems then it might help to look elsewhere first.

I don't expect miracles :)
But I still have to choose one, so why shouldn't I choose the one that fits
best?

Nov 12 '05 #13

Ben-Nes Michael wrote:
I'm saying: don't expect your DB performance to come on in leaps and bounds
just because you changed to a different filesystem format. If you've
got speed problems then it might help to look elsewhere first.


I don't expect miracles :)
But I still have to choose one, so why shouldn't I choose the one that fits
best?


All things being equal, you should optimise your application design and database
tuning before you choose a file system.

If something works well for you, it will simply work better with a better file
system. That's the point.

Shridhar

Nov 12 '05 #14

----- Original Message -----
From: "Shridhar Daithankar" <sh*****************@persistent.co.in>
To: "Ben-Nes Michael" <mi**@canaan.co.il>
Cc: "Nick Burrett" <ni**@dsvr.net>; "postgresql"
<pg***********@postgresql.org>
Sent: Monday, October 20, 2003 3:06 PM
Subject: Re: [GENERAL] Recomended FS

Ben-Nes Michael wrote:
I'm saying: don't expect your DB performance to come on in leaps and bounds
just because you changed to a different filesystem format. If you've
got speed problems then it might help to look elsewhere first.

I don't expect miracles :)
But I still have to choose one, so why shouldn't I choose the one that fits
best?


All things being equal, you should optimise your application design and database
tuning before you choose a file system.

If something works well for you, it will simply work better with a better file
system. That's the point.

I agree, but I'll still have to choose an FS. Does the list have any opinion
on which FS to choose?
Shridhar


Nov 12 '05 #15

Peter Childs wrote:

On Mon, 20 Oct 2003, Shridhar Daithankar wrote:

A fast HD with a good RAID controller. Subject to budget, SCSI disks are
a better buy than IDE. So is hardware SCSI RAID.

I hate asking this again. But WHY?

What SCSI disks gain is spinning at 15000RPM and larger
buffers. They lose on space, and have a slower bus. I would like
to see some proof. Sorry.

They win easily on random disk accesses and on mixed reads and writes.
And the bus is much faster, not slower.

IDE hard disk, 40GB, 7200RPM = 133MB/s = 50UKP
SCSI hard disk, 36GB, 10000RPM = 160MB/s = 110UKP

Is that extra 27MB/s worth the price of another IDE disk? And while
you can get bigger, faster SCSI disks, prices go through the
roof. It's no longer RAID but RAED (Redundant Array of Expensive Disks).

You're looking at the bus speed, not the actual speed the disk achieves.
My guess is that the SCSI disk is, in some areas, twice as fast as the
IDE disk, and on average 10-30% faster.

My advice, not that I've got any proof, is that the money
is better spent on a good disk controller and many disks than
on each individual disk.

This heavily depends on your setup and tasks.

- SCSI has a (supposedly) better lifetime, due to (much) better disk
components.
- SCSI disks are designed for server tasks (many random accesses) and
have their queue management (better) tuned for that. This also applies
to mixed reads and writes.
- SCSI disks often have smaller and thicker platters, which can spin
more stably and at higher RPMs.
- The SCSI bus allows all the disks to operate at maximum speed (as far
as the PCI bus can handle it, of course), while the IDE bus is shared
between both disks on a channel.
- SCSI allows more disks and longer cables on the same controller.
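
Part of the random-access advantage follows directly from spindle speed: on average the head waits half a revolution for the right sector. The Python sketch below computes that latency and a rough ceiling on random I/Os per second. The seek times used in the example calls are illustrative assumptions, not figures from this thread.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the sector to come round: half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def random_iops(rpm, avg_seek_ms):
    """Rough random I/Os per second, ignoring transfer time and queuing."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


for rpm in (7200, 10000, 15000):
    print(rpm, "RPM:", round(avg_rotational_latency_ms(rpm), 2), "ms")

# Seek times below are assumed values for illustration only.
print("7200 RPM, 8.5 ms seek:", round(random_iops(7200, 8.5)), "IOPS")
print("10000 RPM, 4.7 ms seek:", round(random_iops(10000, 4.7)), "IOPS")
```

The rotational part alone drops from about 4.2 ms at 7200 RPM to 3 ms at 10000 RPM and 2 ms at 15000 RPM, before any difference in seek time or command queuing is counted.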

Anyway, you don't need all those advantages all the time, and the
major disadvantage is of course the price tag.
For simple backup solutions (lots of storage with reasonable
performance at an acceptable price), IDE is quite good in RAID5 or so.
For a high-performance database, you really want to look into a RAID
setup with SCSI (or at least WD Raptor IDE disks or something like
that).

In short, if you have money to burn then by all means
get SCSI, but most people are better off spending

Also get SCSI if you don't have money to burn but simply need the higher
performance (which is really there), for instance for random disk
accesses.

Best regards,

Arjen



Nov 12 '05 #16

On Mon, 20 Oct 2003, Peter Childs wrote:


On Mon, 20 Oct 2003, Shridhar Daithankar wrote:

A fast HD with a good RAID controller. Subject to budget, SCSI disks are a better
buy than IDE. So is hardware SCSI RAID.

I hate asking this again. But WHY?

What SCSI disks gain is spinning at 15000RPM and larger buffers. They
lose on space, and have a slower bus. I would like to see some proof. Sorry.


SCSI beats IDE hands down for databases, and for one reason above all the
rest: SCSI drives don't generally lie about fsync.

With SCSI, you can initiate 'pgbench -c 100 -t 1000000' and pull the plug
on your server, and voila, the whole thing will come back up (assuming a
journaling file system, and solid hardware.)

Do that with IDE with write cache enabled and you WILL have a scrambled
database that needs to be re-initdbed and restored.

Now, turn off the write cache on the IDE drive, which will make it solid
and reliable like the SCSI drive, and compare speed, it's not even close.

Until the IDE drive manufacturers start making IDE drives that actually
report fsync properly, they're a toy that should not be used for your
database unless you know the dangers they present.
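
One rough way to spot a drive that acknowledges fsync from its write cache is to rewrite the same block with an fsync after every write. A drive that really waits for the platter cannot complete more than about one such write per revolution. The Python sketch below is a heuristic only and is not from the original post: a RAID controller with battery-backed cache, or a scratch directory on a RAM-backed file system, will also exceed the ceiling for reasons that have nothing to do with a lying drive.

```python
import os
import tempfile
import time


def max_honest_fsyncs_per_sec(rpm):
    """Ceiling for rewriting one block with an fsync per write:
    at most one completed write per platter revolution."""
    return rpm / 60


def measure_fsync_rate(directory, writes=200):
    """Rewrite the first 512 bytes of a scratch file, fsyncing each time,
    and return the observed number of write+fsync cycles per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"x" * 512)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return writes / elapsed
```

For a 7200 RPM disk the ceiling is 120 per second. If `measure_fsync_rate` on a directory backed by that disk reports several times more, the writes are very likely being acknowledged before they reach the platter.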

Nov 12 '05 #17

Ben-Nes Michael writes:
1. What is the preferred FS to go with? ext3, ReiserFS, JFS, XFS? (speed,
efficiency)

PostgreSQL might work better on "simple" file systems, because you avoid making
the head run all over the place to write both the file system's own log and the
PostgreSQL log. Some have even suggested FAT for the data files. Good bets for
improving performance are putting the WAL logs and the indexes on different
spindles from the table files. Of course, certain RAID configurations
achieve a similar effect.
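
One common way to put the WAL on its own spindle was to move the pg_xlog directory to the other disk and leave a symbolic link behind. The Python sketch below shows that move; it is an illustration, not the poster's procedure. The server must be stopped before anything under the data directory is moved, and the paths in the usage note are hypothetical.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(src_dir, dest_parent):
    """Move src_dir under dest_parent and leave a symlink at the old path.
    Returns the new location of the directory."""
    src_dir = os.path.normpath(src_dir)
    dest = os.path.join(dest_parent, os.path.basename(src_dir))
    shutil.move(src_dir, dest)
    os.symlink(dest, src_dir)
    return dest


if __name__ == "__main__":
    # Self-contained demonstration on throwaway directories.
    base = tempfile.mkdtemp()
    wal = os.path.join(base, "data", "pg_xlog")
    other_disk = os.path.join(base, "disk2")
    os.makedirs(wal)
    os.makedirs(other_disk)
    print("moved to", relocate_with_symlink(wal, other_disk))
```

On a real system the call would look something like `relocate_with_symlink("/var/lib/pgsql/data/pg_xlog", "/mnt/disk2")`, with both paths replaced by your own data directory and the mount point of the second disk.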
2. What is the most important part of the hardware? Fast HD, a lot of memory,
or maybe a strong CPU?


Lots of memory, so you can cache a large fraction of the data in memory.
A good hard disk, if you do a lot of updates and/or your memory is not big
enough to cache most of the data. CPU is not as important.

--
Peter Eisentraut pe*****@gmx.net

Nov 12 '05 #18

Hi Ben,

You asked so here's my take on the subject, but I've gotta say that you
can't go far wrong with reading Bruce Momjian's paper at:

http://www.ca.postgresql.org/docs/mo...w_performance/

But with that aside.

1. Unless you're doing major-league DB stuff, the FS shouldn't make more than a
marginal difference; if it's journaled then it's good. You can spend all
the time benchmarking that you want, just be sure your ROI is worth the
time you invest. My favourite FS is Reiser, but in the cold light of
day, ext3 is supported in more places. My first choice is Reiser, since
I used it on production servers even when it was "unstable" and it never
let me down. I often use one or the other.

2. Bruce's article really is good for this question, but in a nutshell
you need to get as much of the DB as close to the CPU as possible. As
with any serious application, you can't beat a good L1/L2 cache, then
plenty of RAM. DBs love RAM, the more the merrier. Lastly, fast
and wide disk access. Remember that disk access will be the slowest part of
the system, and in an ideal world you'd fit nearly all of your DB in RAM
if it were practical and safe.

You'd probably gain more from taking the time to really ensure that your
DB is designed flawlessly, and all your indexes are where they're
needed. All of the basics come into play, but a well built RDBMS system
is greater than the sum of its parts.

For further reading check out:

http://www.argudo.org/postgresql/soft-tuning.html

It all adds up!

Good Luck

Tony.


Nov 12 '05 #19

On Monday 20 October 2003 10:28 am, scott.marlowe wrote:
On Mon, 20 Oct 2003, Peter Childs wrote:
On Mon, 20 Oct 2003, Shridhar Daithankar wrote:
A fast HD with a good RAID controller. Subject to budget, SCSI disks
are a better buy than IDE. So is hardware SCSI RAID.


I hate asking this again. But WHY?

What SCSI disks gain is spinning at 15000RPM and larger buffers. They
lose on space, and have a slower bus. I would like to see some proof.
Sorry.


SCSI beats IDE hands down for databases, and for one reason above
all the rest: SCSI drives don't generally lie about fsync.
....


Talk about timing...this article posted today seems quite apropos
(spoiler: SCSI beats IDE):

http://hardware.devchannel.org/hardw...&tid=38&tid=49

Cheers,
Steve

Nov 12 '05 #20

On Mon, Oct 20, 2003 at 08:09:34AM -0400 I heard the voice of
Jeff, and lo! it spake thus:

insured shipping. But yeah, new SCSI is quite expensive, though it can be
worth it. IMHO SCSI is to be used in a RAID, not alone. No single disk
can saturate the bandwidth offered (true of both IDE and SCSI).


The difference is that IDE *HAS* to be able to saturate the bus (which it
can't, of course; show me an IDE drive that pushes even 66MB/sec off the
platter) for the bus speed to matter, since IDE doesn't support
disconnection. Multiple SCSI drives can be stuffing data over the SCSI
channel all at once. They don't have to be RAID'd, they can be different
filesystems accessed in parallel.
--
Matthew Fuller (MF4839) | fu******@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/

"The only reason I'm burning my candle at both ends, is because I
haven't figured out how to light the middle yet"


Nov 12 '05 #21

Some sort of ATA RAID is probably worth considering.

E.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
Promise TX2000.

The disks themselves give fairly poor performance when attached to the
standard IDE channels:

sequential write 15MB/s
sequential read 20MB/s

But attached to the Promise card using RAID 0 they do considerably better:

sequential write 52MB/s
sequential read 52MB/s

Now you would probably not use RAID 0 for a "real" system (unless you
had good backups), but the difference is interesting.
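
To make "the difference" concrete, the short Python calculation below turns the four figures above into speedup factors. Striping across two disks would at best double sequential throughput, so a factor above 2 hints that the Promise card itself, and not only RAID 0, contributes to the gain. That reading is an inference added here, not a claim from the post.

```python
# Sequential throughput in MB/s, as reported above.
single_disk = {"write": 15, "read": 20}
raid0_two_disks = {"write": 52, "read": 52}

speedup = {
    op: raid0_two_disks[op] / single_disk[op]
    for op in single_disk
}

for op, factor in speedup.items():
    note = "more than ideal 2x striping" if factor > 2 else "within 2x"
    print(f"{op}: {factor:.2f}x ({note})")
```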

Note that even including the card, this is a very cheap setup.

(I have not gotten around to testing random read and writes, but if
anybody is interested I can test this and supply figures)

regards

Mark
Steve Crawford wrote:


Talk about timing...this article posted today seems quite apropos
(spoiler: SCSI beats IDE):

http://hardware.devchannel.org/hardw...&tid=38&tid=49


Nov 12 '05 #22

On Tuesday 21 October 2003 11:26, Mark Kirkwood wrote:
(I have not gotten around to testing random read and writes, but if
anybody is interested I can test this and supply figures)


Can you compare pgbench results for the RAID and for single IDE disks? It would be
great if you could also turn off write caching on the individual drives in the RAID
and test that as well.

I think for a lot of databases IDE RAID could be a good compromise. Just
remember it's not the best out there. So use it when you have good
backups. :-)

Shridhar

Nov 12 '05 #23

On Tue, Oct 21, 2003 at 06:56:52PM +1300, Mark Kirkwood wrote:
> Some sort of ATA Raid is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000


We had some reasonably good luck with RAID on a 2-way Promise card,
but multi-disk ATA RAID has been a great disappointment. If I were
doing it again, I'd buy 2 or 3 ATA controllers and do the RAID in
software.

That said, even the 2-way RAID became almost uselessly slow when
multiple queries were running -- indeed, dramatically slower than a
plain single IDE drive. This is not at all the experience we have
with SCSI, so either the IDE RAID people haven't worked it all out,
or (more likely IMHO) there are limitations in IDE which make it
ill-suited to the access patterns of a database under multiple
simultaneous (divergent) queries.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<an****@libertyrms.info> M2P 2A8
+1 416 646 3304 x110

Nov 12 '05 #24

P: n/a

On Tue, 21 Oct 2003 18:56:52 +1300, Mark Kirkwood wrote:
> Note that even including the card, this is a very cheap setup.

Yes, this is the single advantage of IDE vs SCSI. If the price of the storage system
is the *only* consideration, IDE is the way to go.
SCSI has a long history of providing sustained throughput for server systems.
IDE has a short history of providing very cheap storage for desktops.
--
jimoe at sohnen-moe dot com
pgp/gpg public key: http://www.keyserver.net/en/


Nov 12 '05 #25

P: n/a
On Tue, 21 Oct 2003, Mark Kirkwood wrote:
> Some sort of ATA RAID is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000
>
> The disks themselves give fairly poor performance when attached to the
> std IDE channels:
>
> sequential write 15 MB/s
> sequential read 20 MB/s
>
> But attached to the Promise card using RAID 0 do considerably better:
>
> sequential write 52 MB/s
> sequential read 52 MB/s
>
> Now you would probably not use RAID 0 for a "real" system (unless you
> had good backups), but the difference is interesting
>
> Note that even including the card, this is a very cheap setup.
>
> (I have not gotten around to testing random read and writes, but if
> anybody is interested I can test this and supply figures)


OK, but here's the real test. As the postgres user, run 'pgbench -i',
then after that runs, run 'pgbench -c 50 -t 1000000'. While it's running
and settled ('ps aux|grep postgres|wc -l' should show a number of ~54 or
so), pull the plug. Wait for the hard drives to spin down, then plug it
back in and power it on. With SCSI you will still have a coherent
database.

If you want a coherent database on IDE drives under postgresql you will
need to issue this command: 'hdparm -W0 /dev/hdx' where x is the letter of
the drives under the RAID array to turn off write caching. This will slow
them to a crawl on writes.

And there are plenty of uses for RAID 0 in real systems, just not generally
in real 24/7 systems. But for high-speed batches that might take a week to
run on RAID 5 but run in an hour on RAID 0, that would be an acceptable
risk. Think of machines that read in all their data off of a NAS, crunch
it, and dump it back out in flat files when they're done.

For things like that IDE drives and RAID 0 make a nice fit. But don't put
the payroll on them. :-)

Nov 12 '05 #26

P: n/a
On Tue, Oct 21, 2003 at 06:56:52PM +1300, Mark Kirkwood wrote:
> Some sort of ATA RAID is probably worth considering -
>
> e.g. I am experimenting with a system using 2 ATA-66 Seagates + 1
> Promise TX2000
> ...
> But attached to the Promise card using RAID 0 do considerably
> better:
>
> sequential write 52 MB/s
> sequential read 52 MB/s
> ...
> Note that even including the card, this is a very cheap setup.


You may also want to consider the 3Ware IDE RAID cards
(www.3ware.com). Unlike the Promise card, they are full hardware
RAID with onboard CPUs to handle all the RAID work and offload it
from the main CPU in your PC. They are a bit more expensive than
the Promise offerings, but when you consider that the larger cards do
RAID 0/1/5 totally in hardware on the card, the price difference is
not so great after all.

Some of their (3Ware's) larger cards allow you to attach up to 12 IDE
disks to the card as well as giving you hot swap capability.

Nov 12 '05 #27

P: n/a
Hello,

Actually if you were to get off that Promise controller and on to a
3Ware or other "real" hardware RAID... you would probably
see even better performance.

Sincerely,

Joshua Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org


Nov 12 '05 #28

P: n/a
re*****@yahoo.com (Richard Ellis) wrote:
> Some of their (3Ware's) larger cards allow you to attach up to 12 IDE
> disks to the card as well as giving you hot swap capability.


This is all well and good, but may not sufficiently cover the
Vital Problem with IDE drives, namely that they are likely to cache
writes and not tell the 3Ware controller about that.

It would doubtless be a slick thing to have an IDE RAID controller
with cache (that might well overcome some of the traditional problems
with IDE), but that only really helps if you can turn off write
caching on the drives.
--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/advocacy.html
"What this list needs is a good five-dollar plasma weapon."
--paraphrased from `/usr/bin/fortune`
Nov 12 '05 #29

P: n/a
Yes indeed - have come to that conclusion too (see other email)

Joshua D. Drake wrote:
> Hello,
>
> Actually if you were to get off that Promise controller and on to a
> 3Ware or other "real" hardware RAID... you would probably
> see even better performance.
>
> Sincerely,
>
> Joshua Drake


Nov 12 '05 #30

P: n/a
I have found this as well -

I have a nice simple example of a program that loops and occasionally
writes a block to a file.
On a 2 cpu machine, running 2 of these processes in parallel takes twice
as long as running just 1 process!
However if I comment out the IO, then 2 processes takes the same elapsed
time as 1.

My conclusion is there exists some sort of "big" lock on access to the
ATA array.
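
The kind of test described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the original program: the worker script, the function name `run_workers`, the block size, and the loop counts are all invented for illustration. Each worker spins on arithmetic and occasionally writes and fsyncs a block; the driver times one worker against two started together, with and without the I/O.

```python
import os
import subprocess
import sys
import tempfile
import time

# Worker: do some arithmetic and, every `every` iterations, write one
# block to a file and fsync it. every=0 skips the I/O entirely.
WORKER = r"""
import os, sys
path, loops, every = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
block = b"x" * 8192
total = 0
with open(path, "wb") as f:
    for i in range(loops):
        total += i * i  # stand-in for CPU work
        if every and i % every == 0:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())
"""


def run_workers(n, directory, loops=200_000, every=1_000):
    """Start n worker processes together; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER,
             os.path.join(directory, f"worker{i}.dat"), str(loops), str(every)]
        )
        for i in range(n)
    ]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError("worker failed")
    return time.perf_counter() - start


if __name__ == "__main__":
    # Use a directory on the array under test instead of a temp dir.
    with tempfile.TemporaryDirectory() as d:
        for every in (1_000, 0):
            one = run_workers(1, d, every=every)
            two = run_workers(2, d, every=every)
            label = "with I/O" if every else "no I/O"
            print(f"{label}: 1 proc {one:.2f}s, 2 procs {two:.2f}s")
```

On a two-CPU machine, a "2 procs" time close to double the "1 proc" time only in the I/O case would point at serialisation somewhere in the storage path.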

I believe that 3ware have a non-blocking implementation of ATA RAID.
I intend to sell the Promise and obtain a 3ware in the next month or so
and test this out.

regards

Mark

Andrew Sullivan wrote:
> That said, even the 2-way RAID became almost uselessly slow when
> multiple queries were running -- indeed, dramatically slower than a
> plain single IDE drive.


Nov 12 '05 #31

P: n/a

scott.marlowe wrote:
> OK, but here's the real test. As the postgres user, run 'pgbench -i',
> then after that runs, run 'pgbench -c 50 -t 1000000'. While it's running
> and settled (ps aux|grep postgres|wc -l should show a number of ~54 or
> so.) pull the plug. Wait for the hard drives to spin down, then plug it
> back in and power it on. With SCSI you will still have a coherent
> database.

Agreed in principle - pgbench is the most interesting test... for this
mailing list anyway :-).
However s = 1 makes a tiny database that fits into the file buffer cache
on most machines, which is not a very realistic situation.

e.g. the Dell gets tps = 250 for s = 1 c = 5 t = 1000. This number
looks great but it has little to do with I/O.

I am happier about s = 10 - 50 for machines with 512+ MB of RAM.

From memory the Dell gets tps = 36 for s = 10 c = 5 t = 100000. This
result seems more believable!

> If you want a coherent database on IDE drives under postgresql you will
> need to issue this command: 'hdparm -W0 /dev/hdx' where x is the letter of
> the drives under the RAID array to turn off write caching. This will slow
> them to a crawl on writes.

I should have said that I was using FreeBSD 4.8 with write caching off.
The question of whether the disk *actually* turned it off is the
significant issue, so yes, "use with care" should preface any comments
about IDE usage!

best wishes

Mark

Nov 12 '05 #32

P: n/a
> I believe that 3ware have a non blocking implementation of ATA RAID -
> I intend to sell the Promise and obtain a 3ware in the next month of
> so and test this out.

I use 3Ware exclusively for my ATA-RAID solutions. The nice thing about
them is that they are REAL hardware RAID and they use the SCSI layer within
Linux, so you address them as a standard SCSI device.

Also their support is in the kernel... no weird, experimental patching.

On a Dual 2000 Athlon MP I was able to sustain 50 MB/sec over large
copies (4+ gigs). Very, very happy with them.

Sincerely,

Joshua Drake


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org


Nov 12 '05 #33

P: n/a


Mark Kirkwood wrote:
> I should have said that I was using Freebsd 4.8 with write caching off.

Correction: write caching was *on*. I got myself confused about what the
value "1" means.

Nov 12 '05 #34

P: n/a
On Thu, 23 Oct 2003, Mark Kirkwood wrote:
> scott.marlowe wrote:
>> OK, but here's the real test. As the postgres user, run 'pgbench -i',
>> then after that runs, run 'pgbench -c 50 -t 1000000'. While it's running
>> and settled (ps aux|grep postgres|wc -l should show a number of ~54 or
>> so.) pull the plug. Wait for the hard drives to spin down, then plug it
>> back in and power it on. With SCSI you will still have a coherent
>> database.
>
> Agreed in principle - pgbench is the most interesting test... for this
> mailing list anyway :-).
> However s = 1 makes a tiny database that fits into the file buffer cache
> on most machines, which is not a very realistic situation.
>
> e.g. the Dell gets tps = 250 for s = 1 c = 5 t = 1000. This number
> looks great but its not too much to do with IO....
>
> I am happier about s = 10 - 50 for machines with 512+ Mb of RAM.
>
> From memory the Dell gets tps = 36 for s = 10 c = 5 t = 100000. This
> result seems more believable!


You missed my point there. I wasn't CARING what kind of numbers you get
back at all. My point was that if you place the database under fairly
high transactional load, and pull the plug, is the database still coherent
when it comes back up.

I generally test with -s10 through -s50, but for this test it makes no
difference I can see, i.e. if the thing is gonna get scrammed at -s50,
it'll get scrammed at -s1 as well, and take less time to test.

>> If you want a coherent database on IDE drives under postgresql you will
>> need to issue this command: 'hdparm -W0 /dev/hdx' where x is the letter of
>> the drives under the RAID array to turn off write caching. This will slow
>> them to a crawl on writes.
>
> I should have said that I was using Freebsd 4.8 with write caching off.
> The question of whether the disk *actually* turned it off is the
> significant issue, so yes, "use with care" should preface any comments
> about IDE usage!


-- NOTE in a correction Mark stated that caching was on, not off --

Assuming that the caching was on, I'm betting your database won't survive
a power plug pull in the middle of transactions like the test I put up
above.

Nov 12 '05 #35

P: n/a
On Wed, 22 Oct 2003, Joshua D. Drake wrote:
>> I believe that 3ware have a non blocking implementation of ATA RAID -
>> I intend to sell the Promise and obtain a 3ware in the next month of
>> so and test this out.
>
> I use 3Ware exclusively for my ATA-RAID solutions. The nice thing about
> them is that they are REAL hardware RAID and they use the SCSI layer within
> Linux, so you address them as a standard SCSI device.
>
> Also their support is in the kernel... no weird, experimental patching.
>
> On a Dual 2000 Athlon MP I was able to sustain 50 MB/sec over large
> copies (4+ gigs). Very, very happy with them.


Do they survive the power plug pulling test I was talking about elsewhere
in this thread?

Nov 12 '05 #36

P: n/a
It's worth checking, isn't it?

I appreciate that you may have performed such tests previously, but as
hardware and software evolve it's often worth repeating such tests (goes
away to do the suggested one tonight).

Note that I am not trying to argue away the issue about write caching -
it *has* to increase the risk of database corruption following a power
failure. However, if your backups are regular and reliable this may be a
risk worth taking to achieve acceptable performance at a low price.

regards

Mark
scott.marlowe wrote:
> Assuming that the caching was on, I'm betting your database won't survive
> a power plug pull in the middle of transactions like the test I put up
> above.


Nov 12 '05 #37

P: n/a
Mark Kirkwood wrote:
> It's worth checking, isn't it?
>
> I appreciate that you may have performed such tests previously, but as
> hardware and software evolve it's often worth repeating such tests (goes
> away to do the suggested one tonight).
>
> Note that I am not trying to argue away the issue about write caching -
> it *has* to increase the risk of database corruption following a power
> failure. However, if your backups are regular and reliable this may be a
> risk worth taking to achieve acceptable performance at a low price.


Sure, but how many people are taking that risk and not knowing it!

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 12 '05 #38

P: n/a
I suspect almost everyone using IDE drives -

We the "consumers" of this technology need to demand that the vendors:

1. Be honest about these limitations / bugs
2. Work to fix obvious bugs - e.g. drives lying about write cache status
need to have their behaviour changed as soon as possible.

In the meantime I guess all we can do is try to understand the issue and
raise awareness

regards

Mark


Nov 12 '05 #39

P: n/a
On Mon, 20 Oct 2003, Ben-Nes Michael wrote:
----- Original Message -----
From: "Nick Burrett" <ni**@dsvr.net>
To: "Ben-Nes Michael" <mi**@canaan.co.il>
Cc: "postgresql" <pg***********@postgresql.org>
Sent: Monday, October 20, 2003 2:54 PM
Subject: Re: [GENERAL] Recomended FS
>>But still the greatest question is what FS to put on ?
>>
>>I heard ReiserFS can handle small files very quickly.
>
>Switching from ext3 to reiserfs for our name servers reduced the time
>taken to load 110,000 zones from 45 minutes to 5 minutes.
>
>However for a database, I don't think you can really factor this type of
>stuff into the equation. The performance benefits you get from
>different filesystem types are going to be small compared to the
>modifications that you can make to your database structure, queries and
>applications. The actual algorithms used in processing the data will be
>much slower than the time taken to fetch the data off disk.

So you say the FS has no real speed impact on the DB?

In my pg data folder I have 2367 files, some big, some small.

I'm saying: don't expect your DB performance to come on in leaps and bounds
just because you changed to a different filesystem format. If you've
got speed problems then it might help to look elsewhere first.

I don't expect miracles :)
but I still have to choose one, so why shouldn't I choose the one that fits
best?


I agree. I also think that at the top of that logic development tree you
should ask yourself this first question:

"Is it OK that if the machine should suffer sudden catastrophic shutdown
for any reason, I would have a corrupted database and would be
willing to reinitdb/restore from scratch?"

While I agree that in many instances this is acceptable, in
many it is not. If you need that safety, note that SCSI is so much faster
than IDE with the write cache turned off that the IDE machine ends up
roughly 1/2 as fast.

I pitted two systems against each other.

Machine A (clone of our current production box):
Dual PIII-750MHz
1.5 Gig PC133 memory
dual 18 gig 10Krpm USCSI 160 drives

Machine B (new machine intended to replace the production box):
Dual PIV Xeons-2.4GHz
2 Gig 400MHz memory
dual 80 gig 7200 RPM UDMA 133 drives

With two configs (all on a fresh 'initdb --locale=C', with postgresql.conf
set to wal_sync_method = open_sync, buffers = 4000):

Config 1:
/db on one partition (on IDE this always had write cache on)
/pg_xlog on another (on IDE, write cache off or on: W0/W1)

Config 2:
everything on /db/, which is a software RAID-1 (on IDE, tested with write
cache both off and on: W0/W1). I allowed the RAID-1 to finish replicating
on both machines before starting the tests.

With two possible IDE settings:

W0: Write cache off
W1: Write cache on

Note that W1 does not guarantee data integrity if power is lost while a
transaction is in progress (i.e. it's like running with fsync=false all
the time)

I ran pgbench -i -s 5 then pgbench -c 5 -t 1000 several times to
settle the machine, then ran pgbench -c 5 -t 1000 three times and chose
the median result of those three.
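
The "run three times and take the median" procedure can be scripted. The following is a minimal sketch, not the script used for these numbers: the function names are hypothetical, "bench" is a placeholder database name, and the parser simply assumes pgbench prints a line beginning with `tps = <number>`, taking the first such line.

```python
import re
import statistics
import subprocess

# Matches the start of a pgbench summary line such as "tps = 141.25 (...)".
TPS_RE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_tps(output):
    """Return the first 'tps = N' figure found in pgbench's text output."""
    match = TPS_RE.search(output)
    if match is None:
        raise ValueError("no tps line found in pgbench output")
    return float(match.group(1))


def median_tps(runs=3, clients=5, transactions=1000, dbname="bench"):
    """Run pgbench `runs` times and return the median tps.

    Assumes pgbench is on PATH and that `dbname` was already initialised
    with 'pgbench -i -s 5'; "bench" is only a placeholder name.
    """
    results = []
    for _ in range(runs):
        proc = subprocess.run(
            ["pgbench", "-c", str(clients), "-t", str(transactions), dbname],
            capture_output=True, text=True, check=True,
        )
        results.append(parse_tps(proc.stdout))
    return statistics.median(results)
```

Calling `median_tps()` after a few warm-up runs mirrors the procedure described above; the median is used rather than the mean so that one outlier run does not skew the figure.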

MachineA Config1:
141 tps

MachineB Config1 W0:
60 tps

MachineB Config1 W1:
112 tps

MachineA Config2:
101 tps

MachineB Config2 W0:
44 tps

MachineB Config2 W1:
135 tps

Just some numbers someone might find useful. I'll try to test both setups
in the same box later on if I get a chance. But it would seem that RAID
is performing better. I've tested all these configurations with the "pull
the plug" test. The SCSI survives in both configurations, while the IDE
will only survive uncorrupted when Write cache is off (W0).
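
To make the comparison easier to read, here is a small Python sketch that expresses each figure as a fraction of the SCSI result for the same config. The tps values are the ones reported above; the dictionary layout and the function name are just illustrative choices.

```python
# tps figures as reported in this post
RESULTS = {
    "Config1": {"A SCSI": 141, "B IDE W0": 60, "B IDE W1": 112},
    "Config2": {"A SCSI": 101, "B IDE W0": 44, "B IDE W1": 135},
}


def relative_to_scsi(results):
    """Express each tps figure as a fraction of the SCSI figure for that config."""
    return {
        config: {name: tps / runs["A SCSI"] for name, tps in runs.items()}
        for config, runs in results.items()
    }


if __name__ == "__main__":
    for config, ratios in relative_to_scsi(RESULTS).items():
        for name, ratio in ratios.items():
            print(f"{config} {name}: {ratio:.2f}x")
```

With the write cache off, the IDE box comes out at about 0.43x (Config 1) and 0.44x (Config 2) of the SCSI box, which matches the "1/2 as fast" remark; with the cache on it is about 0.79x and 1.34x, but that is the setting that failed the pull-the-plug test.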

Nov 12 '05 #40

P: n/a
Here are some recent benchmarks on different Linux filesystems. As with
any benchmarks, take what you will from the numbers.

Note the Summary section, and then the detailed benchmark numbers (if
you have a stomach for huge tables of pure numbers :)

http://fsbench.netnation.com/


Nov 12 '05 #41

P: n/a
On Fri, 24 Oct 2003, Michael Teter wrote:
> Here are some recent benchmarks on different Linux filesystems. As with
> any benchmarks, take what you will from the numbers.
>
> Note the Summary section, and then the detailed benchmark numbers (if
> you have a stomach for huge tables of pure numbers :)
>
> http://fsbench.netnation.com/


Right, but NONE of the benchmarks I've seen have been with IDE drives with
their cache disabled, which is the only way to make them reliable under
postgresql should something bad happen. But thanks for the benchmarks,
I'll look them over.

Nov 12 '05 #42

P: n/a
On Friday 24 October 2003 16:23, scott.marlowe wrote:
> Right, but NONE of the benchmarks I've seen have been with IDE drives with
> their cache disabled, which is the only way to make them reliable under
> postgresql should something bad happen. but thanks for the benchmarks,
> I'll look them over.


I don't recall seeing anyone explain how to disable caching on a drive in this
thread. Did I miss that? It would be useful. I'm running a 3Ware mirror of 2
IDE drives.

Scott


Nov 12 '05 #43

P: n/a
Got around to doing this today, after a small delay due to the arrival of new
disks.

The system is 2x700 MHz PIII, 512 MB, Promise TX2000, 2x40G ATA-133
Maxtor Diamond+8.
The relevant software is FreeBSD 4.8 and PostgreSQL 7.4 Beta 2.

Two runs of 'pgbench -c 50 -t 1000000 -s 10 bench' with a power cord
removal after about 2 minutes were performed, one with hw.ata.wc = 1
(write cache enabled) and the other with hw.ata.wc = 0 (disabled).

In *both* cases the Pg server survived, i.e. it came up and performed
automatic recovery. Subsequent 'vacuum full' and further runs of pgbench
completed with no issues.

I would conclude that it is not *always* the case that power failure
renders the database unusable.

I have just noticed a similar posting from Scott where he finds the cache
enabled case has a dead database after power failure. It seems that
it's a question of how *likely* it is that the database will survive/not
survive a power failure...

The other interesting possibility is that FreeBSD with soft updates
helped things remain salvageable in the cache enabled case (as some
writes *must* be lost at power off in this case)....

regards

Mark


Nov 12 '05 #44

P: n/a

On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
> I would conclude that it is not *always* the case that power failure
> renders the database unusable.
>
> I have just noticed a similar posting from Scott where he finds the cache
> enabled case has a dead database after power failure.

Other posts have noted that SCSI never fails under this condition. Apparently SCSI
drives sense an impending power loss and flush the cache before power completely
disappears. Speed *and* reliability. Hm.
Of course, anyone serious about a server would have it backed up with a UPS and
appropriate software to shut the system down during an extended power outage. This just
leaves people tripping over the power cords or maliciously pulling the plugs.
--
jimoe at sohnen-moe dot com
pgp/gpg public key: http://www.keyserver.net/en/


Nov 12 '05 #45

P: n/a
On Sat, Oct 25, 2003 at 11:04:00PM -0700, James Moe wrote:
> Other posts have noted that SCSI never fails under this condition. Apparently SCSI
> drives sense an impending power loss and flush the cache before power completely
> disappears. Speed *and* reliability. Hm.

I understood it differently. PostgreSQL has WAL to deal with this situation.
The issue is that it only works as long as the drive doesn't lie about which
blocks have been written and which are merely in cache. Apparently IDE disks
lie and SCSI disks don't. It may be a protocol thing.
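
One rough way to check whether a drive is acknowledging writes from cache is to count how many synchronous rewrites of the same block it completes per second. The Python sketch below is an illustration under assumptions, not a definitive test: the function names are hypothetical, and the reasoning is only a rule of thumb. A 7200 RPM platter turns 120 times per second, so a drive that really waits for the platter should complete roughly that many same-block fsyncs per second, while a figure many times higher suggests the write is being acknowledged from a cache (the drive's own, or a controller's).

```python
import os
import tempfile
import time


def fsyncs_per_second(path, count=200, block=b"x" * 8192):
    """Rewrite the same 8 kB block `count` times, fsyncing after each write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


def rotations_per_second(rpm):
    """A platter spinning at `rpm` passes a given sector rpm/60 times a second."""
    return rpm / 60.0


if __name__ == "__main__":
    # Use a file on the drive under test instead of a temp dir.
    with tempfile.TemporaryDirectory() as d:
        rate = fsyncs_per_second(os.path.join(d, "fsync_test.dat"))
    print(f"{rate:.0f} fsyncs/s; a 7200 RPM platter turns "
          f"{rotations_per_second(7200):.0f} times/s")
```

This does not replace the pull-the-plug test; it only hints at whether fsync is reaching the platter.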

The other alternative is battery backed memory. i.e. keep the blocks in
memory hoping that power will return to the drive before it fails. Some RAID
cards do this.

Another thing is that 3ware RAID controllers stick a SCSI interface in
front of the IDE drives, so perhaps it has more scope to deal with this
issue.

Remember, when power fails the first thing that happens is the system
cancels any DMA transfer in progress, as memory is the part most sensitive to
power fluctuations.

> Of course, anyone serious about a server would have it backed up with a UPS and
> appropriate software to shut the system down during an extended power outage. This just
> leaves people tripping over the power cords or maliciously pulling the plugs.

If you start adding up the points of failure it's quite a lot. But you
should be able to proof the system against even malicious tampering.
--
Martijn van Oosterhout <kl*****@svana.org> http://svana.org/kleptog/
"All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmund Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato



Nov 12 '05 #46

P: n/a
Don't forget that the power supply can fail too, so it's not all about the UPS
and cords.

--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------
----- Original Message -----
From: "James Moe" <ji***@sohnen-moe.com>
To: "Postgresql General Mail List" <pg***********@postgresql.org>
Sent: Sunday, October 26, 2003 8:04 AM
Subject: Re: [GENERAL] Recomended FS

On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
I would conclude that it is not *always* the case that power failure
renders the database unusable.

I have just noticed a similar posting from Scott where he finds the cache
enabled case has a dead database after power failure.
Other posts have noted that SCSI never fails under this condition.

Apparently SCSI drives sense an impending power loss and flush the cache before power completely disappears. Speed *and* reliability. Hm.
Of course, anyone serious about a server would have it backed up with a UPS and appropriate software to shut the system down during an extended power outage. This just leaves people tripping over the power cords or maliciously pulling the plugs.


Nov 12 '05 #47

P: n/a
In a previous message, Scott Chapman wrote:
I don't recall seeing anyone explain how to disable caching on a drive in this
thread. Did I miss that? It would be useful. I'm running a 3Ware mirror of 2
IDE drives.


In FreeBSD, add "hw.ata.wc=0" to /boot/loader.conf.

Regards.


Nov 12 '05 #48

P: n/a
On Sat, 25 Oct 2003, James Moe wrote:
On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
I would conclude that it is not *always* the case that power failure
renders the database unusable.

I have just noticed a similar posting from Scott where he finds the cache
enabled case has a dead database after power failure.
Other posts have noted that SCSI never fails under this condition. Apparently SCSI
drives sense an impending power loss and flush the cache before power completely
disappears. Speed *and* reliability. Hm.


Actually, it would appear that the SCSI drives simply don't lie about
fsync, i.e. when they tell the OS that they wrote the data, they wrote
the data. Some of them may have cache flushing with lying about fsync
built in, but the performance looks more like just good fsyncing to me.
It's all a guess without examining the microcode though... :-)
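
One rough way to look at this from the outside is to time fsync. This is only a sketch in Python: the function names are invented, and the rule of thumb behind it (a single spinning disk with no write cache cannot rewrite the same block much more than once per platter rotation) is an approximation, not a drive specification.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory: str, count: int = 200) -> float:
    """Rewrite the same 512-byte block `count` times, fsyncing after each write."""
    fd, path = tempfile.mkstemp(dir=directory)
    block = b"x" * 512
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)  # always hit the same block
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


def max_rotational_commits(rpm: int) -> float:
    """Rotations per second, a rough ceiling for honest single-disk fsyncs."""
    return rpm / 60.0
```

A 7200 rpm disk turns 120 times per second, so `max_rotational_commits(7200)` gives 120.0. If `fsyncs_per_second` on a directory backed by that one disk reports a figure many times higher, something between the application and the platter is probably acknowledging writes from cache. A RAID controller with battery-backed memory will also report high numbers, legitimately.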
Of course, anyone serious about a server would have it backed up with a UPS and
appropriate software to shut the system down during an extended power outage. This just
leaves people tripping over the power cords or maliciously pulling the plugs.


Or a CPU frying, or a power supply dying, or a motherboard failure, or a
kernel panic, or any number of other possibilities. Admittedly, the first
line of defense is always good backups, but it's nice knowing that if one
of my CPUs fries, I can pull it, put in the terminator / replacement, and my
whole machine will likely come back up.

But anyone serious about a server will also likely be running on SCSI as
well as on a UPS. We use a hosting center with 3 UPSes and a diesel
generator, and we still managed to lose power about a year ago when one
UPS went haywire, browned out the circuits of the other two, and the
diesel generator's switch burnt out. Millions of dollars worth of UPS /
high reliability equipment, and a $50 switch brought it all down.

Nov 12 '05 #49

P: n/a
On Sun, 26 Oct 2003, Mark Kirkwood wrote:
Got around to doing this today, after a small delay due to the arrival of new
disks,

So the system is 2x700MHz PIII, 512 MB, Promise TX2000, 2x40G ATA-133
Maxtor Diamond+8.
The relevant software is FreeBSD 4.8 and PostgreSQL 7.4 Beta 2.

Two runs of 'pgbench -c 50 -t 1000000 -s 10 bench' with a power cord
removal after about 2 minutes were performed, one with hw.ata.wc = 1
(write cache enabled) and the other with hw.ata.wc = 0 (disabled).

In *both* cases the Pg server survived, i.e. it came up and performed
automatic recovery. Subsequent 'vacuum full' and further runs of pgbench
completed with no issues.
Sweet. It may be that the Promise controller is turning off the cache, or that
the new generation of IDE drives is finally reporting fsync correctly. Was
there a performance difference in the set with write cache on or off?
I would conclude that it is not *always* the case that power failure
renders the database unusable.
But it usually is if write cache is enabled.
I have just noticed a similar posting from Scott where he finds the cache
enabled case has a dead database after power failure. It seems that
it's a question of how *likely* it is that the database will survive/not
survive a power failure...

The other interesting possibility is that FreeBSD with soft updates
helped things remain salvageable in the cache enabled case (as some
writes *must* be lost at power off in this case)....


FreeBSD may be the reason here. If its soft updates are ordered in the
right way, it may be that even with write caching on, the drives "do the
right thing" under BSD. Time to get out my 5.0 disks and start playing
with my test server. Thanks for the test!
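
For anyone repeating the plug-pull test, a database that comes back up does not by itself show that every acknowledged write survived. A more direct check, sketched in Python below, is to write numbered records with an fsync after each one, note the last number acknowledged, and after the crash see how far the file really goes. The function names and the record format are invented for illustration. In a real run the acknowledged number has to be recorded on another machine (for example by printing it over a remote login session), since the box under test loses power.

```python
import os
import struct

# One 8-byte little-endian sequence number per record (format is an assumption).
RECORD = struct.Struct("<Q")


def write_acknowledged(path: str, count: int) -> int:
    """Append `count` sequence numbers, fsyncing each; return the last acknowledged."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    last_acked = -1
    try:
        for seq in range(count):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)
            last_acked = seq  # in a real test, report this off-machine here
    finally:
        os.close(fd)
    return last_acked


def highest_intact_sequence(path: str) -> int:
    """Return the highest n such that records 0..n are all present in order, else -1."""
    highest = -1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn final record
            (seq,) = RECORD.unpack(chunk)
            if seq != highest + 1:
                break  # gap or garbage: stop at the last good record
            highest = seq
    return highest
```

If, after power comes back, `highest_intact_sequence` returns less than the last number that was acknowledged before the cut, then something acknowledged a write that never reached the platter, which is exactly the lying-about-fsync case discussed in this thread.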

Nov 12 '05 #50
