Bytes | Developer Community

The old raw devices chestnut.

Note the cross-posting - but no flame wars please.

This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>

This made me think about the old arguments, and I wondered about the
current state of thinking. Some of my knowledge will be a bit out of
date.

Oracle: (my main experience)
At various times Oracle have claimed (talking to consultants, not
marketers) that raw devices are 5-20% faster than filesystems. This may
vary with the current state of the Oracle code and/or the filesystem being
compared against. Veritas seem to agree by producing QuickIO for Oracle,
claiming "performance of raw with the management of filesystem".

I have never been sufficiently convinced to implement a major system
with raw.

Sybase: (some experience)
Sybase claim filesystems are faster, because of OS buffering, but unsafe
for the same reason. They only ever suggest filesystem for tempdb. They
don't seem to have heard of fsync()[1].

DB2:
No idea

Informix:
No idea beyond the claim which started this off.

What is the latest thinking, both in terms of vendor claims and
practical experience?

[1] or whatever system call forces write-through caching
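
What footnote [1] refers to can be sketched as follows (Python purely
for illustration - a DBMS would do this in C against its own files, and
`durable_write` is a made-up helper, not any vendor's API):

```python
import os
import tempfile

def durable_write(path, data):
    """Write data and force it through the OS cache to the device.

    os.fsync() blocks until the kernel has flushed the file's dirty
    pages (and metadata) to the device; fdatasync(), where available,
    skips non-essential metadata and can be cheaper.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # without this, a crash could lose the buffered write
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    durable_write(os.path.join(d, "datafile"), b"page image")
```

Opening the file with O_SYNC/O_DSYNC achieves the same write-through
behaviour per write instead of per explicit flush.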
--
Jim Smith
Because of their persistent net abuse, I ignore mail from
these domains (among others) .yahoo.com .hotmail.com .kr .cn .tw
For an explanation see <http://www.jimsmith.demon.co.uk/spam>
Nov 12 '05 #1
42 Replies
"Jim Smith" <ji*@jimsmith.demon.co.uk> wrote in message
news:dS**************@jimsmith.demon.co.uk...
This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>

Let's see first if we can all understand what the heck
is meant by "faster"?

Is it faster I/O requests?
Or faster I/O overall?
Or less CPU used?

Here is my experience:

1- Raw does not make for "faster" I/O. I/O speed is defined by
your hardware (disk + controller) and raw or cooked means nothing
in that context.

2- Raw produces overall faster I/O response, all else being equal.

3- However, this can be MUCH faster, or slightly faster.

Explain:

If db is reading off the file system buffer cache, then it is
eminently STUPID to claim "file system I/O" is faster: there is
no I/O in that case, just in-memory access!

If db is reading from disk to fs cache and then copying from there
to db cache then raw will be faster: it doesn't need to copy into the fs
cache, I/O goes directly to the db cache. In such cases one notices
a marked drop in CPU use(!) rather than I/O use by switching to raw:
the block/page copy doesn't come cheap or free.

If the fs cache allows for referencing via pointers from the db processes,
then the fs I/O will be almost same speed as raw but CPU use will be a
little higher: use of indirect addressing (via pointers) is slightly
heavier on resources than direct addressing (via segment addressing).

However a fs cache will be servicing requests from the ENTIRE system, not just
the database hardware. So the potential for heavy interference with the
db I/O activity is there. As such if the system is not a dedicated db server,
one can see some wild variations in I/O response times with varying system
loads.
It all depends on what is being measured, and when and how.

--
Cheers
Nuno Souto
wi*******@yahoo.com.au.nospam

Nov 12 '05 #2
"Jim Smith" <ji*@jimsmith.demon.co.uk> wrote in message
news:dS**************@jimsmith.demon.co.uk...
Note the cross-posting - but no flame wars please.

This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>


I have 3 thoughts on this.

1. A FS is an extra layer of code between the db and the hardware; to that
extent, if you are comparing a direct read from the actual disk - rather than
a read from a cache somewhere - then RAW will be faster. Ten times seems a
hefty claim though.

2. Who cares - what matters isn't the speed of the disk read, but the
response time for the end users' screens/jobs - if and only if you can show
that this is slow and that the slowness is caused by a file system would you
consider RAW.

3. Show us the numbers and the experiment done to prove the statement.
--
Niall Litchfield
Oracle DBA
Audit Commission UK
http://www.niall.litchfield.dial.pipex.com
*****************************************
Please include version and platform
and SQL where applicable
It makes life easier and increases the
likelihood of a good answer
******************************************
Nov 12 '05 #3
Jim Smith <ji*@jimsmith.demon.co.uk> wrote in
news:dS**************@jimsmith.demon.co.uk:
Sybase: (some experience)
Sybase claim filesystems are faster, because of OS buffering, but
unsafe for the same reason. They only ever suggest filesystem for
tempdb. They don't seem to have heard of fsync()[1]


Your data on Sybase ASE is dated. Sybase allows you to use buffered
devices and you can elect to have the writes posted (DSYNC=on) or
buffered (DSYNC=off). Sybase has always recommended one uses raw
devices.

For all DB's, I always use raw devices based on the benchmark
numbers I've run and intuitively, it makes sense. If you have a
read-mostly table, you may gain from utilizing OS buffering with a
32-bit DB and more than 4G of main memory on the box. (I'm
oversimplifying, however not too much.)

A benchmarking trick I learned from one of the team members was to
use symlinks to my raw devices. If I needed to add more disks to a
particular volume, I could down the DBMS, dd the data out to another
disk, rebuild the volume, dd the data back, start the DBMS. Voila!
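
That indirection trick can be sketched like so (Python for
illustration; the paths are stand-ins for real /dev entries and
`repoint` is a hypothetical helper, not a Sybase utility):

```python
import os
import tempfile

def repoint(link, new_target):
    """Repoint a symlink at a new device path. The DBMS configuration
    only ever names the symlink, so it never has to change."""
    tmp = link + ".tmp"
    os.symlink(new_target, tmp)
    os.replace(tmp, link)  # rename over the old link in one step

# Demo with plain files standing in for raw devices.
with tempfile.TemporaryDirectory() as d:
    old_dev = os.path.join(d, "rdsk_old")  # stand-in for the old raw device
    new_dev = os.path.join(d, "rdsk_new")  # stand-in for the rebuilt volume
    open(old_dev, "w").close()
    open(new_dev, "w").close()

    link = os.path.join(d, "data01")       # the name the DBMS is configured with
    os.symlink(old_dev, link)

    # ... down the DBMS, dd the data from old_dev to new_dev ...
    repoint(link, new_dev)
    assert os.path.realpath(link) == os.path.realpath(new_dev)
```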

Regards,
--
Pablo Sanchez - Blueoak Database Engineering, Inc
http://www.blueoakdb.com
Nov 12 '05 #4
Jim Smith wrote:
Oracle: (my main experience)
At various times Oracle have claimed (talking to consultants, not
marketers) that raw devices are 5-20% faster than filesystems. This may
vary with the current state of the Oracle code and/or the filesystem being
compared against. Veritas seem to agree by producing QuickIO for Oracle,
claiming "performance of raw with the management of filesystem".


Well, I have been using the Veritas VxFS Quick I/O feature on Informix
7.31 and 9.20 for a couple of years now. On my system, running under heavy
OLTP load with a large number of buffers, I get about 5x shorter checkpoint
times on VxFS with QIO, compared to cooked chunks on 'plain' Veritas
VxFS file system. And yes, LRU_MAX_DIRTY/LRU_MIN_DIRTY values are
already at 1/0.

QIO is essentially a normal file system which allows you to access the
space reserved by a file through a character special device. So on
Informix, you can use KAIO even on top of a file system.
--
Toni

Nov 12 '05 #5
Jim Smith wrote:
Note the cross-posting - but no flame wars please.

This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>

This made me think about the old arguments, and I wondered about the
current state of thinking. Some of my knowledge will be a bit out of date.

[...]

I'm still trying to figure out why ext3 filesystem under Red Hat Linux
seems to be *so* much faster than ufs under Solaris... even for simple
OS utilities like "find" and "cp", let alone Oracle imports and bulk
inserts. Is it the journaling? Or is it the five year old hardware? ;-)

Is software RAID slower than hardware RAID, or not? (do a search on
this... you'll see....).

I haven't worked with an Oracle raw device since version 7.3 seven years
ago, and would never go back. The administrative overhead is just too
much of a headache. We have Veritas Quick I/O licensed in our shop, but
we never implemented it in production because it turns out the database
I/O has never even been close to being the bottleneck for our particular
application... and as stated, your application may be different.

--Mark Bole

Nov 12 '05 #6
I've found jfs to be "fast enough", considering we are moving
from SQL-Server - no raw-disks on that one (chuckles and laughs
allowed :-) ). The risks that raw-disks present make me wonder
if they are worth it. Restoring raw-disk databases presents its
own set of problems, whereas regular files can be backed up and
restored more simply and with more flexibility. After all is
said and done, we already see an increase in Linux over Windows,
so the expectations are different in our shop. Maybe raw-disks
are worth it in a situation where you need to squeeze every bit
of speed out of the system, but then you have to live with the
limitations of restoring the database in the event of a crash.
(Especially with Informix, the restore horror stories alone are
enough to scare me away from raw disks.) And really, if you are
on a SAN, aren't you wasting your time with raw disks anyway? What
'raw' disk are you really accessing on a SAN? Having used raw
disks in the past vs. regular files, at least from an
operational viewpoint I'd go with regular files any day of the
year. Perhaps good backups will offset, for now, the fear of
anything corrupting our jfs drives until we hear of something
better. JFS is supposed to be improving too, adding data
journaling along with the metadata in the near future. It is
also 'supposedly' the lowest-overhead file system, but every few
months things change, and of course the high priests of Reiser
will soon come down from heaven and release yet another version
to mere mortals, so it looks like the game changes frequently.
No matter, we'll back up the database, then restore it to
whatever file system is most appropriate, but no raw disks for
now; they're just not portable enough.

"Mark Bole" <ma***@pacbell.net> wrote in message news:3N*******************@newssvr29.news.prodigy. com...
Jim Smith wrote:
Note the cross-posting - but no flame wars please.

This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>

This made me think about the old arguments, and I wondered about the
current state of thinking. Some of my knowledge will be a bit out of date.

[...]

I'm still trying to figure out why ext3 filesystem under Red Hat Linux
seems to be *so* much faster than ufs under Solaris... even for simple
OS utilities like "find" and "cp", let alone Oracle imports and bulk
inserts. Is it the journaling? Or is it the five year old hardware? ;-)

Is software RAID slower than hardware RAID, or not? (do a search on
this... you'll see....).

I haven't worked with an Oracle raw device since version 7.3 seven years
ago, and would never go back. The administrative overhead is just too
much of a headache. We have Veritas Quick I/O licensed in our shop, but
we never implemented it in production because it turns out the database
I/O has never even been close to being the bottleneck for our particular
application... and as stated, your application may be different.

--Mark Bole


Nov 12 '05 #7
Raw disk has the potential for best performance with large tablespace and
index scans in a data warehouse environment. During sequential prefetch, DB2
can make sure that the extents are contiguous on the disk if the space is
DMS raw disk. It makes no difference for OLTP systems. Decision support
systems that do not frequently use large tablespace scans will also show
very little benefit. To the extent that data is already in the buffer pool,
it makes no difference.

Software RAID does not perform well. If striping is desired, superior
performance can usually be achieved by placing DB2 containers on different
physical drives (or different hardware arrays) and letting DB2 stripe the
data across the containers.

If disks are striped by the OS or hardware using some form of RAID, you must
make sure the DB2 parameter for Container Tag Size is set properly. For
version 7 that means that DB2_STRIPED_CONTAINERS must be set ON or YES if
RAID is used. This sets the container tag equal to the extent size (instead
of using a single page). For version 8, make sure that
DB2_USE_PAGE_CONTAINER_TAG is set OFF or NULL for RAID disks. This will make
sure that extents after the container tag will align properly with the RAID
stripes.

Also be sure that the extent size is equal to or a multiple of the stripe
size, and the prefetch size should be a multiple of the extent size. If data
is striped by DB2 (by placing the containers on different physical drives)
then the prefetch size should be n times the extent size, where n is the
number of containers that are placed on different physical disks (or
different physical RAID arrays).
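
The sizing rules above amount to simple arithmetic; a sketch with
illustrative numbers (these are not DB2 defaults):

```python
# Illustrative numbers only -- not DB2 defaults.
stripe_kb = 64                           # RAID stripe size
extent_kb = 4 * stripe_kb                # extent size: a multiple of the stripe size
n_containers = 4                         # containers on separate physical disks/arrays
prefetch_kb = n_containers * extent_kb   # prefetch n extents, one per container

assert extent_kb % stripe_kb == 0        # extents align with RAID stripes
assert prefetch_kb % extent_kb == 0      # prefetch is a multiple of the extent size
print(prefetch_kb)  # 1024
```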

These comments relate to DB2 and may differ for other DBMSs.
Nov 12 '05 #8
Jim Smith wrote:
Note the cross-posting - but no flame wars please.

This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>
That's a pretty steep claim -- I would say it was an exaggeration.
Raw is usually faster, all other things being equal, but I doubt if 10
times could be justified.
This made me think about the old arguments, and I wondered about the
current state of thinking. Some of my knowledge will be a bit out of date.

Oracle: (my main experience)
At various times Oracle have claimed (talking to consultants, not
marketers) that raw devices are 5-20% faster than filesystems. This may
vary with the current state of the Oracle code and/or the filesystem being
compared against. Veritas seem to agree by producing QuickIO for Oracle,
claiming "performance of raw with the management of filesystem".

I have never been sufficiently convinced to implement a major system
with raw.

Sybase: (some experience)
Sybase claim filesystems are faster, because of OS buffering, but unsafe
for the same reason. They only ever suggest filesystem for tempdb. They
don't seem to have heard of fsync()[1]

DB2:
No idea

Informix:
No idea beyond the claim which started this off.
The advantage, and possibly disadvantage, of raw i/o over cooked i/o
is that the data is copied less. With cooked i/o, the data is copied
first from the user process (eg DBMS) to the kernel buffer space, and
then from kernel buffer pool to the disk, or vice versa. With raw
i/o, the transfer can occur direct from disk to user process without
travelling through the kernel buffer pool. That's one less copy
operation plus the overhead of coordinating the access. Against that,
the kernel buffer pool can sometimes provide the same disk block to
multiple processes without rereading the disk. However, since
Informix's process structure is set up so that all the DBMS data pages
are kept in a shared memory segment, all the DBMS processes (distinct
from the user processes which ask the DBMS to do things) can share the
page without troubling the Unix kernel. So, if your system is busy
working on database stuff, the best use of memory is gained by having
the disk drivers copy data directly to/from the shared memory buffer
pool to disk - making the data available to any of the processes
comprising the DBMS without further copying.

YMMV, as they say. It depends on many factors. Generally, I'd quote
a 10-20% performance benefit from raw disk over cooked (not times,
just percent). That's not something I've measured recently, but it is
in the right ballpark for historical systems.

Things like humongous main memories (TB of main memory) combined with
monstrous disks (TB of them, too) and logical volume managers,
SAN/NAS, RAID and the like all make the analysis more complex.

What is the latest thinking, both in terms of vendor claims and
practical experience?

[1] or whatever system call forces write-through caching


O_FSYNC or O_DSYNC flag on open() system call? There are three or
four synchronization options in POSIX 2003, IIRC.
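
A quick way to see which of those synchronized-I/O open flags a given
platform actually exposes (Python used for illustration; availability
varies by OS):

```python
import os

# O_SYNC: synchronized file integrity (data + metadata) on each write;
# O_DSYNC: synchronized data integrity (data only, cheaper);
# O_RSYNC: extends the chosen guarantee to reads (rarely implemented).
sync_flags = {name: getattr(os, name)
              for name in ("O_SYNC", "O_DSYNC", "O_RSYNC")
              if hasattr(os, name)}
print(sorted(sync_flags))
```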
--
Jonathan Leffler #include <disclaimer.h>
Email: jl******@earthlink.net, jl******@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

Nov 12 '05 #9
On 2004-04-13, Mark A scribbled:
Raw disk has the potential for best performance with large tablespace
and index scans in a data warehouse environment. During sequential
prefetch, DB2 can make sure that the extents are contiguous on the
disk if the space is DMS raw disk. It makes no difference for OLTP
systems. Decision support systems that do not frequently use large
tablespace scans, also will show very little benefit. To the extent
that data is already in the buffer pool, it makes no difference.
One other point to note regarding DB2's DMS (Database Managed)
table-spaces is that they don't /have/ to be on raw devices - you can
use file "containers" instead (just a great big file on the file-system
the entire contents of which is managed by DB2). Assuming the file
system has sufficient contiguous space (or you defrag after creating
one) you get the benefit of contiguous data without the complexity of
raw devices.

Though I haven't tried raw devices on DB2, I regularly use DMS
file-based containers and have found them to be noticeably faster than
the default SMS (System Managed) table-spaces (which are mostly
individual files for individual database objects) with queries against
massive tables.

I'd be interested to know if anyone's directly compared DB2's DMS
table-spaces on raw devices and file system containers. I suspect
there'd be a difference - but I doubt it'd be huge.
Software RAID does not perform well. If striping is desired, superior
performance can usually be achieved by placing DB2 containers on
different physical drives (or different hardware arrays) and letting
DB2 stripe the data across the containers.
Well, I do recall an interesting review of hardware and software RAID
devices by Anandtech a while ago (all on Wintel boxes), in which the
hardware RAID devices were outperformed by every single software RAID
device they were trying. Still, benchmarks have never been a great way
of measuring real-world performance, and this was only on Wintel, so
YMMV.
These comments relate to DB2 and are different with other DBMS'.


Same here.

--
Dave
Remove "_nospam" for valid e-mail address

"Never underestimate the bandwidth of a station wagon full of CDs doing
a ton down the highway" -- Anon.
Nov 12 '05 #10
I wonder whether it makes the same difference on RAID.

Andy
Jim Smith <ji*@jimsmith.demon.co.uk> wrote in message news:<dS**************@jimsmith.demon.co.uk>...
....
<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>

Nov 12 '05 #11
[cutting]

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>


Tooooo much; at best I've seen 30%. I'd budget for 10-15%.

[cutting]

--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
Mob: +44 7818 003457 #
www.oninit.com #
Nov 12 '05 #12
In message <40**********************@news.optusnet.com.au>, Noons
<wi*******@yahoo.com.au> writes
"Jim Smith" <ji*@jimsmith.demon.co.uk> wrote in message
news:dS**************@jimsmith.demon.co.uk...
This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>
Let's see first if we can all understand what the heck
is meant by "faster"?

Is it faster I/O requests?
Or faster I/O overall?
Or less CPU used?


None of the above. A faster response and/or throughput from a user point
of view.
Here is my experience:
1- Raw does not make for "faster" I/O. I/O speed is defined by
your hardware (disk + controller) and raw or cooked means nothing
in that context.
2- Raw produces overall faster I/O response, all else being equal.

3- However, this can be MUCH faster, or slightly faster.

Explain:

If db is reading off the file system buffer cache, then it is
eminently STUPID to claim "file system I/O" is faster: there is
no I/O in that case, just in-memory access!


But it will give a faster response therefore "file systems are faster".

But database blocks will often be buffered in the db buffer cache, so
the file system buffer may be irrelevant or even an overhead. I vaguely
remember that on VMS it was recommended you disable disk caching and use
Oracle's buffer cache.
--
Jim Smith
Because of their persistent net abuse, I ignore mail from
these domains (among others) .yahoo.com .hotmail.com .kr .cn .tw
For an explanation see <http://www.jimsmith.demon.co.uk/spam>
Nov 12 '05 #13
In message <40**********************@news-text.dial.pipex.com>, Niall
Litchfield <ni**************@dial.pipex.com> writes
"Jim Smith" <ji*@jimsmith.demon.co.uk> wrote in message
news:dS**************@jimsmith.demon.co.uk...
Note the cross-posting - but no flame wars please.

This question was prompted by a thread on a postgres mailing list
during which someone (Gregory Williamson) claimed

<quote>
raw devices, at least on Solaris, are about 10 times as fast as cooked
file systems for Informix.
</quote>

2. Who cares - what matters isn't the speed of the disk read, but the
response time for the end users' screens/jobs - if and only if you can show
that this is slow and that the slowness is caused by a file system would you
consider RAW.


I didn't really ask about IO performance, I was talking about
application performance.
--
Jim Smith
Because of their persistent net abuse, I ignore mail from
these domains (among others) .yahoo.com .hotmail.com .kr .cn .tw
For an explanation see <http://www.jimsmith.demon.co.uk/spam>
Nov 12 '05 #14
"Dave Hughes" <da*********@waveform.plus.com> wrote in message
news:xn****************@usenet.plus.net...
On 2004-04-13, Mark A scribbled:
Raw disk has the potential for best performance with large tablespace
and index scans in a data warehouse environment. During sequential
prefetch, DB2 can make sure that the extents are contiguous on the
disk if the space is DMS raw disk. It makes no difference for OLTP
systems. Decision support systems that do not frequently use large
tablespace scans, also will show very little benefit. To the extent
that data is already in the buffer pool, it makes no difference.
One other point to note regarding DB2's DMS (Database Managed)
table-spaces is that they don't /have/ to be on raw devices - you can
use file "containers" instead (just a great big file on the file-system
the entire contents of which is managed by DB2). Assuming the file
system has sufficient contiguous space (or you defrag after creating
one) you get the benefit of contiguous data without the complexity of
raw devices.

Though I haven't tried raw devices on DB2, I regularly use DMS
file-based containers and have found them to be noticeably faster than
the default SMS (System Managed) table-spaces (which are mostly
individual files for individual database objects) with queries against
massive tables.

The problem with SMS is that new space is allocated one page at a time. This
can slow down rapidly growing tablespaces. However, one can run db2empfa to
have SMS tablespaces allocate one extent at a time.
I'd be interested to know if anyone's directly compared DB2's DMS
table-spaces on raw devices and file system containers. I suspect
there'd be a difference - but I doubt it'd be huge.

IBM has compared them and there is a slight difference when doing tablespace
scans, but almost no difference otherwise. However, using file systems
enables OS file caching, which can be a benefit or a hindrance, depending
on who you ask, or what exact situation is present.

Nov 12 '05 #15
Mark Bole wrote:
I haven't worked with an Oracle raw device since version 7.3 seven
years ago, and would never go back. The administrative overhead is
just too much of a headache.


[Posting from comp.databases.informix]

I've never understood the "administrative overhead" argument against raw
spaces. Can you elucidate on the administration you think needs to be
applied? In my experience, the only admin overhead is making sure that a
new, naive sysop doesn't try to turn a raw space into a filesystem - I saved
an engine by SECONDS once, by looking over the shoulder of said sysop as I
walked past....

Apart from that, I find that admin is slightly simpler simply because you
don't need to run a mkfs style of command ;-)

I have heard that Solaris (?) is in the habit of re-creating /dev at boot
time, and I know that DG-UX used to do the same thing. Therefore some sort
of tool is needed to make sure the raw devs exist after a boot. Is this what
you mean?
Nov 12 '05 #16
Data Goob wrote:
I've found jfs to be "fast enough", considering we are moving
from SQL-Server, no raw-disks on that one ( chuckles and laughs
allowed :-) . The risks that raw-disks present makes me wonder
if they are worth it. Restoring raw-disk databases presents its
own set of problems, whereas regular files can be backed up and
restored more simply and with more flexibility.


Ummmmm, if you try to archive regular files when the database is active, you
*will* have an inconsistent snapshot of the database. Do you perform these
backups only when the engine is offline? I can't see this being safe on any
brand of engine if the database is "twinkling". The only way to get a
consistent (ie logical instant in time) backup of a live, active database is
to use a tool which maintains the illusion on your behalf.
Nov 12 '05 #17
Andrew Hamm wrote:

Mark Bole wrote:
I haven't worked with an Oracle raw device since version 7.3 seven
years ago, and would never go back. The administrative overhead is
just too much of a headache.


[Posting from comp.databases.informix]

I've never understood the "administrative overhead" argument against raw
spaces. Can you elucidate on the administration you think needs to be
applied? In my experience, the only admin overhead is making sure that a
new, naive sysop doesn't try to turn a raw space into a filesystem - I saved
an engine by SECONDS once, by looking over the shoulder of said sysop as I
walked past....

Apart from that, I find that admin is slightly simpler simply because you
don't need to run a mkfs style of command ;-)

I have heard that Solaris (?) is in the habit of re-creating /dev at boot
time, and I know that DG-UX used to do the same thing. Therefore some sort
of tool is needed to make sure the raw devs exist after a boot. Is this what
you mean?


This should only happen with a boot -r; Sun doesn't (or didn't used to)
guarantee that the mapping of multiple external physical devices (A5200,
A1000 etc.) is consistent when boot -r'ing. The 'boot -r' asks what's out
there and then assigns the devices in the order in which they respond.
Mostly it catches out lazy sysadmins, or sysadmins without larger server
experience. When the disks are used with filesystems, AFAIK Sun can map
the FS back to the disks - I'm guessing they use major/minor numbers.
Solaris can't do this with the raw disks 'cos Solaris has no idea how
they are being used.
--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
Mob: +44 7818 003457 #
www.oninit.com #
Nov 12 '05 #18
Andrew,

Fair enough considerations, and you hit the nail on
the head: the backup and restore.

We have always set up backup of logs and data to a real
backup, but considering all the options I might 'need'
in the event of system failure, or migration, or cloning,
raw devices are an added risk. We use a third-party backup
tool to backup our SQL-Server databases, as well as EMC
Timefinder to flash-copy databases into production environments.
We will apply the same methods with DB2 or MySQL for that
matter. The raw-disk paradigm adds only extra trouble/work.

The real question is, how will you backup, much less restore
raw device databases? Are you prepared to deal with the
inflexibility it presents? Have you ever run Informix or
other vendor database through a complete backup and restore,
testing all the options with raw-device dbspaces? Most DBAs
I've run into have never had to restore a database at any
time in their career much less even test it. Is that amazing
or what!

Consider too, clustering of systems and disks, and the
administrative challenges associated with that. Add raw
devices to multitudes of servers and it becomes more risky
and more to manage. It puts more opportunities for failure
in the administration path, and nobody in their right mind
wants to increase risk. Is the extra 10% in speed worth it?
Maybe that 'speed' can come from somewhere else?

One other thing you might find attractive about plain ole files
instead of raw devices is the ability to clone databases. You
can get quite creative with plain files in ways that you cannot
with raw devices, because of the lock-in of raw-devices. You
create more work for yourself in the long run with raw disks,
unless of course you're not as lazy as me and actually enjoy
the extra work. '-)

"Andrew Hamm" <ah***@mail.com> wrote in message news:c5************@ID-79573.news.uni-berlin.de...
Data Goob wrote:
I've found jfs to be "fast enough", considering we are moving
from SQL-Server, no raw-disks on that one ( chuckles and laughs
allowed :-) . The risks that raw-disks present makes me wonder
if they are worth it. Restoring raw-disk databases presents its
own set of problems, whereas regular files can be backed up and
restored more simply and with more flexibility.


Ummmmm, if you try to archive regular files when the database is active, you
*will* have an inconsistent snapshot of the database. Do you perform these
backups only when the engine is offline? I can't see this being safe on any
brand of engine if the database is "twinkling". The only way to get a
consistent (ie logical instant in time) backup of a live, active database is
to use a tool which maintains the illusion on your behalf.


Nov 12 '05 #19

"Data Goob" <da******@hotmail.com> wrote
One other thing you might find attractive to plain ole files
instead of raw devices is the ability to clone databases. You
can get quite creative with plain files in ways that you cannot
with raw devices because of the lock-in of raw-devices.

Please elaborate.

If you use symbolic links for raw devices, then there is no lock-in
of raw devices, or am I missing something.
Nov 12 '05 #20
rkusenet wrote:
"Data Goob" <da******@hotmail.com> wrote
One other thing you might find attractive to plain ole files
instead of raw devices is the ability to clone databases. You
can get quite creative with plain files in ways that you cannot
with raw devices because of the lock-in of raw-devices.

Please elaborate.

If you use symbolic links for raw devices, then there is no lock-in
of raw devices, or am I missing something.


I don't think so. Even with cooked files I'd be using symlinks; you never
know when you need to put a lump onto another file system. Speaking only
Informixly here, I can clone a live engine with symlinks and a simple
procedure, and only a temporary outage to bounce the parent engine, so once
again I don't see any administrative dramas with raw spaces. I think I just
need to shrug and move on; nobody seems to have concrete facts to back up the
claim. [About to reply to Data Goob's original message on that one - there's
something in there that hints at something.]

Also different engines will have different issues, so we've got to keep a
careful distance from any specific engine on this cross-posted thread.
Further, talking about restores rather than raw devices is straying a bit
too far from the thread.
Nov 12 '05 #21
Data Goob wrote:
Andrew,

Fair enough considerations, and you hit the point on
the head, the backup and restore.

We have always set up backup of logs and data to a real
backup, but considering all the options I might 'need'
in the event of system failure, or migration, or cloning,
raw devices are an added risk. We use a third-party backup
tool to backup our SQL-Server databases, as well as EMC
Timefinder to flash-copy databases into production environments.
We will apply the same methods with DB2 or MySQL for that
matter. The raw-disk paradigm adds only extra trouble/work.
OK - so you are talking about SQL-Server and various issues related to a
backup tool you use? If so, we have no argument; I know nothing of your
tools, and am speaking from an Informix + UNIX point of view. Our beloved
symlinks on UNIX are not available to NT servers, so suggestions on this
cannot help. I'm also of the opinion (only from reading) that unbuffered
NTFS files are equivalent in performance and reliability to raw spaces on
NT:

Even with Informix on NT (of which I have almost no experience) symlinks are
not available, and further, the recommendations from Informix state that
you can use normal files (O/S buffered and capable of going onto FAT), or
normal files on NTFS (which will be used unbuffered) or a raw partition on
NT. Further, the documentation says that on NT, the use of unbuffered NTFS
files is of equal performance to NT raw spaces, therefore it's not worth
using raw spaces on NT. But that's NT, something I don't play with.
The real question is, how will you backup, much less restore
raw device databases? Are you prepared to deal with the
inflexibility it presents? Have you ever run Informix or
other vendor database through a complete backup and restore,
testing all the options with raw-device dbspaces?
Absolutely. Sometimes under great duress. The use of raw spaces has always
made the restores faster too. Jonathan Leffler (c.d.i stalwart and Informix
Insider) has disputed the claim someone made that raw can be 5-10 TIMES
faster, and I have to agree on that. Except for two points [once again,
Informix+UNIX specific]:

1) during initialisation of an instance, if you initialise on cooked files,
the creation of the dbspaces and especially the physical and logical logs
takes an insufferable length of time. With raw spaces, creation of an engine
is quite a few times faster. That means a great deal in the middle of a
day's work.

2) During a restore, raw spaces are a few times faster.

and 3) during a mass load of data, raw spaces are a few times faster. So is
the proper use of PDQ and artificially inflated allocations of shared memory
for a few unstated reasons (informix, once again...)

4) There is no point 4.

All of these points are significant when major sequential writing is taking
place.

BTW I must also qualify that this experience applies to machines without
fancy storage managers. If you are using a machine with SCSI disks directly
connected to the SCSI bus or a straight-forward SCSI raid controller, then
you'll notice performance benefits from using raw with Informix (any other
engines?) If it's got a big fat storage manager, then its implementation
will hide the benefits, in which case, you need to ask "I'm using Engine E
on platform P with storage manager SM, so what's the best storage model to
use?"
Most DBAs
I've run into have never had to restore a database at any
time in their career much less even test it. Is that amazing
or what!
It's more tragic than anything else. By a strange coincidence I'm in a
discussion on this subject in another forum, and we're swapping horror
stories, such as a customer who backed up to a cleaner tape for 2 weeks, or
a customer who needed a restore and discovered that their 3 year old tapes
cannot be read on their 5 year old tape drive which hasn't seen a cleaner
tape ever....

I've had to do restores on customer sites who could not or would not afford
disk mirrors, so we really did rely on the tapes for the redundancy. I've
seen power supplies blow up. All sorts of things can and do happen.
Consider too, clustering of systems and disks, and the
administrative challenges associated with that. Add raw
devices to multitudes of servers and it becomes more risky
and more to manage. It puts more opportunities for failure
in the administration path, and nobody in their right mind
wants to increase risk. Is the extra 10% in speed worth it?
Well, as I said, I don't understand the alleged extra administration.
Clearly it's an NT thing? Or an issue for people who don't use the magic of
symlinks? Pass. I'll stop asking now.
One other thing you might find attractive to plain ole files
instead of raw devices is the ability to clone databases. You
can get quite creative with plain files in ways that you cannot
with raw devices because of the lock-in of raw-devices. You
create more work for yourself in the long run with raw disks,
unless of course you're not as lazy as me and actually enjoy
the extra work. '-)


This is where the YMMV slogan comes into play. I would never setup an
Informix,UNIX,SCSI machine with anything else but symlinks and raw spaces.
If I ever setup a machine with a high performance storage box, I'll look
into the most appropriate mechanism for that. I'd probably setup any brand
of engine on UNIX with symlinks, unless experience or advice shows that it's
pointless.

As for administration of raw spaces, on UNIX it's trivial. Meaningless. Not
a problem. Do what your engine, backup tool and storage manager works best
with. Unless someone else chips in with some detailed advice about other
brands, the original poster will only be lurnin' about Informix engines
today. If the OP wants more advice about Informix, please we should stop
cross-posting and get into more detail only on comp.databases.informix, and
stop boring the other newsgroups.

Goob, I think you hang around c.d.i quite a bit, so if you wish, please
further my education about the pain of raw spaces on c.d.i. Perhaps some
specific stories are needed so I can understand your experience.
Nov 12 '05 #22
"Andrew Hamm" <ah***@mail.com> wrote in message news:c5************@ID-79573.news.uni-berlin.de...
OK - so you are talking about SQL-Server and various issues related to a
backup tool you use?
Yes, standard backup tools.
If so, we have no argument; I know nothing of your
tools, and am speaking from an Informix + UNIX point of view. Our beloved
symlinks on UNIX are not available to NT servers, so suggestions on this
cannot help. I'm also of the opinion (only from reading) that unbuffered
NTFS files are equivalent in performance and reliability to raw spaces on
NT:

SQL-Server is a dish best served, well, cooked. In most SQL-Server
situations, probably 95% or more, SQL-Server is stored in plain ole
files in a directory. You can create FILEGROUPS akin to containers
and dbspaces, but in the real world most SQL-Server people haven't
a clue about how to use FILEGROUPS so they simply stuff everything
in one big default filegroup called PRIMARY. It's the equivalent
in Informix of leaving everything in the rootdbs and never bothering
with it, letting it get larger and larger. SQL-Server databases can
be detached, that is, taken off-line, moved, copied, etc. But most
people don't ever bother with disk layout except in the larger shops.
As you suggest, there are no linked files; this concept is completely
opaque to Windows people, and they just don't connect with it. Incidentally
if you use more than 16 files to build filegroups you cannot use the
GUI to reattach a database--very cool thing to learn in a down-server
situation. You instead have to use a script like the one below with
the syntax FOR ATTACH at the end. Really spiffy.

Example 1. Mydatabase with a lot of FILEGROUPS :

CREATE DATABASE [Mydatabase] ON PRIMARY
( NAME = 'MYDB_PRIMARY_00' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_PRIMARY_00.MDF' , SIZE = 2048 MB , FILEGROWTH = 0% ) ,
FILEGROUP DATA
( NAME = 'MYDB_01' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_01.NDF' , SIZE = 4172 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_02' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_02.NDF' , SIZE = 4483 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_03' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_03.NDF' , SIZE = 3887 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_04' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_04.NDF' , SIZE = 3991 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_05' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_05.NDF' , SIZE = 7964 MB , FILEGROWTH = 20% ) ,
FILEGROUP IDX
( NAME = 'MYDB_IDX_01' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_IDX_01.NDF' , SIZE = 2048 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_IDX_02' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_IDX_02.NDF' , SIZE = 2048 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_IDX_03' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_IDX_03.NDF' , SIZE = 2048 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_IDX_04' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_IDX_04.NDF' , SIZE = 2048 MB , FILEGROWTH = 20% ) ,
( NAME = 'MYDB_IDX_05' , FILENAME = 'R:\DATA\MYDB_Data\MYDB_IDX_05.NDF' , SIZE = 2048 MB , FILEGROWTH = 20% )
LOG ON
( NAME = 'MYDB_LOG_01' , FILENAME = 'R:\DATA\MYDB_Logs\MYDB_LOG_01.LDF' , SIZE = 56 MB , FILEGROWTH = 10% )
GO

Example 2. Mydatabase with no thought or plan, out of the box:

CREATE DATABASE [A] ON PRIMARY
( NAME = 'a_Data' , FILENAME = 'R:\Data\A\DATA\A_Primary.MDF' , SIZE = 4 MB , FILEGROWTH = 10% )
LOG ON
( NAME = 'a_Log' , FILENAME = 'R:\DATA\A\LOG\A_Log.ldf' , SIZE = 14 MB , FILEGROWTH = 10% )
GO

Big bummer on FILEGROUPS, if you set up a clustered index guess where your
table goes? It gets moved into the same FILEGROUP as the index! Is this retarded
or what! All that planning, all that design, down the toilet. Detached indexes
are only valid on non-clustered indexes. ( And people say they will move to SQL
if Informix dies. Hee hee! We haven't even talked about logging... )
Even with Informix on NT (of which I have almost no experience) symlinks are
not available, and further, the recommendations from Informix state that
you can use normal files (O/S buffered and capable of going onto FAT), or
normal files on NTFS (which will be used unbuffered) or a raw partition on
NT. Further, the documentation says that on NT, the use of unbuffered NTFS
files is of equal performance to NT raw spaces, therefore it's not worth
using raw spaces on NT. But that's NT, something I don't play with.

Again with risk. SQL-Server docs specifically point to raw-disks as
an unsupported feature, so this is not really an option in Windows.
Who would be available to support it? Most Windows people wouldn't
even want to attempt this one. What about planning, migrations, etc?
There wouldn't be anyone around to admin a raw-disk SQL-Server. :-)

....

4) There is no point 4.

All of these points are significant when major sequential writing is taking
place.

BTW I must also qualify that this experience applies to machines without
fancy storage managers. If you are using a machine with SCSI disks directly
connected to the SCSI bus or a straight-forward SCSI raid controller, then
you'll notice performance benefits from using raw with Informix (any other
engines?) If it's got a big fat storage manager, then its implementation
will hide the benefits, in which case, you need to ask "I'm using Engine E
on platform P with storage manager SM, so what's the best storage model to
use?"

I defer to K.I.S.S.
One other thing you might find attractive to plain ole files
instead of raw devices is the ability to clone databases. You
can get quite creative with plain files in ways that you cannot
with raw devices because of the lock-in of raw-devices. You
create more work for yourself in the long run with raw disks,
unless of course you're not as lazy as me and actually enjoy
the extra work. '-)


This is where the YMMV slogan comes into play. I would never setup an
Informix,UNIX,SCSI machine with anything else but symlinks and raw spaces.


If you have the time be my guest.
If I ever setup a machine with a high performance storage box, I'll look
into the most appropriate mechanism for that. I'd probably setup any brand
of engine on UNIX with symlinks, unless experience or advice shows that it's
pointless.

Welcome to my world.

As for administration of raw spaces, on UNIX it's trivial. Meaningless. Not
a problem. Do what your engine, backup tool and storage manager works best
with. Unless someone else chips in with some detailed advice about other
brands, the original poster will only be lurnin' about Informix engines
today. If the OP wants more advice about Informix, please we should stop
cross-posting and get into more detail only on comp.databases.informix, and
stop boring the other newsgroups.

Goob, I think you hang around c.d.i quite a bit, so if you wish, please
further my education about the pain of raw spaces on c.d.i. Perhaps some
specific stories are needed so I can understand your experience.


I'm not arguing that maintaining symlinks or raw-disks are difficult. In
an environment where things are somewhat stable and you have the expertise
available I'm sure there are benefits. But in shops where the expertise is
not available raw disks are a luxury. The additional effort in setting
them up and getting people to take care of them adds risk. It's not a
competition either, every shop is different and has different needs. Me,
I'm lazy and concerned that the talent pool won't be able to manage them
properly. As fast as our environment changes we don't have the time for
luxuries. The expectations again are also something you manage. We are
seeing speed on Linux that is easily 10x faster than on Windows, and with
considerable hardware to compare with. This is with cooked files, so again,
we're already faster than windows, so how are raw-disks going to really
make a difference? 10%? Not enough to bother with, and with as many
servers as I manage, definitely more trouble than benefit.


Nov 12 '05 #23
Andrew Hamm wrote:
Mark Bole wrote:
I haven't worked with an Oracle raw device since version 7.3 seven
years ago, and would never go back. The administrative overhead is
just too much of a headache.

[Posting from comp.databases.informix]

I've never understood the "administrative overhead" argument against raw
spaces. Can you elucidate the administration you think needs to be
applied? In my experience, the only admin overhead is making sure that a
new, naive sysop doesn't try to turn a raw space into a filesystem - I saved
an engine by SECONDS once, by looking over the shoulder of said sysop as I
walked past....

Apart from that, I find that admin is slightly simpler simply because you
don't need to run a mkfs style of command ;-)

I have heard that Solaris (?) is in the habit of re-creating /dev at boot
time, and I know that DG-UX used to do the same thing. Therefore some sort
of tool is needed to make sure the raw devs exist after a boot. Is this what
you mean?


Gawd, this cross-posting is a little scary... I've touched DB2,
administered Sybase for four years in the mid-nineties, otherwise Oracle
is it... so take the following for whatever it's worth.

To answer the question above: no, actually my experiences with Oracle
raw devices were under DEC Ultrix, SGI Irix, and HP-UX. I've only
managed "normal" filesystems under Solaris, Linux, and Windows. To this
day the thought of messing with the /dev filesystem stresses me out... --)

Yes, I've had to recover production (Oracle, Sybase) systems in real
life disaster situations: Score: raw filesystems, W-1, L-1. cooked
filesystems, W-2, L-0. So you can see where I'm heading...

I use "administration" in a very tool-oriented sense. With cooked file
systems (which I am defining as filesystems the OS is designed for...
tautology or no...), there are lots of tools: tools for timestamps,
tools for sizes, tools for checksums, tools for copying, tools for
finding, tools for checking open handles, and so on. With raw
filesystems, I have none of these tools. (OK, wrong if you consider
"dd" under Unix to be a tool....)

Your own example serves to make my point: the set of admins, naive or
otherwise, who stand a chance of disaster recovery with raw filesystems
in the 21st century is an order of magnitude less than those who stand a
chance of disaster recovery with "normal" filesystems.

And just to put the skin on the pudding, isn't the stated direction of
MSFT to turn the entire filesystem into a database? Whither raw
partitions then?

--Mark Bole

Nov 12 '05 #24
Mark Bole wrote:

Gawd, this cross-posting is a little scary...
Not wrong :) but we're being very well-behaved so far. I did see one slur
against Sybase's market position, but they've been patient and I thank them
for that. We still love you, folks... I was a big fan of Watcom 15 years
ago, and I believe it's the spiritual ancestor of Sybase and MS? Sybase
split from the Evil Empire, so that's a point in their favour, and in some
simple play with MS SQL, it looks like fun... Has that buttered up enough
people to keep things happy?
I've touched DB2,
administered Sybase for four years in the mid-nineties, otherwise
Oracle is it... so take the following for whatever it's worth.

To answer the question above: no, actually my experiences with Oracle
raw devices were under DEC Ultrix, SGI Irix, and HP-UX. I've only
managed "normal" filesystems under Solaris, Linux, and Windows. To
this day the thought of messing with the /dev filesystem stresses me
out... --)
True. If you MUST move a lump of data, which the engine thinks is called
/dev/rdk0101, then there's no way that can be renamed on many O/S due to
naming conventions. I got stuck with that several years ago. I actually
cheated and broke the O/S naming convention just to get things working. That
was an object lesson. However, common wisdom in the Informix world (probably
discovered 1000 times by different people) is to use symbolic links.
Personally I use something like /DBCHUNKS/enginename/lump01 -> /dev/blahblah
and then the engine can always use the logical name; the physical space will
be wherever it's pointed.
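In concrete terms, the convention above might look like the following (a minimal sketch; the engine name, chunk names, and device paths are all hypothetical, and a temporary directory stands in for /DBCHUNKS so the example runs anywhere):

```shell
# Sketch of the logical-name symlink convention. In production BASE would
# be something like /DBCHUNKS and the targets real raw devices under /dev;
# everything here is a hypothetical illustration.
BASE=$(mktemp -d)                 # stands in for /DBCHUNKS
mkdir -p "$BASE/myengine"

# Point logical chunk names at the physical devices. Symlinks may dangle,
# so this works even before the device nodes exist.
ln -s /dev/rdsk/c0t1d0s4 "$BASE/myengine/lump01"
ln -s /dev/rdsk/c0t2d0s4 "$BASE/myengine/lump02"

# The engine is configured with the logical names only, e.g. (Informix):
#   onspaces -c -d datadbs1 -p /DBCHUNKS/myengine/lump01 -o 0 -s 2048000
# Moving a lump later is just repointing the link; the engine never knows:
ln -sf /dev/rdsk/c1t3d0s4 "$BASE/myengine/lump01"
readlink "$BASE/myengine/lump01"   # -> /dev/rdsk/c1t3d0s4
```

The point of the indirection is that the physical space can live anywhere the link points, while the engine's configuration never changes.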
Yes, I've had to recover production (Oracle, Sybase) systems in real
life disaster situations: Score: raw filesystems, W-1, L-1. cooked
filesystems, W-2, L-0. So you can see where I'm heading...
OK - that's a warning from experience on them engines. OP take note. We have
a bunch of Oracle admins in this company, and I don't believe they use raw
spaces either. I don't know how they do backups either; I really should lurn
from them, but never quite get 'round to it.
I use "administration" in a very tool-oriented sense. With cooked file
systems (which I am defining as filesystems the OS is designed for...
tautology or no...), there are lots of tools: tools for timestamps,
tools for sizes, tools for checksums, tools for copying, tools for
finding, tools for checking open handles, and so on. With raw
filesystems, I have none of these tools. (OK, wrong if you consider
"dd" under Unix to be a tool....)
ok - I've heard that Oracle has a "variety" of backup procedures, some of
them home-cooked, some of them tools you have to pay for. In that case, you
would need suitable tools to manage the objects. With Informix, it's shipped
with one (now two) tools that perform complete, live, point-in-time archives
along with continuous storage of logical logs. Any timestamps etc that need
touching are self-contained within the engine spaces, so it's completely
unnecessary and almost certainly destructive to do anything with informix
spaces apart from via the utilities provided.

By the way, you can still perform something like

cksum < /dev/rawspace

or

dd if=/dev/rawspace bs=64k count=XXXk | cksum

and so on. Raw devices on unix are still just files. They are just not
appropriate to use without some smart access algorithms, such as might be
found in database engines... Sure, dd is a weird utility that doesn't match
typical unix command line syntax, but it can still move "files" around. Not
that I ever need to do that.

Haven't heard any opinions from a DB/2 bod yet.
Your own example serves to make my point: the set of admins, naive or
otherwise, who stand a chance of disaster recovery with raw
filesystems in the 21st century is an order of magnitude less than
those who stand a chance of disaster recovery with "normal"
filesystems.
By the same token, a dumb-**** naive sysadmin could easily destroy cooked
files in their desperate and thoughtless search for more space. An
inexperienced but over-confident admin is a dangerous creature no matter
what product they are near. I don't think anyone could do a recovery using
the cleaning tapes I mentioned in another reply. Murphy's law at work there.
And just to put the skin on the pudding, isn't the stated direction of
MSFT to turn the entire filesystem into a database? Whither raw
partitions then?


MSFT? ?? Micro Soft Something Something? Not interested !:) UNIX and Linux
suits me, our application and our customers well. Good luck to Microsoft
with their FT, whatever that might be.

I'm waiting for any Informix users to contradict me about always using raw
with Informix. When the subject comes up over in c.d.i, we get 3
counter-responses. One is the "complicated" argument. One is about certain
platforms which contain system calls and filesystems which are specifically
tailored for write-through and no buffering. I've seen that Solaris has a
filesystem option to apply that to the entire file system. On such a
machine, you'd only have to ensure contiguity, and to keep other peoples
dirty hands out of your spaces. The third counter argument to raw is large
high performance storage boxes - good luck to you if your customers or site
can afford them. I think a few of our customers should be using them, but
aren't.

I think a history and principles of raw space on UNIX is worth posting, so
the OP can make his/her own decisions... [looks - ok, him, Jim] That will
have to wait a few hours, and be yet another oversized posting :-) There are
many interesting points and developments along the way, not to mention big
fast storage boxes.
Nov 12 '05 #25
Andrew Hamm wrote:

[snip]
ok - I've heard that Oracle has a "variety" of backup procedures, some of
them home-cooked, some of them tools you have to pay for. In that case, you
would need suitable tools to manage the objects. With Informix, it's shipped
with one (now two) tools that perform complete, live, point-in-time archives
along with continuous storage of logical logs. Any timestamps etc that need
touching are self-contained within the engine spaces, so it's completely
unnecessary and almost certainly destructive to do anything with informix
spaces apart from via the utilities provided.


[snip]

For many aeons, it has been this on Oracle. Shell-scripted backups,
with commands provided by the database to make the O/S-produced backup
recoverable and usable. But since version 8.0 (and we've had 8.1.5,
8.1.6, 8.1.7, 9.0.1, 9.2, and 10g since then), Oracle has provided RMAN
-for free- which does cleverer and safer backups than any shell script
could manage. It's a command-line tool for the scripting addicts, but
has a GUI front-end for those so inclined. Works very nicely.

The emphasis is very much on using RMAN these days (when first released
it was a bit rough round the edges!), and O/S-based backups are
gradually becoming a thing of the past, or at least not too well looked
upon, generally (though it's nice to have the choice).

Point is, given the context of this discussion, RMAN works just as well
with raw devices as it does with file systems, and will happily backup a
raw-based database onto a file system, or vice versa.

Raw isn't the utterly inflexible nightmare in Oracle it's sometimes made
out to be.
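For illustration, a minimal RMAN session might look like the following (a sketch only; connection details, catalog use, and channel configuration vary by site and Oracle version):

```
$ rman target /
RMAN> BACKUP DATABASE;          # hot backup, raw or cooked datafiles alike
RMAN> BACKUP ARCHIVELOG ALL;    # plus the archived redo
RMAN> RESTORE DATABASE;         # restores to whatever the controlfile names,
RMAN> RECOVER DATABASE;         # raw device or file system
```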

Regards
HJR
Nov 12 '05 #26
"Jim Smith" <ji*@jimsmith.demon.co.uk> wrote in message
news:hq**************@jimsmith.demon.co.uk...

Is it faster I/O requests?
Or faster I/O overall?
Or less CPU used?

None of the above. A faster response and or throughput from a user point
of view.


Nope. That has nothing to do with I/O speed.
eminently STUPID to claim "file system I/O" is faster: there is
no I/O in that case, just in-memory access!


But it will give a faster response therefore "file systems are faster".


Absolutely not. Cache access is faster. And it has nothing to do
with fs or raw I/O. You get EXACTLY the same speed regardless
of where you got the data from.

But, database blocks will often be buffered in the db buffer cache so
the file system buffer may be irrelevant or even an overhead. I vaguely
remember on VMS it was recommended you disable disk caching and used
oracle's buffer cache.


Depends on what the OS can do. Some don't work all that well at
doing direct I/O to/from buffer cache. And if they copy from another
buffer, you end up with CPU use instead of I/O use.

--
Cheers
Nuno Souto
wi*******@yahoo.com.au.nospam

Nov 12 '05 #27
Sorry, I meant "to/from db buffer cache".

--
Cheers
Nuno Souto
wi*******@yahoo.com.au.nospam
"Noons" <wi*******@yahoo.com.au> wrote in message
news:40***********************@news.optusnet.com.au...
Depends on what the OS can do. Some don't work all that well at
doing direct I/O to/from buffer cache.


Nov 12 '05 #28
[cutting]
eminently STUPID to claim "file system I/O" is faster: there is
no I/O in that case, just in-memory access!


But it will give a faster response therefore "file systems are faster".


Absolutely not. Cache access is faster. And it has nothing to do
with fs or raw I/O. You get EXACTLY the same speed regardless
of where you got the data from.

[cutting]

But if the cache is too small or being turned over very quickly the
cache will be slower 'cos you have to copy from disk to cache and then
copy from cache to the app.

Interestingly on the bigger Sun servers the absolute bandwidth to
disk is significantly larger than the bandwidth to memory so, in
theory, you can get the data from disk faster than from memory - I remain
unconvinced:-)
--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
Mob: +44 7818 003457 #
www.oninit.com #
Nov 12 '05 #29
From what I remember of Oracle's "basic" backup tool (and I haven't
had much to do with it since 8.1.6) you have to put tablespaces
(equivalent to our dbspaces) into "backup mode" before you can back
them up which means they can't be written to. It's also typical to do
all this one tablespace at a time. If my memory is serving me
correctly (and I'm quite open to being told it isn't) this is a vastly
more primitive mechanism than either ontape or onbar.

Has the basic backup tool moved on since then, and does RMAN avoid any
such limitations?

(Sorry to drift off-topic but it's always useful to know what else is
out there. Hell, I might need a job with that stuff some day.)

Andy
"Howard J. Rogers" <hj*@dizwell.com> wrote in message news:<40***********************@news.optusnet.com.au>...
Andrew Hamm wrote:

[snip]
ok - I've heard that Oracle has a "variety" of backup procedures, some of
them home-cooked, some of them tools you have to pay for. In that case, you
would need suitable tools to manage the objects. With Informix, it's shipped
with one (now two) tools that perform complete, live, point-in-time archives
along with continuous storage of logical logs. Any timestamps etc that need
touching are self-contained within the engine spaces, so it's completely
unnecessary and almost certainly destructive to do anything with informix
spaces apart from via the utilities provided.


[snip]

For many aeons, it has been this on Oracle. Shell-scripted backups,
with commands provided by the database to make the O/S-produced backup
recoverable and usable. But since version 8.0 (and we've had 8.1.5,
8.1.6, 8.1.7, 9.0.1, 9.2, and 10g since then), Oracle has provided RMAN
-for free- which does cleverer and safer backups than any shell script
could manage. It's a command-line tool for the scripting addicts, but
has a GUI front-end for those so inclined. Works very nicely.

The emphasis is very much on using RMAN these days (when first released
it was a bit rough round the edges!), and O/S-based backups are
gradually becoming a thing of the past, or at least not too well looked
upon, generally (though it's nice to have the choice).

Point is, given the context of this discussion, RMAN works just as well
with raw devices as it does with file systems, and will happily backup a
raw-based database onto a file system, or vice versa.

Raw isn't the utterly inflexible nightmare in Oracle it's sometimes made
out to be.

Regards
HJR

Nov 12 '05 #30
> From what I remember of Oracle's "basic" backup tool (and I haven't
had much to do with it since 8.1.6) you have to put tablespaces
(equivalent to our dbspaces) into "backup mode" before you can back
them up which means they can't be written to. It's also typical to do
all this one tablespace at a time. If my memory is serving me
correctly (and I'm quite open to being told it isn't) this is a vastly
more primitive mechanism than either ontape or onbar.

Has the basic backup tool moved on since then, and does RMAN avoid any
such limitations?


RMAN does not have this limitation. You do not have to put any
tablespaces in backup mode for RMAN to perform a hot backup.
HTH,
Brian

--
================================================== =================

Brian Peasland
dba@remove_spam.peasland.com

Remove the "remove_spam." from the email address to email me.
"I can give it to you cheap, quick, and good. Now pick two out of
the three"
Nov 12 '05 #31
Brian Peasland wrote:
From what I remember of Oracle's "basic" backup tool (and I haven't
had much to do with it since 8.1.6) you have to put tablespaces
(equivalent to our dbspaces) into "backup mode" before you can back
them up which means they can't be written to. It's also typical to do
all this one tablespace at a time. If my memory is serving me
correctly (and I'm quite open to being told it isn't) this is a vastly
more primitive mechanism than either ontape or onbar.

Has the basic backup tool moved on since then, and does RMAN avoid any
such limitations?

RMAN does not have this limitation. You do not have to put any
tablespaces in backup mode for RMAN to perform a hot backup.
HTH,
Brian


And in fact, pretty much nothing in Andy's memory of Oracle 8.1.6 is
actually correct. RMAN was available with 8.1.6, tablespaces did not
need to be placed in backup mode for a hot backup, and even if you did
do it this was, they could still be written to. And I don't think that
RMAN could ever be described as more primitive than ontape, and
definately not onbar, especially if you do a comparison of capabilites
over the timeline.

And, just to be clear, the comment from Andrew Hamm is also incorrect
ok - I've heard that Oracle has a "variety" of backup procedures, some of
them home-cooked, some of them tools you have to pay for


RMAN is not a chargeable option with Oracle. It provided as part of the
base product, just as onbar is.

Nov 12 '05 #32
"Andy Kent" <an******************@virgin.net> wrote in message
news:ac*************************@posting.google.com...
I wonder whether it makes the same difference on RAID.


Shouldn't do, because with RAID, and caching, the OS has no idea physically
where, or even actually whether, a write has been effected.
Nov 12 '05 #33

"Data Goob" <da******@hotmail.com> wrote in message
news:tk*******************@fe11.usenetserver.com...
Andrew, The real question is, how will you backup, much less restore
raw device databases? Are you prepared to deal with the
inflexibility it presents? Have you ever run Informix or
other vendor database through a complete backup and restore,
testing all the options with raw-device dbspaces? Most DBAs
I've run into have never had to restore a database at any
time in their career much less even test it. Is that amazing
or what!
Any of Oracle, Informix or DB2 will cope with this equally well as if on
"cooked" disk. Informix will even let a dbspace held on cooked
chunks be restored to a raw disk (or vice versa). The others may do too,
although I think there is more fannying around needed.

Finally you describe EMC Timefinder: provided you block the database
(onmode -b) for the few seconds the Timefinder split takes, you can then
spool the BCV copy out to tape and use this as the basis for an external
restore.
Consider too, clustering of systems and disks, and the
administrative challenges associated with that. Add raw
devices to multitudes of servers and it becomes more risky ...


If you don't know what you're doing .... :-)
Nov 12 '05 #34
Andy Kent wrote:
From what I remember of Oracle's "basic" backup tool (and I haven't
had much to do with it since 8.1.6) you have to put tablespaces
(equivalent to our dbspaces) into "backup mode" before you can back
them up which means they can't be written to.

That was never true, actually. Yes, if you were performing an O/S-based
backup, you had to say 'alter tablespace X begin backup', and '...end
backup' when you'd done. But in between those two statements, the
tablespace remains completely and utterly open for business, and is read
from and written to perfectly normally. It would be a sad backup
mechanism indeed if it suddenly rendered half your database read only!
It's also typical to do
all this one tablespace at a time. If my memory is serving me
correctly (and I'm quite open to being told it isn't) this is a vastly
more primitive mechanism than either ontape or onbar.
It is *sensible* to do one tablespace at a time, because by saying
'begin backup', you cause the first modification that takes place to a
block of data to write the entire block into what we would call the redo
logs (I think informix would call them the physical logs). Normally, the
same piece of DML would only write a few bytes of redo. Suddenly, in
backup mode, it is writing thousands of bytes. So to avoid your entire
database suddenly swamping the redo logs, yes... the sensible approach
is to do it one tablespace at a time.
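For illustration, the per-tablespace loop described above looks something like this (the tablespace and file names here are invented, and the O/S copy step will vary by platform):

```sql
-- Hypothetical names: tablespace USERS, datafile users01.dbf
ALTER TABLESPACE users BEGIN BACKUP;
-- ...copy the tablespace's datafiles at the O/S level, e.g.
--   $ cp /u01/oradata/users01.dbf /backup/users01.dbf
ALTER TABLESPACE users END BACKUP;
-- Then repeat for the next tablespace, one at a time, to limit
-- the extra whole-block redo generated while in backup mode.
```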
Has the basic backup tool moved on since then,
The basic 'begin backup... copy... end backup' mechanism hasn't changed
*at all*.
and does RMAN avoid any
such limitations?
One of RMAN's *great* claims to fame is that, because it is an Oracle
tool that understands the concept of an Oracle block (whereas the O/S is
dumb, and only understands individual O/S blocks), it "knows" when an
Oracle block is in flux. Instead of therefore writing the entire thing
to redo to make sure that there's a consistent image of the block, it
simply bides its time, and waits until the Oracle block is *not* in flux
before copying it. Therefore, there's no 'begin backup' or 'end backup'
command, no massive amounts of redo generated, and no need to do one
tablespace at a time.

And the command to make it happen used to be

run {
allocate channel d1 type disk;
backup database;
}

....but is now....

backup database;

Regards
HJR


(Sorry to drift off-topic but it's always useful to know what else is
out there. Hell, I might need a job with that stuff some day.)

Andy
"Howard J. Rogers" <hj*@dizwell.com> wrote in message news:<40***********************@news.optusnet.com.au>...
Andrew Hamm wrote:

[snip]

ok - I've heard that Oracle has a "variety" of backup procedures, some of
them home-cooked, some of them tools you have to pay for. In that case, you
would need suitable tools to manage the objects. With Informix, it's shipped
with one (now two) tools that perform complete, live, point-in-time archives
along with continuous storage of logical logs. Any timestamps etc that need
touching are self-contained within the engine spaces, so it's completely
unnecessary and almost certainly destructive to do anything with informix
spaces apart from via the utilities provided.


[snip]

For many aeons, it has been this on Oracle. Shell-scripted backups,
with commands provided by the database to make the O/S-produced backup
recoverable and usable. But since version 8.0 (and we've had 8.1.5,
8.1.6, 8.1.7, 9.0.1, 9.2, and 10g since then), Oracle has provided RMAN
-for free- which does cleverer and safer backups than any shell script
could manage. It's a command-line tool for the scripting addicts, but
has a GUI front-end for those so inclined. Works very nicely.

The emphasis is very much on using RMAN these days (when first released
it was a bit rough round the edges!), and O/S-based backups are
gradually becoming a thing of the past, or at least not too well looked
upon, generally (though it's nice to have the choice).

Point is, given the context of this discussion, RMAN works just as well
with raw devices as it does with file systems, and will happily backup a
raw-based database onto a file system, or vice versa.

Raw isn't the utterly inflexible nightmare in Oracle it's sometimes made
out to be.

Regards
HJR

Nov 12 '05 #35
"Andrew Hamm" <ah***@mail.com> wrote in message news:<c5************@ID-79573.news.uni-berlin.de>...
Mark Bole wrote:
I haven't worked with an Oracle raw device since version 7.3 seven
years ago, and would never go back. The administrative overhead is
just too much of a headache.
[Posting from comp.databases.informix]

I've never understood the "administrative overhead" argument against raw
spaces. Can you elucidate on the administration you think needs to be
applied? In my experience, the only admin overhead is making sure that a
new, naive sysop doesn't try to turn a raw space into a filesystem - I saved
an engine by SECONDS once, by looking over the shoulder of said sysop as I
walked past....


Funnily enough, that very reason is the one I've considered most
important for years:
http://groups.google.com/groups?selm...&output=gplain
( the "other hand" line has been superseded in more recent versions ).

I'd add that if you are that close to saturation where a small
percentage will make such a difference, you probably need some good
tuning and/or more hardware.

Apart from that, I find that admin is slightly simpler simply because you
don't need to run a mkfs style of command ;-)

I have heard that Solaris (?) is in the habit of re-creating /dev at boot
time, and I know that DG-UX used to do the same thing. Therefore some sort
of tool is needed to make sure the raw devs exist after a boot. Is this what
you mean?


jg
--
@home.com is bogus.
Please ignore this link. http://www.securityfocus.com/columnists/224
Nov 12 '05 #36
Mark Townsend wrote:

And, just to be clear, the comment from Andrew Hamm is also incorrect
ok - I've heard that Oracle has a "variety" of backup procedures,
some of them home-cooked, some of them tools you have to pay for


RMAN is not a chargeable option with Oracle. It provided as part of
the base product, just as onbar is.


yup - good info from Howard on this one, and a bit of history where this
opinion was based.
Nov 12 '05 #37
Joel Garry wrote:

Funnily enough, that very reason is the one I've considered most
important for years:
http://groups.google.com/groups?selm...&output=gplain ( the "other hand" line has been superseded in more recent versions ).
He-heh - 1996? that would be around the time my engine was under threat.
However, dangerous people can do damage to nearly everything. I've also seen
LOST engines from people either deleting or moving cooked files. You can
run, but you cannot hide.
I'd add that if you are that close to saturation where a small
percentage will make such a difference, you probably need some good
tuning and/or more hardware.


Wellllllll, no. NIMM (not in my mileage)

a 20% improvement is good - why not? It also improves throughput - 20% would
be Just About noticeable too. But more importantly, it's during periods of
heavy sequentials that it kicks in. An overall benefit of 20% will probably
give you bursts of much bigger improvements. Like the (Informix) situations
I've mentioned such as creating spaces, but also things like light scans,
checkpoints (a major bugbear for some people) and of course big sequential
reads and builds.

Further (with Informix, once again) the engine can use KAIO, and with the
architecture of Informix, this can lead to further significant improvements.
It all adds up. Why do you think F1 now make their pedals out of carbon
fibre? And they *still* drill 'em out for extra lightness. An Informix
engine using raw with KAIO and a decent layout of spaces on the disk can
feel very spiffy indeed even compared to one that merely drops KAIO and raw.

Some UNIX platforms with Big Hairy storage boxes do provide device drivers
etc that support KAIO by either faking the raw device or just providing the
feature. What you do with that hardware depends on the machinery of course.
Nov 12 '05 #38
"Andrew Hamm" <ah***@mail.com> wrote in message
news:c5************@ID-79573.news.uni-berlin.de...
Further (with Informix, once again) the engine can use KAIO, and with the
architecture of Informix, this can lead to further significant improvements. It all adds up. Why do you think F1 now make their pedals out of carbon
fibre? And they *still* drill 'em out for extra lightness. An Informix
engine using raw with KAIO and a decent layout of spaces on the disk can
feel very spiffy indeed even compared to one that merely drops KAIO and

raw.

A dangerous analogy, the formula 1 one. An F1 Car is made to last a
weekend - used to be a race - and to be rebuilt after that. This is not a
desirable* thing in a database. Robustness and reliability are important as
well as performance - probably more so.
--
Niall Litchfield
Oracle DBA
Audit Commission UK
http://www.niall.litchfield.dial.pipex.com
*****************************************
Please include version and platform
and SQL where applicable
It makes life easier and increases the
likelihood of a good answer
******************************************

*actually it is desirable in one situation - when running a TPC or similar
benchmark. Again you engineer the product to last pretty much for the life
of the performance test and sacrifice everything else for speed.
Nov 12 '05 #39
> From what I remember of Oracle's "basic" backup tool (and I haven't
had much to do with it since 8.1.6) you have to put tablespaces
(equivalent to our dbspaces) into "backup mode" before you can back
them up which means they can't be written to. It's also typical to do


This is a myth.

Backup mode doesn't stall any writes to the tablespaces - you can perfectly
well do all of the read/write operations on a tablespace in backup mode just
as on a tablespace not in backup mode. Oracle just may do some additional redo
logging if necessary to guarantee the consistency of the database.

This has worked such way since version 7, maybe even before...

Tanel.
Nov 12 '05 #40
Niall Litchfield wrote:
"Andrew Hamm" <ah***@mail.com> wrote in message
news:c5************@ID-79573.news.uni-berlin.de...
Further (with Informix, once again) the engine can use KAIO, and with the
architecture of Informix, this can lead to further significant

improvements.
It all adds up. Why do you think F1 now make their pedals out of carbon
fibre? And they *still* drill 'em out for extra lightness. An Informix
engine using raw with KAIO and a decent layout of spaces on the disk can
feel very spiffy indeed even compared to one that merely drops KAIO and

raw.

A dangerous analogy, the formula 1 one. An F1 Car is made to last a
weekend - used to be a race - and to be rebuilt after that. This is not a
desirable* thing in a database. Robustness and reliability are important
as well as performance - probably more so.


I am thinking extra performance for no loss of reliability is also good, no?
KAIO not make database less reliable.

--
Enor
Nov 12 '05 #41
"Paul Watson" <pa**@oninit.com> wrote in message
news:40***************@oninit.com...
Absolutely not. Cache access is faster. And it has nothing to do
with fs or raw I/O. You get EXACTLY the same speed regardless
of where you got the data from. [cutting]

But if the cache is too small or being turned over very quickly the
cache will be slower 'cos you to copy from disk to cache and then
copy from cache to the app.


Disagree. The cache access from a given db process will still be at the same
speed: it's a memory-to-memory copy, the cache size means nothing in that
context.

Of course, the db processes MAY have to wait for real I/O to fill up the cache.
But that doesn't mean "the cache is slower".
Interestingly on the bigger Sun servers the absolute bandwidth to
disk is significantly larger than the bandwidth to memory so, in
theory, you can get the data from disk faster than from memory - I remain
unconvinced:-)


Yup! :)
I'd guess what they mean is the overall *aggregate* I/O bandwidth is
faster than memory access speed. This because in some of the 64 CPU boxes,
it may actually be quite a bit slower for a given CPU to access memory belonging
to another CPU quad card. While the I/O speed stays the same as it goes
directly to each quad CPU/memory card as requested.

Still, a strange claim by any standard. I've worked with a 64CPU ES10K Sun
and its I/O speed for single disk access was nothing to write home about...

--
Cheers
Nuno Souto
wi*******@yahoo.com.au.nospam

Nov 12 '05 #42
"Niall Litchfield" <ni**************@dial.pipex.com> wrote in message news:<40***********************@news-text.dial.pipex.com>...
"Andrew Hamm" <ah***@mail.com> wrote in message
news:c5************@ID-79573.news.uni-berlin.de...
Further (with Informix, once again) the engine can use KAIO, and with the
architecture of Informix, this can lead to further significant improvements.
It all adds up. Why do you think F1 now make their pedals out of carbon
fibre? And they *still* drill 'em out for extra lightness. An Informix
engine using raw with KAIO and a decent layout of spaces on the disk can
feel very spiffy indeed even compared to one that merely drops KAIO and

raw.

A dangerous analogy, the formula 1 one. An F1 Car is made to last a
weekend - used to be a race - and to be rebuilt after that. This is not a
desirable* thing in a database. Robustness and reliability are important as
well as performance - probably more so.


Good point, Niall (especially the *). I think the better analogy
might be fuel injection v. carburetors. Better performance AND gas
mileage AND reliability eventually takes over the consumer market and
the racing market (with some notable exceptions). So if raw is a
nearly-free 20% performance gain, especially during month-end serial
processing when people with purse strings are tapping their fingers on
their desks, then it's likely to take over. And in a way it is, with
the newfangled filesystems. But managing raw filesystems manually is
like multiple carburetors - great performance if you have the right
tools, risky if you don't. Ever hear a six-carb V12? Hooo-mama.


--
Niall Litchfield
Oracle DBA
Audit Commission UK
http://www.niall.litchfield.dial.pipex.com
*****************************************
Please include version and platform
and SQL where applicable
It makes life easier and increases the
likelihood of a good answer
******************************************

*actually it is desirable in one situation - when running a TPC or similar
benchmark. Again you engineer the product to last pretty much for the life
of the performance test and sacrifice everything else for speed.


jg
--
@home.com is bogus.
http://www.forbes.com/2002/08/13/0813vow.html
Nov 12 '05 #43