Raw Device Wiggle Room?

I'm doing a postmortem on an outage at my workplace that looks too
similar to one we had last fall to be unrelated. Both database
outages had the following characteristics:

1) VERY large, frequently accessed and updated tablespace defined on a
raw device of about 348 GB, with the indexes defined on a separate raw
device, on a Sun machine.
2) Something happened to the device that made it start logging hardware
and I/O errors, which in turn marked the tablespace as bad.
3) There were no errors in the db2diag.log about the tablespace filling
up. The database just crashed one day and came back up with a mangled
tablespace.

While I've got the server ops dude checking out the hardware, I'm
curious about the device. Two hardware failures under seemingly the same
set of conditions... I'm wondering: is it possible to define a
tablespace on a raw device with a size parameter that is acceptable to
DB2 (therefore letting it create the tablespace successfully), but that
somehow causes an issue with the OS later on? Meaning, if I had a
device that was 300 GB, could I define a tablespace on it for 300 GB,
or should I leave a little wiggle room?

Nov 12 '05 #1

When you create a tablespace on a raw device, DB2 will check to see if it
can access the "end" of the device, presumably to protect against this kind
of problem. So as long as DB2 does not complain when creating the
tablespace, you will be fine.
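
For anyone who wants to try that kind of "can I reach the end?" probe by hand, here is a minimal sketch. It is not DB2's actual check, just an illustration of the idea: seek to the last block of the size you intend to declare and see whether a full block can be read. The function name `can_read_last_block` and the 4096-byte block size are assumptions made for this example.

```python
import os


def can_read_last_block(path: str, size_bytes: int, block_size: int = 4096) -> bool:
    """Return True if the final block of a region of size_bytes can be read.

    path would be a raw device node or an ordinary file; size_bytes is the
    size you plan to declare for the container. Hypothetical helper, not a
    DB2 API.
    """
    if size_bytes < block_size:
        return False
    fd = None
    try:
        fd = os.open(path, os.O_RDONLY)
        # Position at the start of the last block inside the declared size.
        os.lseek(fd, size_bytes - block_size, os.SEEK_SET)
        data = os.read(fd, block_size)
        # A short or empty read means the declared size runs past the end.
        return len(data) == block_size
    except OSError:
        return False
    finally:
        if fd is not None:
            os.close(fd)
```

A read-only probe like this only shows that the end of the range is addressable at that moment. It says nothing about the health of the disks underneath.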

Granted, both the OS and DB2 have some overhead, so a 300GB physical device
may only have (300GB - xx KB) in usable space, and once a DB2 container is
created on that device, there will only be (300GB - xx KB - yy KB) available
to use for pages.
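
The arithmetic above can be sketched roughly as follows. The overhead figures are left as parameters because the actual xx and yy values depend on the OS and the DB2 level. The function name `usable_pages`, the 4 KB page size, and the optional safety margin are all assumptions for illustration, not anything DB2 reports.

```python
def usable_pages(device_bytes: int,
                 page_size: int = 4096,
                 os_overhead_bytes: int = 0,
                 db2_overhead_bytes: int = 0,
                 margin_bytes: int = 0) -> int:
    """Whole pages that fit after subtracting overheads and a safety margin.

    Every overhead value is a placeholder to be filled in from your own
    system; nothing here is measured from DB2 or the OS.
    """
    available = device_bytes - os_overhead_bytes - db2_overhead_bytes - margin_bytes
    if available < 0:
        raise ValueError("overheads and margin exceed the device size")
    # Integer division drops any partial page at the end of the device.
    return available // page_size


GIB = 1024 ** 3
MIB = 1024 ** 2

# A 300 GiB device with 4 KiB pages and no overhead subtracted.
print(usable_pages(300 * GIB))                    # 78643200
# The same device with a hypothetical 1 MiB of wiggle room held back.
print(usable_pages(300 * GIB, margin_bytes=MIB))  # 78642944
```

Specifying the container a few pages short of the computed maximum costs almost nothing on a device this size.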

--
Matt Emmerton
Nov 12 '05 #2
That makes sense to me. The nature of a DMS container is to claim all
the space you have defined for it up front and let DB2 manage the
space. Still, it's very strange that two of our servers failed in the
same way on two different physical machines. I guess those poor old
RAID 0 arrays just gave out from all the activity.

Nov 12 '05 #3
RAID 0? That's just "marketing" for ***NO*** RAID. In simple language,
you're asking for disaster, and you apparently got what you asked for.
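
To put a rough number on that: a stripe set with no redundancy is lost as soon as any one member disk fails. Here is a minimal sketch, assuming disks fail independently and using a purely hypothetical 3% annual failure rate per disk (not a figure from this thread or from any vendor).

```python
def stripe_failure_probability(disk_failure_prob: float, disks: int) -> float:
    """Chance that a RAID 0 stripe set loses data in a given period.

    Assumes independent disk failures. The array survives only if every
    disk survives, so the failure chance is 1 - (1 - p) ** n.
    """
    if not 0.0 <= disk_failure_prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if disks < 1:
        raise ValueError("need at least one disk")
    return 1.0 - (1.0 - disk_failure_prob) ** disks


# Hypothetical 3% per-disk failure chance, for illustration only.
for n in (1, 2, 4, 8):
    print(n, round(stripe_failure_probability(0.03, n), 4))
```

Every disk added to the stripe raises the odds of losing the whole tablespace, which is the opposite of what a redundant RAID level gives you.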

Nov 12 '05 #4
Well, unfortunately, I did not have any say in the matter. I'm just
the one they run screaming to when it fails. < shakes head > This
has happened to them twice now over the past year... you'd think they'd
have learned their lesson by now. I can't use words to describe how
FRUSTRATING it is when you try to explain to people that DBMS software
does NOT cause hardware failures and they don't believe you. It's the other
way around!!! ( sorry, had to vent there. )

FORTUNATELY, we were able to recover some summary tables on a different
device. That satisfied management for the time being.

< sigh >

TW

Nov 12 '05 #5
I sure hope your backups and logs are on a different physical device
from the tablespaces. Your management needs to become educated in RAID
terminology and architecture so they can properly assess the business
consequences of using disk configurations optimized for speed instead of
reliability.

Phil Sherman

Nov 12 '05 #6
