I'm doing a postmortem from an outage at my workplace that looks too
similar to an outage we had last fall to not be related. Both database
outages had the following characteristics:
1) VERY large, frequently-accessed & updated tablespace defined on a
raw device of about 348GIG, with the indexes defined on a separate raw
device of a SUN machine.
2) Something happened to the device that made it start logging hardware
and I/O errors, which in turn marked the tablespace as bad.
3) There were no errors in the db2diag.log about the tablespace filling
up. The database just crashed one day and came back up with a mangled
tablespace.
While I've got the server ops dude checking out the hardware, I'm
curious about the device. 2 hardware failures under seemingly the same
set of conditions...I'm wondering............Is it possible to define a
tablespace on a raw device with a size parameter that is acceptable to
DB2 (therefore letting it create the tablespace successfully), but
somehow causes an issue with the OS later on? Meaning, if I had a
device that was 300GIG, could I define a tablespace on it for 300GIG,
or should I leave a little wiggle room?