Hey all -
We are running SQL Server 2000 with all available service packs
applied. We just built a brand new database server with dual
2 GHz Xeons, 2 GB of memory, and the following disk configuration:
  RAID 1 array (2 disks):  operating system (Windows Server 2003)
  RAID 1 array (2 disks):  database logs
  RAID 10 array (4 disks): database data
Disks are SATA, with a 3Ware hardware RAID controller. The machine
SCREAMS.
We run 5 databases on this machine. 2 of these are fairly large (by
our standards, anyway). The second largest database (which is also the
busiest and most important) is regularly generating consistency errors
that cause many important queries to fail. These are almost ALWAYS in
the form of index corruption on one single table. The corruption does
not normally occur on other tables (although it DOES happen once in a
while, rarely, on one of the other tables), nor does it EVER occur
on any other database on the server.
The corruption seems to happen right in the neighborhood of midnight
ALMOST every day, give or take a few minutes, but does not seem
directly associated with any of our MANY scheduled database cleanup
tasks (believe me, we've tried desperately to find an association
using SQL profiler). At midnight, our database traffic is fairly low,
so it does not seem associated with a high traffic level.
We are using the FULL recovery model, with log backups every 15
minutes, and full backups daily at 12:15am. However, the corruption
happens consistently BEFORE 12:15, like between 11:50pm and 12:10am.
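As a sanity check on the timing, here is a minimal Python sketch that lists which scheduled jobs would land inside the 11:50pm to 12:10am window. It ASSUMES the 15-minute log backups fire on the quarter hour (:00, :15, :30, :45), which is not confirmed anywhere above, and the calendar dates are arbitrary placeholders. The function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Observed corruption window from the post: 11:50pm to 12:10am.
# The calendar dates are arbitrary placeholders.
WINDOW_START = datetime(2000, 1, 1, 23, 50)
WINDOW_END = datetime(2000, 1, 2, 0, 10)


def log_backup_times(day_start, interval_minutes=15):
    """Yield one day of log backup times, ASSUMING they start on the hour."""
    t = day_start
    end = day_start + timedelta(days=1)
    while t < end:
        yield t
        t += timedelta(minutes=interval_minutes)


def jobs_in_window(jobs, start, end):
    """Return the (name, time) pairs scheduled inside [start, end]."""
    return [(name, t) for name, t in jobs if start <= t <= end]


# Build the schedule for the two days that the window straddles.
jobs = []
for day in (datetime(2000, 1, 1), datetime(2000, 1, 2)):
    jobs += [("log backup", t) for t in log_backup_times(day)]
    jobs.append(("full backup", day + timedelta(minutes=15)))  # 12:15am daily

hits = jobs_in_window(jobs, WINDOW_START, WINDOW_END)
for name, t in hits:
    print(name, t.strftime("%H:%M"))
```

Under that assumption, the only backup job scheduled inside the window is the midnight log backup; the 12:15am full backup falls outside it. Any other scheduled task can be checked the same way by appending it to `jobs`.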
The most frustrating thing is, the database can go WEEKS without any
corruption at all, and then it'll go 4 or 5 days in a row with
corruption showing up every night.
*************************************************************************
Typical query errors when the corruption exists include:
*************************************************************************
SQL Server Assertion: File:
<p:\sql\ntdbms\storeng\drs\include\record.inl>, line=1447
Failed Assertion = 'm_SizeRec > 0 && m_SizeRec <= MAXDATAROW'.
SQL Server Assertion: File: <recbase.cpp>, line=1378
Failed Assertion = 'm_offBeginVar < m_SizeRec'.
Server: Msg 3624, Level 20, State 1, Line 7
Location: recbase.cpp:1374
Expression: m_nVars > 0
Connection Broken
*************************************************************************
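To see whether the same assertions recur from one night to the next, the messages can be reduced to (file, line, expression) triples and compared. The following is a minimal Python sketch, assuming the error text has been saved in the same layout as the messages quoted above. The name `parse_assertions` and the two regular expressions are made up for illustration and only cover the two message shapes shown here.

```python
import re

# Error text copied verbatim from the messages quoted above.
SAMPLE = r"""SQL Server Assertion: File:
<p:\sql\ntdbms\storeng\drs\include\record.inl>, line=1447
Failed Assertion = 'm_SizeRec > 0 && m_SizeRec <= MAXDATAROW'.
SQL Server Assertion: File: <recbase.cpp>, line=1378
Failed Assertion = 'm_offBeginVar < m_SizeRec'.
Server: Msg 3624, Level 20, State 1, Line 7
Location: recbase.cpp:1374
Expression: m_nVars > 0
Connection Broken"""

# "SQL Server Assertion" shape: file in angle brackets, then line, then expression.
ASSERTION_RE = re.compile(
    r"File:\s*<([^>]+)>,\s*line=(\d+)\s*Failed Assertion = '([^']+)'"
)
# "Msg 3624" shape: "Location: file:line" followed by "Expression: ...".
LOCATION_RE = re.compile(r"Location:\s*(\S+):(\d+)\s*Expression:\s*(.+)")


def parse_assertions(text):
    """Return (source file name, line number, expression) for each assertion."""
    found = []
    for pattern in (ASSERTION_RE, LOCATION_RE):
        for path, line, expr in pattern.findall(text):
            file_name = path.rsplit("\\", 1)[-1]  # drop any directory prefix
            found.append((file_name, int(line), expr.strip()))
    return found


for file_name, line, expr in parse_assertions(SAMPLE):
    print(file_name, line, expr)
```

Running this over each night's saved errors and comparing the triples would show whether it is always the same handful of record-format checks that fail.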
Most of the responses to this type of issue (failed assertions) on the
newsgroups appear to point to hardware failures. However, this is
brand new hardware, AND it seems to us that if this were a hardware
issue, other databases, tables, and indexes would be affected
randomly. Isn't that a valid assumption (that if it were hardware,
particularly the RAID controller, the corruption would not be in such
a predictable place)? What if we moved the physical database files to
another location on the disk? Would/could that help?
If anyone could offer some suggestions as to what may be causing this
corruption, we would be eternally grateful. It is getting to be a
real pain in the A*** to run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS
every day or two (it always seems to solve the problem without data
loss, but still...).
Again, thanks in advance for your response.
Sincerely,
Morgan Leppink
ml******@hotmail.com