DBCC and Failed Assertion Errors - HELP!

Morgan Leppink
#1: Jul 20 '05
Hey all -

We are running SQL Server 2000 with ALL available service packs, etc.
applied. We just built a brand new database server, which has dual
2 GHz Xeons, 2 GB of memory, and the following disk configuration:

RAID 1 array (2 disks): Operating System (Windows Server 2003)
RAID 1 array (2 disks): Database Logs
RAID 10 array (4 disks): Database Data

Disks are SATA, with a 3Ware hardware RAID controller. The machine
SCREAMS.

We run 5 databases on this machine. 2 of these are fairly large (by
our standards, anyway). The second largest database (and the busiest
and most important) keeps generating consistency errors that
bring many important queries down. These are almost ALWAYS in the
form of index corruption on one single table. The corruption does not
normally occur on other tables (although it DOES happen once in a
while - rarely - on one of the other tables), nor does it EVER occur
in any other database on the server.

The corruption seems to happen right in the neighborhood of midnight
ALMOST every day, give or take a few minutes, but does not seem
directly associated with any of our MANY scheduled database cleanup
tasks (believe me, we've tried desperately to find an association
using SQL profiler). At midnight, our database traffic is fairly low,
so it does not seem associated with a high traffic level.

We are using the FULL recovery model, with log backups every 15
minutes, and full backups daily at 12:15am. However, the corruption
happens consistently BEFORE 12:15, like between 11:50pm and 12:10am.
The most frustrating thing is, the database can go WEEKS without any
corruption at all, and then it'll go 4 or 5 days in a row with this
strange corruption stuff.

*************************************************************************
Typical query errors when the corruption exists include:
*************************************************************************

SQL Server Assertion: File:
<p:\sql\ntdbms\storeng\drs\include\record.inl>, line=1447
Failed Assertion = 'm_SizeRec > 0 && m_SizeRec <= MAXDATAROW'.


SQL Server Assertion: File: <recbase.cpp>, line=1378
Failed Assertion = 'm_offBeginVar < m_SizeRec'.


Server: Msg 3624, Level 20, State 1, Line 7
Location: recbase.cpp:1374
Expression: m_nVars > 0


Connection Broken

*************************************************************************

Most of the responses to this type of issue (failed assertions) on the
newsgroups appear to point to hardware failures. However, this is
brand new hardware, AND it seems to us that if this were a hardware
issue, other databases, tables, and indexes would be affected
randomly. Isn't that a valid assumption (that if it were hardware,
particularly the RAID controller, the corruption would not be in such
a predictable place)? What if we moved the physical database files to
another location on the disk? Would/could that help?

If anyone could offer some suggestions as to what may be causing this
corruption, we would be eternally grateful. It is getting to be a
real pain in the A*** to run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS
every day or two (it always seems to solve the problem without data
loss, but still...).

Again, thanks in advance for your response.


Sincerely,


Morgan Leppink
mleppink@hotmail.com
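
The assertion messages quoted in the post above could be tallied by time of day, which is one way to check the "around midnight" pattern against a list of scheduled jobs. What follows is only a minimal Python sketch and not something from the thread. It assumes an error log in which every entry begins with a `YYYY-MM-DD HH:MM:SS.ff` timestamp followed by a source column such as `spid51`; that layout, the sample lines, and the name `tally_assertions` are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log layout (hypothetical sample, not taken from the thread):
#   2004-05-26 23:57:12.41 spid51  SQL Server Assertion: File: <recbase.cpp>, line=1378
ASSERTION_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\s+\S+\s+"
    r"SQL Server Assertion: File:\s*<(?P<file>[^>]+)>,\s*line=(?P<line>\d+)"
)


def tally_assertions(log_text, bucket_minutes=10):
    """Count assertion entries per time-of-day bucket, ignoring the date."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = ASSERTION_LINE.match(raw)
        if match is None:
            continue  # not an assertion entry
        stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        # Round the minute down to the start of its bucket.
        minute = stamp.minute - stamp.minute % bucket_minutes
        counts[f"{stamp.hour:02d}:{minute:02d}"] += 1
    return counts


# Made-up entries that only mimic the message text quoted in the post.
SAMPLE_LOG = """\
2004-05-25 23:52:03.10 spid53  SQL Server Assertion: File: <recbase.cpp>, line=1378
2004-05-26 00:04:47.88 spid51  SQL Server Assertion: File: <recbase.cpp>, line=1378
2004-05-26 23:57:12.41 spid51  SQL Server Assertion: File: <p:\\sql\\ntdbms\\storeng\\drs\\include\\record.inl>, line=1447
2004-05-27 00:15:00.02 backup  Database backed up
"""

if __name__ == "__main__":
    for bucket, count in sorted(tally_assertions(SAMPLE_LOG).items()):
        print(bucket, count)
```

If the buckets cluster tightly around one time, comparing them with the start times of scheduled jobs (including anything the RAID controller or the OS runs on its own schedule) may narrow the search. A flat spread would suggest the midnight pattern is less firm than it looks.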

Paul S Randal [MS]
#2: Jul 20 '05
Hi Morgan,

Have you actually checked the event logs and run hardware diagnostics on
your IO system to see if there are hardware problems?

If so, and there are no clues there, you should call Product Support to help
you diagnose the problem.

Regards.

--
Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine

This posting is provided "AS IS" with no warranties, and confers no rights.


Morgan Leppink
#3: Jul 20 '05
Paul -

The only information in the event logs is the text of the failed
assertion error itself. I have never seen any OS-reported problems
with the hardware.

I hate to seem stupid, but can you be more specific about what you
mean when you say "hardware diagnostics?" Are you talking about the
simple Windows CheckDisk utility or something more advanced? This is
the first time I've used a hardware RAID controller - is Windows even
capable of checking the hardware-controlled disk array, or do I need
to use a utility provided by the RAID controller manufacturer?

Or would you suggest some sort of third-party utility for "burning in"
the hardware? Would you suspect disk drives, memory, or what? Could
it be ANY of the hardware, or just specific things?

One last question: What's the most effective method for contacting
product support if I need to do so?

Thanks,

Morgan Leppink


druss
#4: Jul 20 '05
I am also running a 3ware SATA RAID card and have been getting consistency
errors at random. I have to run repair_allow_data_loss to fix them. I wish
I knew the cause. There are no drive errors. Microsoft cannot pinpoint it
either. All they can tell me is that it is most likely hardware related and
that I should move my database to another server.

Greg D. Moore (Strider)
#5: Jul 20 '05

"druss" <dean@corp.dslextreme.com> wrote in message
news:6c23824f8890833bc7e6bd07c5331636@localhost.talkaboutdatabases.com...
> I am running a 3ware SATA Raid card also and have been getting consistency
> errors randomly also. I have to run repair_allow_data_loss to fix. I wish
> I knew the cause. No drive errors. Microsoft can not pin point either. All
> they can tell me is that it is most likely hardware related and to move my
> database to another server.

I would suggest they're probably right in this case.
