HELP - SQL Server Crash ? Memory leak ?

"Thue Tuxen Sørensen" <tu***@esynergy.dk> wrote in message
news:3f**********************@dread14.news.tele.dk ...

Hi everybody !

I'm maintaining a large intranet (approx. 10,000 concurrent users) running on one IIS box and one DB box with SQL Server 2000.

Currently there are 2.5 GB of RAM, one 1400 MHz CPU and two SCSI disks installed on the DB box.
SQL Server is set to use at most 1.4 GB of RAM, and it does not seem to
be using it all.

Currently SQL Server 2000 crashes at least once a day.
It's very weird. I run Performance Monitor with counters on memory, disk
usage, number of users, locks and such.

There are no indications in the counters before the crashes; they just happen very suddenly.
The only indication is that SQL Server makes some huge jumps in memory usage, and mostly it then crashes an hour or two later.

The only thing that peaks a lot is the locks/sec counter.

My analysis of disk usage, queues etc. tells me I have no I/O
bottlenecks.

Can anybody give me a clue as to what I should do?

First, make sure you've applied all the latest service packs.

Also, look at the most recent errorlog after a crash (errorlog.1 most
likely). It should have a dump of what was going on.

That might give you a clue.

Also check your event log for anything.

Finally, if this doesn't turn up anything, call Microsoft.

SQL Server does not normally crash. I have some boxes that ran for more
than a year before we had to reboot them due to a physical move.

Best regards, Thue

Jul 20 '05 #2

mountain man

1. Check the server logs for any informative error messages.

Jul 20 '05 #3

[posted and mailed, please reply in the newsgroup]

Thue Tuxen Sørensen (tu***@esynergy.dk) writes:

> Only indication is that sqlserver makes some huge jumps in memory usage
> and mostly the sqlserver then crashes an hour or 2 later.

The fact that the memory usage of SQL Server jumps is perfectly normal,
and is only a sign that someone is using the application.

By default, SQL Server grabs as much memory as it can. This is because the
bigger the cache SQL Server can have, the better the response time will
be.

Possible causes for SQL Server crashes:

* Bug in SQL Server, provoked by some SQL statement.
* Access violation in an extended stored procedure or OLE object that is called
  by SQL Server from application code.
* Hardware problems.

The error log for SQL Server should give information about the cause.

If I were you, I would investigate the second point before I opened a
case with Microsoft, because this is the most likely reason.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #4

Thue Tuxen Sørensen

Thanks for all the answers !

It's running with SP3.

I think I explained the crashes a bit wrong before, maybe ...
What I mean is that SQL Server suddenly 'hangs' and that it's impossible
to communicate with it in any way.
Performance Monitor also stops getting input and just freezes.

The only way to get the site up and running again is to restart the SQL Server
service (not the server).
There's no indication in the error logs as to what happens just before the
'crash'.
I've looked through all of them to see if any of them had some info I could
use.

All error logs begin with info regarding the startup of SQL Server,
initialising the listener, starting up the DBs and such.
After all the info regarding the startup there is nothing in the log.

The next piece of info in the log is the entry where it writes that
SQL Server is terminating due to a 'stop' request from Service Control Manager.

And the stop request is issued by me after the system has crashed / is
hanging.

The event viewer is also not helping with anything.
No messages regarding what could cause the error.

I'm really frustrated about the problem, because I don't have a clue to
chase down.
But thanks again for all the answers and your time.
Please do not hesitate to write :o) if any of you suddenly comes up with
more things I could check out before calling in a pro.

Best Regards Thue

Jul 20 '05 #5

Hi there

I share your pain and frustration. I would like all possible causes of CPU
100% to be listed somewhere so I can check that I have taken all precautions
to avoid this. Is there such a page anywhere?!

I want to resolve the CPU 100% problem myself. I thought I had resolved it
before, but it has come back to haunt me now with a new database server machine
with SQL Server 2000 SP3, which apparently does not allow one to fully
remove the "named pipes" protocol. I resolved a CPU 100% issue months ago by
making sure only TCP/IP was used as the protocol, and removing Named Pipes:
I removed the "named pipes" network protocol in Enterprise Manager (button at
the bottom of the General settings tab) and in the client connection settings
manager, so that only TCP/IP is allowed as a network protocol. My theory is
that these pipes become blocked, and this causes 100% CPU usage. Could
anyone confirm that this is a known symptom of the named pipes protocol?

I am currently having to reboot the machine every few days now since we put
in a new database server with the latest service packs (SQL Server 2000,
SP3). Removing the named-pipes protocol does not seem to have resolved this
nasty problem this time round. I have seen in some newsgroup postings that
it is no longer possible to actually remove Named Pipes fully since SP3.

The following article thread indicates this:-
http://www.mcse.ms/message97673.html

which is kind of worrying to me, because I was fairly sure removing Named
Pipes as a protocol before, completely cured the CPU 100% symptoms.

My correspondence chess website www.chessworld.net makes heavy use of SQL
Server 2000. It has been running for over 2 years now, and sometimes has
about 200 members online or more within the space of 10 minutes. Overall SQL
Server 2000 has been great, but recently these reboots have been quite
frustrating, and I cannot seem to identify the cause. I continually monitor
any ASP pages that time out with SQL Server errors, and I am always keen to
ensure all queries run quickly on my site. I do not think it is a bad SQL
query problem. I continually make efforts to optimise all queries used on
the site. I have also made sure for a long time that (NOLOCK) is being
used on select statements to minimise lock escalation.

I found the following article today which is another possible cause of CPU
100%:-

http://support.microsoft.com/default...NoWebContent=1

which attributes the CPU 100% to the Microsoft Search service as a possible
cause. I have now disabled this service on our new database server machine,
and set its startup type to Manual.

Help needed to resolve CPU 100% issue !

Best wishes
Tryfon Gavriel
Webmaster
www.chessworld.net

Jul 20 '05 #6

"Tryfon Gavriel" <tr****@gtryfon.demon.co.uk> wrote in message
news:bt*******************@news.demon.co.uk...

> My theory is that these pipes become blocked, and this causes 100% CPU usage. Could
> anyone confirm that this is a known symptom of the named pipes protocol ?!

Nope, never seen that happen.

> I have also made sure from a long time ago that (NOLOCK) is being used on select statements to minimise lock escalation.

Keep in mind that is NOT always a good solution.

> Help needed to resolve CPU 100% issue !

Your best bet is probably to have Profiler running.
There can be many reasons. Keep in mind it's perfectly possible to be using
100% of the CPU without it being a bug. It could simply be that you're that
busy.

We have a DB server that from time to time hits 100% CPU and stays that
way for a few seconds or more. It hurts performance, but returns to normal.

Jul 20 '05 #7

"Thue Tuxen Sørensen" <tu***@esynergy.dk> wrote in message
news:3f**********************@dread14.news.tele.dk ...

> It's running with SP3.

Good.

> What I mean is that the sqlserver suddenly 'hangs' and that it's impossible
> to communicate with in any way.

You contradict this down below, which is somewhat critical.

> The performance monitor also stops getting input and just freezes.

What metrics are you measuring?

> The next piece of info in the log is the entry where it writes that
> sqlserver is terminating due to 'stop' request from service control manager.

OK. This indicates that the server IS listening.

One thing you may want to do is issue a NET STOP SQLSERVERAGENT command
followed by NET STOP MSSQLSERVER and see which one (if either) takes a long
period of time.

When starting, does it start up quickly or take time? Is there anything in
the error log about recovering a DB?

Also, does your app call ANY extended stored procs (XP_fooname)?

> And the stop request is issued by me after the system has crashed / is
> hanging.

Yeah. I wouldn't call this a crash. Not even sure I'd call it a hang. But
that's partly semantics.

> Please do not hesitate to write ! :o) if any of you suddenly comes up with
> more things I could check out before calling in a pro.

Just the above.

What happens if you wait? (How long do you wait before cycling it?)

Jul 20 '05 #8

Tryfon Gavriel (tr****@gtryfon.demon.co.uk) writes:

> I am currently having to reboot the machine every few days now since we
> put in a new database server with the latest service packs (SQL Server
> 2000, SP3). Removing the named-pipes protocol does not seem to have
> resolved this nasty problem this time round. I have seen on some
> newsgroup postings, that it is no longer possible to actually remove
> Named Pipes fully since SP3.

100% CPU may not be cause for alarm. When SQL Server becomes completely
unresponsive, it certainly is.

I know of two ways this can happen. Or rather, I know of one, and one
"seemingly unresponsive". The one case where it becomes unresponsive
is error 17883. If this happens, you should see this in the error log,
where you get a load of these messages. The message only appears with
SP3 or later hotfixes.

The other case I've seen was with some poor SQL. In this particular
case I was testing the performance of this poor SQL for an article on my
web site. I was surprised to see that this particular query took so
much CPU that issuing an sp_who could have a response time of over
30 seconds.

There are probably more possibilities than these two. But then again,
it is certainly not something which happens all over town, so if your SQL
Server becomes unresponsive, there is something fishy on your machine,
be that hardware or poor SQL statements.

One way to track down the latter is to have a Profiler trace running,
and see what you get just before the machine goes into nirvana.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #9

Thue Tuxen Sørensen

Hi again.

I'm measuring CPU, memory, locks and disk usage.
Nothing special in the logs about recovering.
We don't use any extended procedures.
I haven't tried to wait for a long time before restarting it, because there are
a lot of users waiting for it to be up again.
Usually I wait 5 minutes or so.

/Thue

Jul 20 '05 #10

Thank you Greg and Erland

I ran an Event trace using SQL profiler when CPU was at 100%, generating
approaching 3200 rows within a few minutes, and interestingly, ordering by
"Duration" revealed the following entries:-

(The first six are all event type 15, which is "Disconnect" I believe. They
have massive duration times, and massive values for Reads.)

1002 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 90 616436 2004-01-12 18:04:35.687 187793 2 3326
1474 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 58 614733 2004-01-12 18:04:44.687 212743 1 20373
3118 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 108 612796 2004-01-12 18:05:42.107 215728 0 19657
2522 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 66 600640 2004-01-12 18:05:41.810 281198 5 12674
1881 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 72 256296 2004-01-12 18:10:57.093 69592 0 375
353 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 86 126046 2004-01-12 18:12:17.403 331 0 15
974 10 declare @P1 int set @P1=180150025 declare @P2 int set @P2=8 declare @P3 int set @P3=1 declare @P4 int set @P4=0 exec sp_cursoropen @P1 output, N' select (select IsNull(min(boardnumber),0) from boardsplayers a WITH (NOLOCK), games b WITH (NOLOCK) where a NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 62 36763 2004-01-12 18:14:14.797 19868 0 0

I am not sure how to interpret these events. What does a massive duration on
Event Type 15 mean? Also there is a massive number of "Reads" associated
with these.

Any help greatly appreciated!

If I order by the CPU column descending, the top 10 rows are:-

1474 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 58 614733 2004-01-12 18:04:44.687 212743 1 20373
3118 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 108 612796 2004-01-12 18:05:42.107 215728 0 19657
2522 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 66 600640 2004-01-12 18:05:41.810 281198 5 12674
1002 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 90 616436 2004-01-12 18:04:35.687 187793 2 3326
56 12 SELECT notepadvisible,sandbagger,tournamentsummaryview,admin,ShowRatingsofplayers,javascriptboardcreator,htmlemails,listcurrentgamesview,logincount,JavaScriptSupportLevel,RatingPredictorStyle,MessageBoxType,LanguageID,TeamCreator,banners,IsNull(CountryID,1 NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 102 12843 2004-01-12 18:14:00.733 11599 0 2875
954 12 SELECT notepadvisible,sandbagger,tournamentsummaryview,admin,ShowRatingsofplayers,javascriptboardcreator,htmlemails,listcurrentgamesview,logincount,JavaScriptSupportLevel,RatingPredictorStyle,MessageBoxType,LanguageID,TeamCreator,banners,IsNull(CountryID,1 NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 96 25050 2004-01-12 18:14:25.340 11599 0 2797
1817 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 85 6810 2004-01-12 18:15:02.640 3484 0 1516
1533 12 select GameNumber,TournamentID from TournamentGames WITH (NOLOCK) where gamenumber = 349913 NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 58 1563 2004-01-12 18:14:59.437 755 0 1015
1553 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 58 1966 2004-01-12 18:14:59.437 765 0 1015
1881 15 NULL NULL 1320 Microsoft(R) Windows (R) 2000 Operating System sa 72 256296 2004-01-12 18:10:57.093 69592 0 375

Again any help in diagnosing the cause of this, which puts the CPU to 100%
would be greatly appreciated.

Best wishes
Tryfon

Jul 20 '05 #11

Tryfon Gavriel (tr****@gtryfon.demon.co.uk) writes:

> (The first six are all event type 15 - which is "Disconnect" i believe.
> They have massive duration times, and massive values for Reads.)
> ...
>
> I am not sure how to interpret these events. What does a massive
> duration on Event Type 15 mean?! Also there are a massive amount of
> "Reads" associated with these.

As you said, event 15 is disconnection. Duration is just how long the
connection was open. And Reads are just the accumulated number of
reads during that session.

In itself, not that exciting. Then again, maybe it is a clue that four
long-running processes owned by sa quit just before the machine
reaches nirvana. No, please don't ask me what that clue would mean!

It is possible that the SQL statements you see when you sort on Duration
have something to do with the CPU hog. However, I would not really expect
that process to show up. I would include the Starting events in the trace,
and then investigate the uncompleted events at the end of the trace
when the CPU goes to 100%.
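
The idea of looking for uncompleted events can be sketched in a few lines of Python, assuming the trace has been exported as rows of (row number, event class, SPID, text). The event class numbers are the ones SQL Server 2000 traces use (11 and 13 for the Starting events, 10 and 12 for the matching Completed events); the sample rows and the function name `uncompleted` are hypothetical, not taken from the trace in this thread. The sketch also assumes each SPID runs only one batch or RPC at a time.

```python
# Event class numbers as used by SQL Server 2000 traces.
STARTING = {11: "RPC:Starting", 13: "SQL:BatchStarting"}
COMPLETED = {10: "RPC:Completed", 12: "SQL:BatchCompleted"}


def uncompleted(events):
    """Return Starting events that never got a Completed event.

    events: iterable of (row_number, event_class, spid, text) in trace order.
    Assumption: a SPID has at most one batch or RPC outstanding at a time,
    so a Completed event always closes that SPID's latest Starting event.
    """
    open_by_spid = {}
    for row, event_class, spid, text in events:
        if event_class in STARTING:
            open_by_spid[spid] = (row, spid, text)
        elif event_class in COMPLETED:
            open_by_spid.pop(spid, None)
    return sorted(open_by_spid.values())


# Hypothetical rows for illustration only.
sample = [
    (1, 13, 58, "select 1"),
    (2, 12, 58, "select 1"),
    (3, 13, 62, "select ... big query"),
    (4, 11, 90, "exec sp_cursoropen ..."),
    (5, 10, 90, "exec sp_cursoropen ..."),
]
still_running = uncompleted(sample)
print(still_running)
```

Whatever is left in `still_running` at the moment the CPU pegs is a candidate for the statement that is hogging the machine.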
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #12

Joe Maloney

Erland Sommarskog <so****@algonet.se> wrote in message news:<Xn**********************@127.0.0.1>...

TJI:

I have seen a similar problem (loss of connectivity, performance
grinds to a halt, etc.) when one of our servers has 3 (or more)
exchange waits active for extended periods of time. We start seeing
performance degradation at 2 exchange waits, which raises the flag for
us.

What is happening in our case is that users for some reason disconnect
their side (reboot their workstation, etc) with bad timing, just in
the middle of a network handshake before responding to the server. The
server just waits for the packet from the client that never comes.
These transactions stay alive, tying up datapages and locks, which
escalate as users log back in and retry.....

(We saw this by waiting, sometimes as long as 1/2 hour, for EM to
connect, then waiting again for the Process list to present itself.)

With your large number of concurrent users, I would not be surprised
if this is your problem.

Jul 20 '05 #13

Hi Erland and all

The server has been standing for 2 weeks without a reboot. This has been a
great relief to me. If my solution may help others, the two things I did
were:-

a) Simplify some of the SQL - taking out some luxury sub-queries off many
pages
b) Taking off auto-grow from three of the databases - tempdb, the main
Chessworld db, and master.

I was not exactly sure if it was a) or b) but I have more evidence now it
was in fact b) that was causing massive slow-downs requiring a reboot
because CPU seems to go unrecoverably to 100%.

The reason for more evidence, is that today, I finally had a "cannot
allocate space error" being logged. I increased the size of the chessworld
db, and the tempdb, and put back the auto-grow on the chessworld db. Within
about an hour or two, the symptoms of a big slow-down came back with CPU
100%.

I rebooted the database server but have again taken off auto-grow options. I
believe for my site with many concurrent users, the auto-grow is causing
issues. I will keep you posted.

Best wishes
Tryfon

Jul 20 '05 #14

"Tryfon Gavriel" <tr****@gtryfon.demon.co.uk> wrote in message
news:bv*******************@news.demon.co.uk...

> The server has been standing for 2 weeks without a reboot. This has been a
> great relief to me. If my solution may help others, the two things I did
> were:-
>
> a) Simplify some of the SQL - taking out some luxury sub-queries off many
> pages
> b) Taking off auto-grow from three of the databases - tempdb, the main
> Chessworld db, and master.

That'll do it right there.

Here's a typical scenario:

DB growth is set to 10%

DB is 100MB...

An insert is performed.... limit gets reached. So, now the DB wants to
expand.

It starts to allocate 10MB.

During this time, deletes and updates can generally be performed, but
basically any additional inserts will be blocked while the space is
allocated. (and any updates or deletes that need to occur on those blocked
inserts obviously get blocked.)

Now, SQL Server can generally allocate 10MB pretty quick.

But now you've got 110MB. Next expansion will be 11MB. Putting you at
121MB. Next one will be 12.1 MB. And this continues.

Before you know it, you've got a 10 gig DB trying to allocate 1GB. (and the
kicker is, it probably only needs 10MB at that point. :-)

And of course during this allocation, the DB appears hung.

So, I generally try NOT to allow auto-growth, or set it to a fixed amount
(like 10MB or 100MB, etc. depending on the size and type of DB).
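
The compounding in the scenario above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameters are made up for illustration, and it models nothing but the file size arithmetic (no blocking, no disk speed).

```python
def growth_steps(start_mb, target_mb, percent=None, fixed_mb=None):
    """Simulate auto-grow until the file reaches target_mb.

    Give exactly one of percent (grow by a percentage of the current size)
    or fixed_mb (grow by a constant amount).
    Returns (number of growth events, size of the last increment in MB).
    """
    if (percent is None) == (fixed_mb is None):
        raise ValueError("give exactly one of percent or fixed_mb")
    size = float(start_mb)
    steps = 0
    last_increment = 0.0
    while size < target_mb:
        if percent is not None:
            last_increment = size * percent / 100.0
        else:
            last_increment = float(fixed_mb)
        size += last_increment
        steps += 1
    return steps, last_increment


# 100MB database growing to 10GB, as in the scenario above.
print(growth_steps(100, 10 * 1024, percent=10))
print(growth_steps(100, 10 * 1024, fixed_mb=100))
```

With 10% growth the database gets to 10GB in 49 growth events, and the last one allocates a bit under 1GB in one go. With a fixed 100MB it takes 102 events, but each one is the same small, predictable size.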

Also, this can occur a lot with transaction logs. Which generally means
that no transaction log backups are being done. Which on a production DB is
almost always a "bad thing".

Hmm, given what you say, I'm guessing that your tempdb may be growing a lot.
(Since upon restart I believe it'll get resized back to its original size.)

This could be a result of a bad design, or simply the result of a necessary
design.

What I'd do is check which DB is growing the most and resize it.

The master DB normally should not grow much at all.

So it's most likely the tempdb or the chessworld one. (as he states the
obvious.)

I was not exactly sure if it was a) or b) but I have more evidence now it
was in fact b) that was causing massive slow-downs requiring a reboot
because CPU seems to go unrecoverably to 100%.

The reason for more evidence is that today I finally had a "cannot
allocate space" error being logged. I increased the size of the chessworld
db and the tempdb, and put back the auto-grow on the chessworld db. Within
about an hour or two, the symptoms of a big slow-down came back with CPU
at 100%.

I rebooted the database server but have again taken off the auto-grow
options. I believe for my site with many concurrent users, the auto-grow is
causing issues. I will keep you posted.

Please do.

Best wishes
Tryfon

Jul 20 '05 #15

Hi there

Thank you for that feedback.

I should have also mentioned the following: There was an unexpected database
server shutdown last night recorded in the w2k event viewer. I had to
request the database server to be restarted. Once it did it was working fine
for a while, but then I noticed on my ASP error log, the "cannot allocate
space" errors. It was then that I increased the size, but also put auto-grow
on. It just died within 2 hours. What I then did was take off autogrow, and
restart the database server again.

So basically:-

* I have doubled the size of the tempdb database to 2 gig for data file and
1 gig for transaction log
* I have also doubled the size of the Chessworld database (One concern here
is the time it takes to back up the database, but it still seems to be able
to back it up within a few minutes.. a relief :) )
* Auto-grow taken off both databases

If the server shuts down in 2 months time, then fine. I will request a
database server reboot and increase their sizes again. I cannot have a
background process re-allocating space, when I have tonnes of players online
playing chess moves (or trying to!), resulting in me having to reboot the
server. The "cannot allocate space" errors that occurred last night have now
stopped.

The following may be useful for other ASP/SQL Server developers for general
problem diagnosis: About two weeks ago, I knocked up an ASP admin page to
monitor the sysprocesses table. This is useful to me in trying to understand
the processes with greatest CPU usage. I ordered it by CPU, but also made it
highlight in red processes which had a last batch time of more than 10
minutes ago. The idea was to highlight potential processes that could be
killed. I found the following three particularly useful web references :-

Kill documentation:
http://msdn.microsoft.com/library/de...kf-kz_1zos.asp
Tips for handling blocking:
http://www.sql-server-performance.com/blocking.asp
Understanding and resolving blocking problems:
http://support.microsoft.com/default...NoWebContent=1

I have put links to these at the top of my admin page for viewing processes :-)
It also made me paranoid about the background processes going on - hence my
intuition to turn off the auto-grow tick boxes.
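
For anyone who wants to build something similar, the logic of the processes page described above (order by CPU, flag anything whose last batch is more than 10 minutes old) can be sketched like this. It is written in Python rather than ASP, the function name is made up, and the sample rows are invented; spid, cpu and last_batch are columns of sysprocesses, but how the rows get fetched is left out.

```python
from datetime import datetime, timedelta


def flag_processes(processes, now, max_idle=timedelta(minutes=10)):
    """Order process rows by CPU (highest first) and flag the stale ones.

    processes: list of dicts with keys "spid", "cpu" and "last_batch".
    Returns a list of (row, is_stale) tuples; is_stale is True when the
    last batch started more than max_idle before now.
    """
    ordered = sorted(processes, key=lambda p: p["cpu"], reverse=True)
    return [(p, now - p["last_batch"] > max_idle) for p in ordered]


# Invented sample rows standing in for a sysprocesses result set.
now = datetime(2004, 2, 1, 12, 0)
rows = [
    {"spid": 51, "cpu": 120, "last_batch": datetime(2004, 2, 1, 11, 59)},
    {"spid": 52, "cpu": 98000, "last_batch": datetime(2004, 2, 1, 11, 30)},
    {"spid": 53, "cpu": 4000, "last_batch": datetime(2004, 2, 1, 11, 55)},
]

report = flag_processes(rows, now)
for row, stale in report:
    marker = "RED" if stale else "   "
    print(marker, row["spid"], row["cpu"], row["last_batch"])
```

A flagged row is only a candidate for a closer look, not something to kill automatically; an idle connection with an old last batch time can be perfectly harmless.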

Some insights include: simplifying pages does seem to lead to
processes consuming less CPU, and generally a faster site. The
background processes are also highlighted. I think viewing the sysprocesses
table is a very useful point of reference. The reason I started investigating
it is that it is mentioned in the SQL Server 2000 programming book, page
1081, which also highlights the following tools for analysing
problems:-

a) SHOWPLAN_TEXT | SHOWPLAN_ALL
b) STATISTICS IO
c) DBCC
d) Query governor
e) sp_lock
f) sysprocesses table
g) SQL Server Profiler
h) Perfmon (should be listed because it is detailed)

Before posting to this excellent group, I had not actually used the SQL
profiler much at all. I did have admin pages already for sp_lock and sp_who.
But I usually use sp_lock for analysing locks, and ignored the sp_who most
of the time. The view on sysprocesses is more useful to me because you can
order by cpu, etc. I now regularly look at the sp_lock page and the
"processes" page.

I also make use of the ASP error object to generate errors in a log file,
and my most frequently logged error is now SQL Server related. This means I
can immediately see any bottleneck ASP pages where there is potentially bad
SQL or other issues.

Best wishes
Tryfon

Jul 20 '05 #16

Tryfon Gavriel (tr****@gtryfon.demon.co.uk) writes:

* I have also doubled the size of the Chessworld database (One concern
here is the time it takes to back up the database, but it still seems to
be able to back it up within a few minutes.. a relief :) )

My experience is that the time to do a backup is related to the actual
amount of data in the database. That is, if you allocate 60 GB for a
1 GB database, then those 59 GB are cheap. (The one occasion they
cost is when you want to restore a backup into a clone database;
then the allocation of those 59 GB will take 10-20 minutes extra.)
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #17