I've been having the same problem for 2 weeks now. If anyone has any
ideas, I'd love to hear them. We are using both SQL and Windows
Authentication. I was running a Profiler Trace at the time, and am
going through it now but have not seen anything yet.
Thanks in advance.
About once a week, at no fixed time (but so far, between 8am and
11am), my SQL Server 2000 on Windows 2000 (8.00.679) will stop
responding. New connections at first take a very long time to
connect, then fail with an error message about PreLoginHandshake().
We have to kill the service (see below) to stop it, but it then
restarts without problems.
The very first time this happened, I got buf latch errors in the SQL
Error Log, but not this time. (However, the first time we waited
about 5 minutes after it stopped responding, and only then did the buf
latch error message show up. This time we restarted the services
before 5 minutes had passed.)
The SQL log shows nothing abnormal. The SQL Agent log shows nothing
abnormal. The Event Log shows nothing abnormal. Stopping the service
through the SQL Service Manager doesn't work. Instead, we have to go
into the Services control panel, stop the Agent, and then stop the
service. When the service reports that it cannot be stopped, we run
the "kill" command on it (from the resource kit, I believe) and it
stops immediately. After that, we can use the SQL Service Manager to
start everything up successfully, with no problems in the logs.
Note: we have not rebooted the server yet, just
stopped/killed/restarted the services. We plan on rebooting this
weekend.
Timeline:
10:20 - an openquery job runs successfully - as far as I know,
everything is okay at this point.
10:22 - for some reason, a transaction log backup job does not run
(it ran successfully at 10:02 and runs every 20 minutes). The 10:22
run doesn't show up at all in the Job History for that job, nor is
there a log file, nor was a report txt file generated.
10:29 - openquery job 1 fails to run. Connections are sluggish, but
open connections can run queries.
10:30 - openquery job 2 fails to run. (not the same query as 1 or 3)
Connections are sluggish, but open connections can run queries.
10:30-10:40 - Enterprise Manager cannot connect - it hangs during
connection. This is the case on multiple machines, as well as on the
server itself. My Enterprise Manager doesn't respond, and I cannot
start a new instance. "Select getdate()" can take several seconds to
run, and I get a "Lost connection" error.
10:40 - openquery job 3 fails to run. Profiler shows my openquery job
was the last thing run - no further Profiler events for the next 2-3
minutes.
10:43 - We start shutting down the server.
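
One way to cross-check the skipped log backup in the timeline above is to
list the start times the 20-minute schedule should have produced and
compare them against what the Job History actually recorded. The sketch
below is only an illustration under stated assumptions: the function
names are made up, the date is a placeholder, and the history list is
typed in by hand (on a real server it would probably come from
msdb.dbo.sysjobhistory). It also assumes recorded start times match the
schedule to the minute, which may not hold in practice.

```python
from datetime import datetime, timedelta


def expected_runs(last_success, interval, until):
    """Return scheduled start times after last_success, up to and including until."""
    runs = []
    t = last_success + interval
    while t <= until:
        runs.append(t)
        t += interval
    return runs


def missed_runs(last_success, interval, until, history):
    """Return expected start times that have no matching entry in history."""
    seen = set(history)
    return [t for t in expected_runs(last_success, interval, until) if t not in seen]


# Placeholder date: the post gives only times of day, not the date.
day = datetime(2000, 1, 1)


def at(hour, minute):
    """Build a timestamp on the placeholder day (hypothetical helper)."""
    return day.replace(hour=hour, minute=minute)


last_ok = at(10, 2)    # last successful log backup, from the timeline
shutdown = at(10, 43)  # when the shutdown started, from the timeline

# Hand-entered history: only the 10:02 run is known to have been recorded.
missing = missed_runs(last_ok, timedelta(minutes=20), shutdown, history=[last_ok])
print([t.strftime("%H:%M") for t in missing])
```

With the times from the timeline, this reports 10:22 and 10:42 as due
but unrecorded. The post only confirms the 10:22 gap. The 10:42 entry
just shows that another run was due before the shutdown began at 10:43,
so it may be worth checking the Job History for that one as well.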