Hi!
For a few months we have been suffering from a mysterious problem with Oracle
10g RAC (more details on the server configuration at the bottom). On a regular
basis (every 5 minutes) the nodes of our cluster "freeze": for a couple of
seconds the operating system, for some mysterious reason, does nothing in
userspace, as far as we can tell. Every single userspace process stops for this
period. After a few seconds the system comes back to life and all the suspended
processes contend for CPU time and other resources, which in effect leads
to higher load.
We have investigated this in many ways, which brought us to the following
conclusions:
- when the Oracle instance is stopped, the freezes disappear
- since we reduced the shared_servers parameter from 60 to 20, the
freezes have been somewhat shorter (by about 4 seconds)
- right after an instance restart the freezes are unnoticeable, but as time
goes by they grow back to 6-9 seconds
Unfortunately, the investigation is very hard. /var/log/messages reports
nothing, dmesg reports nothing, and the Oracle alert log also has nothing to say.
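Since the logs are silent, one way to at least timestamp the freezes would be
a small userspace probe that sleeps for a short interval in a loop and records
how late each wake-up is. The following is only a minimal sketch, assuming a
Python interpreter is available on the node; the names find_stalls and
sample_clock, the 0.1 s interval and the 1 s threshold are hypothetical choices
made for illustration, not anything taken from our setup.

```python
import time


def find_stalls(samples, interval, threshold):
    """Return (start_time, extra_delay) for every pair of consecutive
    samples whose gap exceeds the expected interval by more than threshold.
    All values are in seconds."""
    stalls = []
    for prev, cur in zip(samples, samples[1:]):
        extra_delay = (cur - prev) - interval
        if extra_delay > threshold:
            stalls.append((prev, extra_delay))
    return stalls


def sample_clock(count, interval):
    """Record the wall-clock time `count` times, sleeping `interval`
    seconds between samples. If the whole userspace is frozen, the
    next sample is simply taken late, which shows up as a large gap."""
    samples = []
    for _ in range(count):
        samples.append(time.time())
        time.sleep(interval)
    return samples
```

Calling find_stalls(sample_clock(6000, 0.1), 0.1, 1.0) would cover roughly ten
minutes, i.e. about two of the 5-minute cycles. Wall-clock time is used so that
the reported start times can be lined up with other logs; the drawback is that
a clock adjustment during the run could look like a stall.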
Do you have any idea what may be misconfigured or damaged? Could you please
suggest some further tests?
Thank you in advance for any followups!
Database server characteristics
-------------------------------
OS: RHEL 3 ES
Kernel: 2.4.21-32.ELsmp
Oracle: 10.2.0.1.0
Storage: SAN accessed by QLogic HBA
Cluster storage: OCFS v.1
--
- MARCIN SZAREK <me@some.where.com> -
-- There are only 10 types of people in this world; --
-- those who understand binary, and those who don't --