Very strange IO problems - Oracle Database

jfixsen

Hello!

Oracle 9.2.0.4
SunOS pchi-db01 5.8 Generic_108528-19 sun4u sparc
SUNW,Ultra-EnterpriseSystem = SunOS
Node = pchi-db01
Release = 5.8
KernelID = Generic_108528-19
Machine = sun4u
BusType = <unknown>
Serial = <unknown>
Users = <unknown>
OEM# = 0
Origin# = 1
NumCPU = 8

History: we had been using a SAN disk array for storage and then
switched over to a Sun T3. About a week after moving to the T3, I saw
the following message in my alert log: WARNING: aiowait timed out 1
times. No performance problems happened until about a month after
that.

Problem: We started seeing huge performance problems from out of
nowhere on December 16 on everything from big batch jobs (heavy FTS)
to simply logging in, and I started seeing several of these aiowait
messages each day, sometimes up to 20. No application changes were
made at any time during any of this, and after the performance
problems started, I even cut the load, mostly big FTS jobs, way back.

I had been running statspack every day the entire time, and on the day
the performance problems hit hard (about 5 weeks after going to the
T3), the noticeable differences I saw in the statspack reports were all
related to writes, from what I could tell.

BEFORE:

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                   43,689,174    92.51
db file scattered read                        131,668,468     949,948     2.01
PX Deq: Execute Reply                             931,750     496,692     1.05
direct path read                               73,177,620     489,356     1.04
PX Deq Credit: send blkd                       24,148,414     425,685      .90

AFTER:

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync                                     874,418     604,164    32.92
direct path write                                 121,724     233,840    12.74
PX Deq Credit: send blkd                        1,103,485     212,285    11.57
db file scattered read                         12,039,568     165,860     9.04
log buffer space                                  158,742     127,009     6.92

BEFORE:

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file scattered read        131,668,468          0    949,948      7     60.4
PX Deq: Execute Reply             931,750    211,623    496,692    533      0.4
direct path read               73,177,620          0    489,356      7     33.5
PX Deq Credit: send blkd       24,148,414    122,149    425,685     18     11.1
db file sequential read        31,794,392          0    349,188     11     14.6
direct path write               2,652,105          0    309,880    117      1.2
log file sync                   3,021,375    104,267    201,582     67      1.4
db file parallel write            547,546    254,564     68,136    124      0.3
enqueue                            27,670     14,655     50,246   1816      0.0
log buffer space                  110,172     32,943     47,180    428      0.1

AFTER:

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync                     874,418    534,000    604,164    691      3.4
direct path write                 121,724          0    233,840   1921      0.5
PX Deq Credit: send blkd        1,103,485     94,421    212,285    192      4.3
db file scattered read         12,039,568          0    165,860     14     46.5
log buffer space                  158,742    120,911    127,009    800      0.6
PX Deq: Execute Reply             742,346     61,708    126,984    171      2.9
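
For anyone who wants to compare the two reports without eyeballing the
columns, here is a rough Python sketch that recomputes the average wait per
event (total wait time divided by waits) and the AFTER/BEFORE ratio. The
numbers are copied from the tables above; the dictionary and function names
are made up for this illustration and are not part of statspack.

```python
# Figures copied from the BEFORE/AFTER statspack wait-event tables above.
# Each entry maps event name -> (waits, total wait time in seconds).
# The names BEFORE, AFTER, avg_wait_ms and slowdown are illustrative only.
BEFORE = {
    "log file sync": (3_021_375, 201_582),
    "direct path write": (2_652_105, 309_880),
    "log buffer space": (110_172, 47_180),
    "db file scattered read": (131_668_468, 949_948),
    "PX Deq Credit: send blkd": (24_148_414, 425_685),
    "PX Deq: Execute Reply": (931_750, 496_692),
}
AFTER = {
    "log file sync": (874_418, 604_164),
    "direct path write": (121_724, 233_840),
    "log buffer space": (158_742, 127_009),
    "db file scattered read": (12_039_568, 165_860),
    "PX Deq Credit: send blkd": (1_103_485, 212_285),
    "PX Deq: Execute Reply": (742_346, 126_984),
}


def avg_wait_ms(waits, total_seconds):
    """Average wait in milliseconds: total wait time divided by wait count."""
    return 1000.0 * total_seconds / waits


def slowdown(event):
    """Ratio of the AFTER average wait to the BEFORE average wait."""
    return avg_wait_ms(*AFTER[event]) / avg_wait_ms(*BEFORE[event])


if __name__ == "__main__":
    # List events from the largest slowdown to the smallest.
    for event in sorted(BEFORE, key=slowdown, reverse=True):
        before_ms = avg_wait_ms(*BEFORE[event])
        after_ms = avg_wait_ms(*AFTER[event])
        print(f"{event:<26} {before_ms:8.1f} ms -> {after_ms:8.1f} ms "
              f"({slowdown(event):.1f}x)")
```

By these figures the average direct path write wait is roughly 16 times
longer and the average log file sync wait roughly 10 times longer, while
db file scattered read is less than 2 times longer.
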
I am about to call Sun, but I want to be sure it's not something I'm
overlooking. We've got all the latest Sun patches, and one post I saw
that had to do with the aiowait message said to change /etc/system to
include the following parm after installing the patch (which we
already had):

* this parm is associated with the aiowait errors that were corrected
* in patch 112255-01 solaris v8
set TS:ts_sleep_promote=1

I haven't seen the aiowait message in the alert log since the parm was
added and the db restarted, but performance still sucks and the
statspack numbers are still bad. Most noticeably, when I try logging
into sqlplus, it hangs for 10-60 seconds. If I look at the wait
events from another session, it always waits on log buffer space
initially, and then on log file sync.

To me, this all smells like a Sun problem, but I would love your
opinions before I call.

Thanks

Jason
jfixsen@nospam_virtumundo.com

Jul 19 '05 #1


craig

I am a relative amateur on the subject, but I have been reading
Effective Oracle by Design by Thomas Kyte (p. 151). He mentions direct path
writes and db file scattered reads. Direct path write waits come from
delays writing to TEMP tablespaces (caused by direct loads, parallel DML
updates, and uncached LOBs). In his example the cause was a
PGA_AGGREGATE_TARGET set too low, so he increased it significantly. You
might want to investigate this area.

He also researched all the scattered reads by checking the code that
generated the FTS. It seems you have eliminated some of this, but it may be
worth looking at why so many FTS are occurring. Is indexing not an option
for some reason?

In Kyte's examples he worked until the direct path writes and scattered
reads were gone completely. I'm not sure whether this is normally achievable
or just true of his example.

There is little mention of the rest of your wait events. Log file syncs may
be high, but I have no reference for what is normal. Perhaps you are
switching logs too often? Log files too small or too few? Just a wild guess.
It could also be a log buffer size issue. If the buffer is too big or too
small you can have problems, though I think bigger is generally better up to
a point. See http://www.ixora.com.au/tips/tuning/log_buffer_size.htm

Other than this, I know of checking the instance efficiency page to make
sure your soft parse percentage is high (near 100% after the instance has
been up).

Hope this helps; I'm just tossing out random thoughts. Please post your
result if you find a solution. My best guess is that it is not a Sun issue
but custom code or db tuning issues, but that is just a guess based on past
experience.

Jul 19 '05 #2

vray

This is a well-known problem with the Solaris 8 scheduler. Low-priority
threads in a sleep queue on a very busy system never get a chance to
wake up and run: they are starved and their priority never increases.
Setting ts_sleep_promote=1 will bump up the priority of these threads over
a period of time.
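
To make the starvation idea concrete, here is a toy Python model. It is only
a sketch under made-up assumptions: one always-runnable busy thread, one
waiting low-priority thread, and a scheduler that always picks the higher
priority. It is not the real Solaris TS class algorithm, and the function
name, parameters and numbers are all hypothetical.

```python
def first_run_tick(promote, low_prio=10, busy_prio=50,
                   promote_every=5, max_ticks=1000):
    """Return the tick at which the low-priority thread first gets the CPU.

    Toy model, not Solaris code: a busy thread at busy_prio is always
    runnable, and the scheduler always picks the higher priority. When
    promote is True, the waiting thread gains one priority level every
    promote_every ticks. Returns None if it never runs within max_ticks.
    """
    prio = low_prio
    for tick in range(1, max_ticks + 1):
        if prio > busy_prio:
            return tick
        # The busy thread wins this tick; the low-priority thread keeps waiting.
        if promote and tick % promote_every == 0:
            prio += 1
    return None


if __name__ == "__main__":
    print("without promotion:", first_run_tick(promote=False))
    print("with promotion:   ", first_run_tick(promote=True))
```

Without promotion the waiting thread never runs in this model; with
promotion it eventually outranks the busy thread and gets the CPU.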


Mar 16 '06 #3
