I wrote code that parses db2diag.log to look for errors that require us
to send a message to a service center indicating that something is
wrong. My problem is figuring out which entries are errors. From
studying the log, I noticed that the code on the first line after the
date/time is always the same for the same message. For example:
2005-11-17-09.55.31.988098-420 E5537C330 LEVEL: Error (OS)
PID : 21700 TID : 1 PROC : db2pclnr 0
INSTANCE: hdbuser NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloDispatchNBlocks, probe:30
CALLED : OS, -, unspecified_system_function
OSERR : EFAULT (14) "Bad address"
2005-11-17-09.55.31.989606-420 I5868C367 LEVEL: Severe
PID : 21700 TID : 1 PROC : db2pclnr 0
INSTANCE: hdbuser NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbClnrDispatchSomeAIO,
probe:100
MESSAGE : writeStatus =
DATA #1 : Hexdump, 8 bytes
0x2FF21270 : 0000 0000 0000 0000 ........
These entries are from a db2diag.log file on one of our systems. The
first one has a "code" of C330 and the second a "code" of C367. The C330
and C367 are the same for identical log messages; the only things that
change are the date/time and the numbers before the C330 and C367. So I
assumed I could check for these types of errors. My problem is that I
cannot relate that number to anything I can find in the documentation.
Are these numbers valid to check? We already have code to check the SQL
errors returned from APIs, so I am not sure I need to parse for those.
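
To make the idea concrete, here is a minimal Python sketch of the kind of
check I mean. It splits the first line of each entry into the date/time,
the letter and number before the "code", the "code" itself, and the
LEVEL text, then groups entries by "code". The names HEADER_RE,
parse_records and group_by_code are made up for this illustration, the
header layout is assumed from the two entries quoted above, and the
sketch deliberately assumes nothing about what the fields mean.

```python
import re
from collections import defaultdict

# Header layout assumed from the two sample entries in this post:
#   <date/time><UTC offset> <letter><number><letter><number> LEVEL: <text>
HEADER_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})\s+"
    r"(?P<record_id>(?P<prefix>[A-Z])(?P<number>\d+)(?P<code>[A-Z]\d+))\s+"
    r"LEVEL:\s*(?P<level>.+?)\s*$"
)


def parse_records(lines):
    """Split log lines into entries; each header line starts a new entry."""
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER_RE.match(line)
        if match:
            current = match.groupdict()
            current["body"] = []  # the lines that follow the header
            records.append(current)
        elif current is not None and line.strip():
            current["body"].append(line)
    return records


def group_by_code(records):
    """Group entries by the trailing "code" (e.g. C330) of the record ID."""
    groups = defaultdict(list)
    for record in records:
        groups[record["code"]].append(record)
    return dict(groups)


# Header and FUNCTION lines copied from the two entries quoted above.
SAMPLE = """\
2005-11-17-09.55.31.988098-420 E5537C330 LEVEL: Error (OS)
FUNCTION: DB2 UDB, oper system services, sqloDispatchNBlocks, probe:30
2005-11-17-09.55.31.989606-420 I5868C367 LEVEL: Severe
FUNCTION: DB2 UDB, buffer pool services, sqlbClnrDispatchSomeAIO,
probe:100
"""

records = parse_records(SAMPLE.splitlines())
groups = group_by_code(records)
for record in records:
    print(record["prefix"], record["number"], record["code"], record["level"])
```

Grouping by "code" like this only shows which entries share a value. It
does not tell me whether the value is a reliable thing to match on,
which is the part I cannot find documented.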
I looked at the db2diag tool, but it seemed quite cumbersome for what I
needed and, in my opinion, is confusing to use.
Has anyone done anything like this? Can anyone help identify what the
codes are, or offer any suggestions? I am on AIX 5.3.