On Sep 13, 3:34 pm, Patrick Finnegan <finnegan.patr...@gmail.com>
wrote:
Running DB2 8.2 on AIX 5.3.
We have a third-party USEREXIT program that periodically hangs for
some unknown reason.
DB2 writes the following messages to the diag log:
    MESSAGE : Successfully archived log file S0011930.LOG to USEREXIT from
    MESSAGE : DB2 is waiting for log files to be archived.
    MESSAGE : DB2 is waiting for log files to be archived.
    MESSAGE : DB2 is waiting for log files to be archived.
    MESSAGE : DB2 is waiting for log files to be archived.
    DB2 was unable to confirm logs were archived.
    MESSAGE : Successfully archived log file S0011931.LOG to USEREXIT from
    MESSAGE : Successfully archived log file S0011931.LOG to USEREXIT from
    MESSAGE : DB2 is waiting for log files to be archived.
    MESSAGE : DB2 is waiting for log files to be archived.
    MESSAGE : DB2 is waiting for log files to be archived.
    MESSAGE : DB2 is waiting for log files to be archived.
    DB2 was unable to confirm logs were archived.
    MESSAGE : Successfully archived log file S0011932.LOG to USEREXIT from
    MESSAGE : Successfully archived log file S0011933.LOG to USEREXIT from
The big problem is that if DB2 cannot archive the log file it hangs
with symptoms similar to a log path disk full condition. Transactions
time out and no new applications can connect to the DB. When I kill
the user exit program, DB2 recovers.
Does anyone know the default behavior of DB2 when the userexit program
hangs? Is there a timeout setting anywhere?
The problem was analyzed by the DB2 laboratory, which provided the
following answer:
"The message "DB2 was unable to confirm logs were archived." is
triggered when we time out waiting on a response from the vendor
library.
The db2logmgr calls the Vendor code and then the db2loggr waits for
the db2logmgr to post that the archive attempt succeeded/failed.
However, the Vendor mechanism isn't returning back to the db2logmgr
with the expected success/fail.
The Vendor code just doesn't report back to the db2logmgr(). Since
the Vendor code has control, there is nothing that DB2 can do until
the Vendor code returns control, in this case with an "archive attempt
failed" result.
As for the possibility of using FAILARCHPATH to circumvent this, I am
afraid that this is not possible.
FAILARCHPATH will come into play once the NUMARCHRETRY and
ARCHRETRYDELAY have been exhausted.
But ARCHRETRYDELAY and NUMARCHRETRY will not be in the picture until
the Vendor code returns control to the db2logmgr with a failure return
code; therefore, the FAILARCHPATH is not in play here at all.
This hang condition in the vendor code must somehow be broken to allow
DB2 to once again operate properly. If the hang condition can instead
be broken from the Vendor's side (i.e. hardware shutdown, Vendor code
timeout, etc.), then it is likely that DB2 would recover without
recycling the instance.
(In this case they killed the instance to recover the situation.)
I would suggest that you contact NetBackup support to investigate
this."
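
To make the lab's point about FAILARCHPATH concrete, here is a toy
model of the retry sequence in Python. It is only a sketch of the
behavior described in the answer above, not DB2's actual code: the
function name archive_log, the archive_once callback (standing in for
the call into the vendor code) and the return tuples are all made up
for illustration, and I am assuming that the first attempt does not
count toward NUMARCHRETRY.

```python
import time


def archive_log(archive_once, num_arch_retry, arch_retry_delay_s,
                fail_arch_path=None, sleep=time.sleep):
    """Toy model of the archive retry sequence; hypothetical, not DB2 code.

    archive_once() stands in for the call into the vendor code and must
    return True (archived) or False (failed).
    """
    attempts = 0
    while True:
        attempts += 1
        # This call blocks until the vendor code returns. If it never
        # returns, nothing below this line is ever reached, so the retry
        # count, the retry delay and the fail path are never consulted.
        if archive_once():
            return ("archived", attempts)
        if attempts > num_arch_retry:
            break  # retries exhausted (assumes 1 initial try + N retries)
        sleep(arch_retry_delay_s)
    if fail_arch_path:
        return ("failarchpath", attempts)
    return ("failed", attempts)
```

With an archive_once that returns False every time, the model walks
through the retries and ends up at the fail path. With an archive_once
that never returns, which is the hang described here, the function
itself never returns either, and that matches what the lab says about
FAILARCHPATH not being in play.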
Another customer had the same hangs. In that case, when a tape drive
was not available, the delay caused DB2 to abort the backup and retry
later.
The problem seems to occur when DB2 sends the logs over to the vendor
and no tape drive is available at that time because it is being used
by another source to back up its files. Only when a drive becomes free
does the vendor acknowledge DB2's request.
The suggested solution was on the NetBackup side, in particular using
the NetBackup Disk Staging Storage Unit (DSSU).
It was suggested to use a DSSU for the DB2 archive log backups.
This type of storage unit is always available for the archive log
backups, so there is no delay waiting for a tape drive to become
available.
The images written to the DSSU are then moved to NetBackup tape to
complete the data backup.
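
On the original question about a timeout: where the archive step is a
standalone user exit executable, as in the original post, the "Vendor
code timeout" that the lab mentions could in principle be approximated
with a small wrapper that gives up after a fixed time and returns a
failure code, so that control goes back to DB2. The Python sketch below
is only an illustration under that assumption. The function name
run_with_timeout and the default failure code of 4 are hypothetical, so
check which return codes your user exit interface expects before trying
anything like this.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, failure_rc=4):
    """Run cmd and return its exit code, or failure_rc on timeout.

    failure_rc is a placeholder value (an assumption); use whatever
    non-zero code the caller treats as a failed archive attempt.
    """
    try:
        # subprocess.run kills the child and waits for it if the
        # timeout expires, then raises TimeoutExpired.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return failure_rc
```

Note that only the direct child process is killed on timeout. If the
real program starts further processes of its own, they would need
separate handling.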