Folks -
Again, a three-peat:
RH AS3, UDB 8.1.7 on one pair of xSeries boxes, 8.1.8 on two i86 test boxes...
We were just about to put everything into production, and now we're
unable to get HADR to work consistently; we get a congested status even
though we have a gigabit network that's very fast (we are able to SSH copy
between our production servers at about 30 MB/second, encrypted).
For instance, our production scripts fail (see commentary below), but I
can create a similar set of tables, load 10.4 million rows into the
primary side, and watch the data go to the standby side with no
problems. There doesn't seem to be any improvement using our test
8.1.8 boxes over the 8.1.7 boxes.
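For anyone who wants to watch for the same symptom, here is a rough
sketch of pulling the connect status out of HADR status text. It
assumes that db2pd with the -hadr option is available at your fix pack
level and that its output contains one of the words Connected,
Congested or Disconnected; the function name and the database name
PRODDB are made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the first HADR connect-status keyword found on stdin.
# Assumption: the status text contains Connected, Congested or Disconnected
# as a whole word. Adjust the pattern if your output looks different.
hadr_connect_status() {
    grep -Eow 'Connected|Congested|Disconnected' | head -n 1
}

# Intended use (PRODDB is a placeholder database name):
#   db2pd -db PRODDB -hadr | hadr_connect_status
```

Keeping the parsing in a small function that reads stdin means it can
be checked against saved output without touching a live instance.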
I'm posting here to see if anybody has any great ideas... I'm sure that
I'll be calling IBM support in a few minutes.
------------------------------- Comments from the Developer:
1) The behavior I generally saw with SWG was the typical "hangup" in
the log processing on the standby: the log numbers would stop
increasing, and eventually the primary would report congestion. I
always noted that the import script on the primary would hang,
typically around the user or user_activity steps.
2) I also tried, via command line as db2admin, manually executing the
steps scripted in import_db.sh. Everything would work up until the
user_activity table, where the "committed XXX rows" messages would stop
printing. I would then see the same "hung" behavior on the standby:
the logs would stop progressing and the primary would eventually get
congested.
3) Twice I saw different behavior, typically when running the steps
manually line by line: the standby would suddenly "disconnect" for
no apparent reason. I would start HADR on it again as standby and
it would reconnect.
4) When I saw the import script output hanging in (2) above, I ran ps
-axf and saw several processes I had not noticed before, basically
db2event (db2detaildeadlock). Not sure if they are related.
5) Unfortunately, the only way to get out of these congestion
situations is to kill the db2 processes, since everything is hung.
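
The "log numbers stop increasing" symptom in (1) and (2) could be
caught by a small check like the sketch below. It only compares
samples you hand it; how you read the standby's current log number is
left out, and the function name log_stalled is hypothetical.

```shell
#!/bin/sh
# Sketch only: report a stall when the last N samples of a counter are equal.
# The samples are meant to be the standby's current log number, read at
# regular intervals by whatever means you already use.
# Usage: log_stalled N sample1 sample2 ...
# Returns 0 (stalled) if there are at least N samples and the last N match.
log_stalled() {
    need=$1
    shift
    # Not enough samples yet to decide.
    [ "$#" -ge "$need" ] || return 1
    # Drop all but the last N samples.
    while [ "$#" -gt "$need" ]; do
        shift
    done
    first=$1
    for sample in "$@"; do
        [ "$sample" = "$first" ] || return 1
    done
    return 0
}

# Example: log_stalled 3 101 102 103 103 103 && echo "standby log stalled"
```

Requiring several identical samples in a row avoids flagging a stall
during a short pause between commits.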