By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,649 Members | 2,154 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,649 IT Pros & Developers. It's quick & easy.

DB2 Health Monitor v9 fixpack 4 still can be a BIG PROBLEM

P: 2
I have opened a PMR for this with IBM, and am not asking for advice from the DB2 DBA community. I am posting this as an FYI that DB2 Health Monitor, even at the latest version of DB2, still can cause huge problems with slow connect times and heavy resource locking in high concurrency / high transaction volume environments. I have an OLTP with 30-90 transactions per second activity, and start of Health Monitor every 2 hours was crashing our application. We only resolved the situation by setting DBM CFG parameter HEALTH_MON = OFF.

Note that other thescripts reports show problems with db2hmon going back to version 8.2.

My advice for high transaction volume databases is to not even think about leaving Health Monitor active. (I also see other autonomous features such as STMM giving undesirable effects, so we have deactivated self tuning memory manager as well.).

Here is more detail, from the doc I sent IBM on 2008.03.14:

======

Summary

We are a DB2 version 9 Fixpack 4 user, 64-bit, running on a Solaris 10 zone on a T2000. The production application is very high transaction volume, 20-90 transactions per second.

We have experienced three separate errors with DB2 Health Monitor that have forced us to deactivate it entirely. To summarize they are:

1. In high transaction environment, every time DB2 Health Monitor runs (every 2 hours) causes heavy new connection slowdown and heavy table locking.
2. In one server, DB2 Health Monitor crashes every time it runs with a sqlerRemoveAllIPCforRow violation. When it crashes it causes heavy connection slowdowns and heavy table locking which result in errors for the heavy concurrency user application.
3. DB2 Health Monitor CALL REORGCHK_IX_STATS fails with a SQLCODE “-2306” even though the table exists.

Detail

Problem #1 Health Monitor causes heavy locking and new connections slowdown

We originally opened this PMR to request IBM’s help in determining why our high transaction volume application was experiencing strong connection slowdowns every 2 hours. IBM's engineer said that it did not seem to be a DB2 process that was causing the problem. I did decide to fix MAXLOCKS and that made a marginal improvement, but the problem continued.

Yesterday, realizing that DB2HMON runs every 2 hours, we deactivated DB2HMON ( UPDATE DBM CFG USING HEALTH_MON OFF ) and the connection slowdown/locking related application failures immediately ended. It is obvious that the problem was entirely caused by Health Monitor.

Problem #2 Health Monitor crashes with IPC Resource Error

On the staging server (NOT the production server), every time DB2HMON attempts to run, it crashes. As a result we have decided to completely prevent DB2HMON from starting, using IBM's advice ( http://www-1.ibm.com/support/docview...id=swg21259046 ) , by setting the DB2_FMP_COMM_HEAPSZ to 0. Despite the fact that we are not using Health Monitor, we would like IBM to fix this crash behavior.

Here is an example from today’s db2diag.log on the staging database:


Expand|Select|Wrap|Line Numbers
  1. 2008-03-13-00.18.44.059066-300 E876A408           LEVEL: Warning
  2. PID     : 10499                TID  : 1           PROC : db2sysc 0
  3. INSTANCE: db2                  NODE : 000
  4. FUNCTION: DB2 UDB, routine_infrastructure, sqlerReturnFmpToPool, probe:999
  5. DATA #1 : String, 22 bytes
  6. Removing FMP from pool
  7. DATA #2 : Hexdump, 16 bytes
  8. 0xFFFFFFFF7FFFE160 : 0000 0000 0000 0000 0000 084F 0000 0000    ...........O....
  9.  
  10. 2008-03-13-00.18.44.076838-300 E1285A456          LEVEL: Error
  11. PID     : 10499                TID  : 1           PROC : db2sysc 0
  12. INSTANCE: db2                  NODE : 000
  13. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:10
  14. DATA #1 : String, 32 bytes
  15. Freeing IPC resource explicitly:
  16. DATA #2 : SQLO_PID, PD_TYPE_SQLO_PID, 4 bytes
  17. 2127
  18. DATA #3 : Hexdump, 4 bytes
  19. 0x0000000200447FE0 : 0000 0000                                  ....
  20.  
  21. 2008-03-13-00.18.44.077523-300 E1742A345          LEVEL: Error
  22. PID     : 10499                TID  : 1           PROC : db2sysc 0
  23. INSTANCE: db2                  NODE : 000
  24. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:20
  25. DATA #1 : String, 22 bytes
  26. IPC resources Address:
  27. DATA #2 : Pointer, 8 bytes
  28. 0x0000000210010080
  29.  
  30. 2008-03-13-00.18.44.078221-300 E2088A1046         LEVEL: Error
  31. PID     : 10499                TID  : 1           PROC : db2sysc 0
  32. INSTANCE: db2                  NODE : 000
  33. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:30
  34. DATA #1 : String, 29 bytes
  35. Number of IPC resource found:
  36. DATA #2 : signed integer, 4 bytes
  37. 1
  38. DATA #3 : String, 29 bytes
  39. Number of IPC resource freed:
  40. DATA #4 : signed integer, 4 bytes
  41. 1
  42. CALLSTCK:
  43.   [0] 0xFFFFFFFF7C88B804 __1cXsqlerRemoveAllIPCforRow6FpnLsqlerFmpRow_b_i_ + 0x89C
  44.   [1] 0xFFFFFFFF7C881A68 __1cXsqlerRemoveFmpFromTable6FpnLsqlerFmpRow_b_i_ + 0x220
  45.   [2] 0xFFFFFFFF7C884A6C __1cUsqlerReturnFmpToPool6FccpnOsqlerFmpHandle_pnNsqle_agent_cb__i_ + 0x15DC
  46.   [3] 0x000000010000AD9C __1cOsqleRunSysCtlr6F_i_ + 0x3D4
  47.   [4] 0x0000000100006A64 __1cLsqleSysCtlr6F_i_ + 0x384
  48.   [5] 0xFFFFFFFF7AF75CDC __1cHDBGTerm6F_i_ + 0x1F94
  49.   [6] 0xFFFFFFFF7AF76984 sqloRunInstance + 0x5E4
  50.   [7] 0x0000000100006524 main + 0x924
  51.   [8] 0x0000000100004E5C _start + 0x17C
  52.   [9] 0x0000000000000000 ?unknown + 0x0
  53.  
  54. 2008-03-13-00.18.44.091581-300 I3135A433          LEVEL: Severe
  55. PID     : 10499                TID  : 1           PROC : db2sysc 0
  56. INSTANCE: db2                  NODE : 000
  57. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:15
  58. DATA #1 : Hexdump, 31 bytes
  59. 0x000000010000DB20 : 4865 616C 7468 204D 6F6E 6974 6F72 2050    Health Monitor P
  60. 0x000000010000DB30 : 726F 6365 7373 2063 7261 7368 6564 2E      rocess crashed.
  61.  
  62. 2008-03-13-00.18.44.092207-300 I3569A340          LEVEL: Severe
  63. PID     : 10499                TID  : 1           PROC : db2sysc 0
  64. INSTANCE: db2                  NODE : 000
  65. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:16
  66. DATA #1 : Hexdump, 4 bytes
  67. 0xFFFFFFFF7FFFE170 : 0000 084F                                  ...O
  68.  
  69. 2008-03-13-00.18.44.092769-300 I3910A340          LEVEL: Severe
  70. PID     : 10499                TID  : 1           PROC : db2sysc 0
  71. INSTANCE: db2                  NODE : 000
  72. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:17
  73. DATA #1 : Hexdump, 4 bytes
  74. 0xFFFFFFFF7FFFE174 : 0000 0101                                  ....
  75.  
  76. 2008-03-13-00.18.44.093349-300 I4251A340          LEVEL: Severe
  77. PID     : 10499                TID  : 1           PROC : db2sysc 0
  78. INSTANCE: db2                  NODE : 000
  79. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:18
  80. DATA #1 : Hexdump, 4 bytes
  81. 0xFFFFFFFF7FFFE178 : FFFF FFFF                                  ....
  82.  
  83. 2008-03-13-00.18.44.110896-300 I4592A282          LEVEL: Warning
  84. PID     : 10499                TID  : 1           PROC : db2sysc 0
  85. INSTANCE: db2                  NODE : 000
  86. FUNCTION: DB2 UDB, base sys utilities, sqleRunSysCtlr, probe:63
  87. MESSAGE : Health Monitor Process restarted.
Problem #3 Health Monitor fails incorrectly with SQLCODE “-2306”


When I was running an event monitor to diagnose “Problem #1” above-referenced in this document, I noticed that Health Monitor was running CALL REORGCHK_IX_STATS with a valid table name, for example:

CALL REORGCHK_IX_STATS ('T', '"NEAT "."LOCATION_SETS_LOG"')

And returning SQLCODE -2306, “The table or index name does not exist” even though the table NEAT.LOCATION_SETS_LOG does exist.

Here is event monitor output illustrating the problem:

Expand|Select|Wrap|Line Numbers
  1. 89214) Statement Event ...
  2.   Appl Handle: 587
  3.   Appl Id: *LOCAL.db2.080311150558
  4.   Appl Seq number: 00006
  5.  
  6.   Record is the result of a flush: FALSE
  7.   -------------------------------------------
  8.   Type     : Dynamic
  9.   Operation: Execute
  10.   Section  : 6
  11.   Creator  : NULLID
  12.   Package  : SYSSH200
  13.   Consistency Token  : SYSLVL01
  14.   Package Version ID  :
  15.   Cursor   : SQL_CURSH200C6
  16.   Cursor was blocking: FALSE
  17.   Text     : CALL REORGCHK_IX_STATS ('T', '"NEAT    "."FLIGHT_FARE_LOG"')
  18.   -------------------------------------------
  19.   Start Time: 03/11/2008 10:05:57.563826
  20.   Stop Time:  03/11/2008 10:05:57.713239
  21.   Elapsed Execution Time:  0.149413 seconds
  22.   Number of Agents created: 1
  23.   User CPU: 0.000000 seconds
  24.   System CPU: 0.000000 seconds
  25.   Fetch Count: 0
  26.   Sorts: 1
  27.   Total sort time: 0
  28.   Sort overflows: 0
  29.   Rows read: 0
  30.   Rows written: 0
  31.   Internal rows deleted: 0
  32.   Internal rows updated: 0
  33.   Internal rows inserted: 0
  34.   Bufferpool data logical reads: 0
  35.   Bufferpool data physical reads: 0
  36.   Bufferpool temporary data logical reads: 0
  37.   Bufferpool temporary data physical reads: 0
  38.   Bufferpool index logical reads: 0
  39.   Bufferpool index physical reads: 0
  40.   Bufferpool temporary index logical reads: 0
  41.   Bufferpool temporary index physical reads: 0
  42.   Bufferpool xda logical page reads: 0
  43.   Bufferpool xda physical page reads: 0
  44.   Bufferpool temporary xda logical page reads: 0
  45.   Bufferpool temporary xda physical page reads: 0
  46.   SQLCA:
  47.    sqlcode: -2306
  48.    sqlstate:
  49.  
The matching db2diag.log output for this event was:

Expand|Select|Wrap|Line Numbers
  1. 2008-03-11-10.05.57.713912-300 I276926A452        LEVEL: Error
  2. PID     : 18727                TID  : 26          PROC : db2acd 0
  3. INSTANCE: db2                  NODE : 000
  4. APPID   : *LOCAL.db2.080311150558
  5. FUNCTION: DB2 UDB, Automatic Table Maintenance, db2HmonEvalReorg, probe:515
  6. MESSAGE : Failed to evaluate table "NEAT    "."FLIGHT_FARE_LOG". Probe 400
  7. DATA #1 : Hexdump, 4 bytes
  8. 0xFFFFFFFF6F53F470 : FFFF F6FE                                  ....
Mar 14 '08 #1
Share this Question
Share on Google+
5 Replies


docdiesel
Expert 100+
P: 297
Hi jbenner,

thank you very much for sharing your experiences with the health monitor. I'm sure this may help some people out there. I for myself am still looking forward to migrate to v9.1. ;-) By the way, I took the liberty to add some code tags (to be set with the # button in the editor) to the logs for better readability.

Regards,

Bernd
Mar 14 '08 #2

P: 2
You are welcome! Thanks for helping me with the post readability.
Mar 25 '08 #3

P: 1
Thanks for this information on Health Monitor!

We are currently migrating from DB2 v8 on Solaris 8 to DB2 v9.1 on Solaris 10 and at the same time exploit the virtualization features of Solaris 10 (Zone concept). We first wanted to directly use DB2 9.5 but we ran into several problems. Thus we decided to go for v9.1, Fixpack 4. Still quite a step!

What makes things a bit more tricky is that for development and education platforms, we intend to use Sun T5120 servers. For these ones, we ran into nasty problems including intermittent instance crashes. It took us much time, firmware updates and fixes from Sund and IBM until things started running smoother. Please be aware of such problems if you intend to use this new Sun hardware for DB2.

We encounter the same problems with health monitor to what concerns the db2HmonEvalReorg error on certain tables. We have no clue yet why health monitor can evaluate some of the tables and others not.

We haven't used health monitor up to now, but intend to do so for the new version. Therefore it's good to know that it may cause such heavy performance problems - we can keep an eye on this.

And of course I will appreciate any hint on which health indicators are critical in terms of resource consumption.

Thanks,
pete
Apr 1 '08 #4

P: 7
While we are on the subject ... When in DB2 V9 a database is explicitly activated, two connections show up by default. One is STMM and the other I do not remember what. Do you know how to turn off both. You already mentioned that you turned off STMM. How did you do that? Thx
Apr 5 '08 #5

Shashank1984
P: 26
Superkuper,

Regarding the new db2 agents...i.e.db2stmm and db2taskd..
New DB2 agents have been introduced in DB2 Version 9.1: db2stmm and db2taskd. The db2stmm agent is part of the new self-tuning memory feature added in DB2 Version 9.1. The db2taskd agent is an internal daemon that distributes background database tasks. These new agents remain connected to the database at all times and are not active if the database is activated in exclusive mode.
Starting with Fix Pack 3, agents for active event monitors are also connected to new databases by default.

Regarding disabling of STMM...
Self tuning can be disabled for the entire database by setting self_tuning_mem to OFF. When self_tuning_mem is set to OFF, the memory configuration parameters and buffer pools that are set to AUTOMATIC remain AUTOMATIC and the memory areas remain at their current size.

You can set self_tuning_mem to OFF using the UPDATE DATABASE CONFIGURATION command, the SQLFUPD API, or through the Change Database Configuration Parameter window in the Control Center.

Self tuning can also be effectively deactivated for the entire database if only a single memory consumer is enabled for self tuning. This is because memory cannot be redistributed when only one memory area is enabled.

Thanks,
Shashank Kharche
IBM.
DB2 for Linux, Unix & Windows -Information Management Software
Apr 15 '08 #6

Post your reply

Sign in to post your reply or Sign up for a free account.