473,382 Members | 1,368 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

DB2 Health Monitor v9 fixpack 4 still can be a BIG PROBLEM

I have opened a PMR for this with IBM, and am not asking for advice from the DB2 DBA community. I am posting this as an FYI that DB2 Health Monitor, even at the latest version of DB2, still can cause huge problems with slow connect times and heavy resource locking in high concurrency / high transaction volume environments. I have an OLTP with 30-90 transactions per second activity, and start of Health Monitor every 2 hours was crashing our application. We only resolved the situation by setting DBM CFG parameter HEALTH_MON = OFF.

Note that other thescripts reports show problems with db2hmon going back to version 8.2.

My advice for high transaction volume databases is to not even think about leaving Health Monitor active. (I also see other autonomous features such as STMM giving undesirable effects, so we have deactivated self tuning memory manager as well.).

Here is more detail, from the doc I sent IBM on 2008.03.14:



We are a DB2 version 9 Fixpack 4 user, 64-bit, running on a Solaris 10 zone on a T2000. The production application is very high transaction volume, 20-90 transactions per second.

We have experienced three separate errors with DB2 Health Monitor that have forced us to deactivate it entirely. To summarize they are:

1. In high transaction environment, every time DB2 Health Monitor runs (every 2 hours) causes heavy new connection slowdown and heavy table locking.
2. In one server, DB2 Health Monitor crashes every time it runs with a sqlerRemoveAllIPCforRow violation. When it crashes it causes heavy connection slowdowns and heavy table locking which result in errors for the heavy concurrency user application.
3. DB2 Health Monitor CALL REORGCHK_IX_STATS fails with a SQLCODE “-2306” even though the table exists.


Problem #1 Health Monitor causes heavy locking and new connections slowdown

We originally opened this PMR to request IBM’s help in determining why our high transaction volume application was experiencing strong connection slowdowns every 2 hours. IBM's engineer said that it did not seem to be a DB2 process that was causing the problem. I did decide to fix MAXLOCKS and that made a marginal improvement, but the problem continued.

Yesterday, realizing that DB2HMON runs every 2 hours, we deactivated DB2HMON ( UPDATE DBM CFG USING HEALTH_MON OFF ) and the connection slowdown/locking related application failures immediately ended. It is obvious that the problem was entirely caused by Health Monitor.

Problem #2 Health Monitor crashes with IPC Resource Error

On the staging server (NOT the production server), every time DB2HMON attempts to run, it crashes. As a result we have decided to completely prevent DB2HMON from starting, using IBM's advice ( http://www-1.ibm.com/support/docview...id=swg21259046 ) , by setting the DB2_FMP_COMM_HEAPSZ to 0. Despite the fact that we are not using Health Monitor, we would like IBM to fix this crash behavior.

Here is an example from today’s db2diag.log on the staging database:

Expand|Select|Wrap|Line Numbers
  1. 2008-03-13- E876A408           LEVEL: Warning
  2. PID     : 10499                TID  : 1           PROC : db2sysc 0
  3. INSTANCE: db2                  NODE : 000
  4. FUNCTION: DB2 UDB, routine_infrastructure, sqlerReturnFmpToPool, probe:999
  5. DATA #1 : String, 22 bytes
  6. Removing FMP from pool
  7. DATA #2 : Hexdump, 16 bytes
  8. 0xFFFFFFFF7FFFE160 : 0000 0000 0000 0000 0000 084F 0000 0000    ...........O....
  10. 2008-03-13- E1285A456          LEVEL: Error
  11. PID     : 10499                TID  : 1           PROC : db2sysc 0
  12. INSTANCE: db2                  NODE : 000
  13. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:10
  14. DATA #1 : String, 32 bytes
  15. Freeing IPC resource explicitly:
  16. DATA #2 : SQLO_PID, PD_TYPE_SQLO_PID, 4 bytes
  17. 2127
  18. DATA #3 : Hexdump, 4 bytes
  19. 0x0000000200447FE0 : 0000 0000                                  ....
  21. 2008-03-13- E1742A345          LEVEL: Error
  22. PID     : 10499                TID  : 1           PROC : db2sysc 0
  23. INSTANCE: db2                  NODE : 000
  24. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:20
  25. DATA #1 : String, 22 bytes
  26. IPC resources Address:
  27. DATA #2 : Pointer, 8 bytes
  28. 0x0000000210010080
  30. 2008-03-13- E2088A1046         LEVEL: Error
  31. PID     : 10499                TID  : 1           PROC : db2sysc 0
  32. INSTANCE: db2                  NODE : 000
  33. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:30
  34. DATA #1 : String, 29 bytes
  35. Number of IPC resource found:
  36. DATA #2 : signed integer, 4 bytes
  37. 1
  38. DATA #3 : String, 29 bytes
  39. Number of IPC resource freed:
  40. DATA #4 : signed integer, 4 bytes
  41. 1
  43.   [0] 0xFFFFFFFF7C88B804 __1cXsqlerRemoveAllIPCforRow6FpnLsqlerFmpRow_b_i_ + 0x89C
  44.   [1] 0xFFFFFFFF7C881A68 __1cXsqlerRemoveFmpFromTable6FpnLsqlerFmpRow_b_i_ + 0x220
  45.   [2] 0xFFFFFFFF7C884A6C __1cUsqlerReturnFmpToPool6FccpnOsqlerFmpHandle_pnNsqle_agent_cb__i_ + 0x15DC
  46.   [3] 0x000000010000AD9C __1cOsqleRunSysCtlr6F_i_ + 0x3D4
  47.   [4] 0x0000000100006A64 __1cLsqleSysCtlr6F_i_ + 0x384
  48.   [5] 0xFFFFFFFF7AF75CDC __1cHDBGTerm6F_i_ + 0x1F94
  49.   [6] 0xFFFFFFFF7AF76984 sqloRunInstance + 0x5E4
  50.   [7] 0x0000000100006524 main + 0x924
  51.   [8] 0x0000000100004E5C _start + 0x17C
  52.   [9] 0x0000000000000000 ?unknown + 0x0
  54. 2008-03-13- I3135A433          LEVEL: Severe
  55. PID     : 10499                TID  : 1           PROC : db2sysc 0
  56. INSTANCE: db2                  NODE : 000
  57. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:15
  58. DATA #1 : Hexdump, 31 bytes
  59. 0x000000010000DB20 : 4865 616C 7468 204D 6F6E 6974 6F72 2050    Health Monitor P
  60. 0x000000010000DB30 : 726F 6365 7373 2063 7261 7368 6564 2E      rocess crashed.
  62. 2008-03-13- I3569A340          LEVEL: Severe
  63. PID     : 10499                TID  : 1           PROC : db2sysc 0
  64. INSTANCE: db2                  NODE : 000
  65. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:16
  66. DATA #1 : Hexdump, 4 bytes
  67. 0xFFFFFFFF7FFFE170 : 0000 084F                                  ...O
  69. 2008-03-13- I3910A340          LEVEL: Severe
  70. PID     : 10499                TID  : 1           PROC : db2sysc 0
  71. INSTANCE: db2                  NODE : 000
  72. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:17
  73. DATA #1 : Hexdump, 4 bytes
  74. 0xFFFFFFFF7FFFE174 : 0000 0101                                  ....
  76. 2008-03-13- I4251A340          LEVEL: Severe
  77. PID     : 10499                TID  : 1           PROC : db2sysc 0
  78. INSTANCE: db2                  NODE : 000
  79. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:18
  80. DATA #1 : Hexdump, 4 bytes
  81. 0xFFFFFFFF7FFFE178 : FFFF FFFF                                  ....
  83. 2008-03-13- I4592A282          LEVEL: Warning
  84. PID     : 10499                TID  : 1           PROC : db2sysc 0
  85. INSTANCE: db2                  NODE : 000
  86. FUNCTION: DB2 UDB, base sys utilities, sqleRunSysCtlr, probe:63
  87. MESSAGE : Health Monitor Process restarted.
Problem #3 Health Monitor fails incorrectly with SQLCODE “-2306”

When I was running an event monitor to diagnose “Problem #1” above-referenced in this document, I noticed that Health Monitor was running CALL REORGCHK_IX_STATS with a valid table name, for example:


And returning SQLCODE -2306, “The table or index name does not exist” even though the table NEAT.LOCATION_SETS_LOG does exist.

Here is event monitor output illustrating the problem:

Expand|Select|Wrap|Line Numbers
  1. 89214) Statement Event ...
  2.   Appl Handle: 587
  3.   Appl Id: *LOCAL.db2.080311150558
  4.   Appl Seq number: 00006
  6.   Record is the result of a flush: FALSE
  7.   -------------------------------------------
  8.   Type     : Dynamic
  9.   Operation: Execute
  10.   Section  : 6
  11.   Creator  : NULLID
  12.   Package  : SYSSH200
  13.   Consistency Token  : SYSLVL01
  14.   Package Version ID  :
  15.   Cursor   : SQL_CURSH200C6
  16.   Cursor was blocking: FALSE
  17.   Text     : CALL REORGCHK_IX_STATS ('T', '"NEAT    "."FLIGHT_FARE_LOG"')
  18.   -------------------------------------------
  19.   Start Time: 03/11/2008 10:05:57.563826
  20.   Stop Time:  03/11/2008 10:05:57.713239
  21.   Elapsed Execution Time:  0.149413 seconds
  22.   Number of Agents created: 1
  23.   User CPU: 0.000000 seconds
  24.   System CPU: 0.000000 seconds
  25.   Fetch Count: 0
  26.   Sorts: 1
  27.   Total sort time: 0
  28.   Sort overflows: 0
  29.   Rows read: 0
  30.   Rows written: 0
  31.   Internal rows deleted: 0
  32.   Internal rows updated: 0
  33.   Internal rows inserted: 0
  34.   Bufferpool data logical reads: 0
  35.   Bufferpool data physical reads: 0
  36.   Bufferpool temporary data logical reads: 0
  37.   Bufferpool temporary data physical reads: 0
  38.   Bufferpool index logical reads: 0
  39.   Bufferpool index physical reads: 0
  40.   Bufferpool temporary index logical reads: 0
  41.   Bufferpool temporary index physical reads: 0
  42.   Bufferpool xda logical page reads: 0
  43.   Bufferpool xda physical page reads: 0
  44.   Bufferpool temporary xda logical page reads: 0
  45.   Bufferpool temporary xda physical page reads: 0
  46.   SQLCA:
  47.    sqlcode: -2306
  48.    sqlstate:
The matching db2diag.log output for this event was:

Expand|Select|Wrap|Line Numbers
  1. 2008-03-11- I276926A452        LEVEL: Error
  2. PID     : 18727                TID  : 26          PROC : db2acd 0
  3. INSTANCE: db2                  NODE : 000
  4. APPID   : *LOCAL.db2.080311150558
  5. FUNCTION: DB2 UDB, Automatic Table Maintenance, db2HmonEvalReorg, probe:515
  6. MESSAGE : Failed to evaluate table "NEAT    "."FLIGHT_FARE_LOG". Probe 400
  7. DATA #1 : Hexdump, 4 bytes
  8. 0xFFFFFFFF6F53F470 : FFFF F6FE                                  ....
Mar 14 '08 #1
5 14780
297 Expert 100+
Hi jbenner,

thank you very much for sharing your experiences with the health monitor. I'm sure this may help some people out there. I for myself am still looking forward to migrate to v9.1. ;-) By the way, I took the liberty to add some code tags (to be set with the # button in the editor) to the logs for better readability.


Mar 14 '08 #2
You are welcome! Thanks for helping me with the post readability.
Mar 25 '08 #3
Thanks for this information on Health Monitor!

We are currently migrating from DB2 v8 on Solaris 8 to DB2 v9.1 on Solaris 10 and at the same time exploit the virtualization features of Solaris 10 (Zone concept). We first wanted to directly use DB2 9.5 but we ran into several problems. Thus we decided to go for v9.1, Fixpack 4. Still quite a step!

What makes things a bit more tricky is that for development and education platforms, we intend to use Sun T5120 servers. For these ones, we ran into nasty problems including intermittent instance crashes. It took us much time, firmware updates and fixes from Sund and IBM until things started running smoother. Please be aware of such problems if you intend to use this new Sun hardware for DB2.

We encounter the same problems with health monitor to what concerns the db2HmonEvalReorg error on certain tables. We have no clue yet why health monitor can evaluate some of the tables and others not.

We haven't used health monitor up to now, but intend to do so for the new version. Therefore it's good to know that it may cause such heavy performance problems - we can keep an eye on this.

And of course I will appreciate any hint on which health indicators are critical in terms of resource consumption.

Apr 1 '08 #4
While we are on the subject ... When in DB2 V9 a database is explicitly activated, two connections show up by default. One is STMM and the other I do not remember what. Do you know how to turn off both. You already mentioned that you turned off STMM. How did you do that? Thx
Apr 5 '08 #5

Regarding the new db2 agents...i.e.db2stmm and db2taskd..
New DB2 agents have been introduced in DB2 Version 9.1: db2stmm and db2taskd. The db2stmm agent is part of the new self-tuning memory feature added in DB2 Version 9.1. The db2taskd agent is an internal daemon that distributes background database tasks. These new agents remain connected to the database at all times and are not active if the database is activated in exclusive mode.
Starting with Fix Pack 3, agents for active event monitors are also connected to new databases by default.

Regarding disabling of STMM...
Self tuning can be disabled for the entire database by setting self_tuning_mem to OFF. When self_tuning_mem is set to OFF, the memory configuration parameters and buffer pools that are set to AUTOMATIC remain AUTOMATIC and the memory areas remain at their current size.

You can set self_tuning_mem to OFF using the UPDATE DATABASE CONFIGURATION command, the SQLFUPD API, or through the Change Database Configuration Parameter window in the Control Center.

Self tuning can also be effectively deactivated for the entire database if only a single memory consumer is enabled for self tuning. This is because memory cannot be redistributed when only one memory area is enabled.

Shashank Kharche
DB2 for Linux, Unix & Windows -Information Management Software
Apr 15 '08 #6

Sign in to post your reply or Sign up for a free account.

Similar topics

by: Bruce MacDonald | last post by:
I am unable to see the health monitor SQL table functions (e.g.: SQLPROC.HEALTH_CONT_HI). Have I missed a utility which registers the functions? DB2 WR21324. Another point about the...
by: Debbie E | last post by:
I am trying to setup the DB2 Health Monitor and so far I am not getting any e-mail notifications. I've done the following: db2 update dbm cfg using health_mon on db2 update admin cfg using...
by: Debbie E | last post by:
Are there any issues with the Health Monitor in a partitioned environment? I have turned on the health monitor and can obtain the health snapshot for the DBM via the following command: get...
by: Mike | last post by:
Hellos again, I seem to be having a weird issue. Whenever DB2 Health monitor seems to run, all connections to the database seem to just lock up. For example: (and this seems consistent) In list...
by: zahni31 | last post by:
I had trouble to get the hmon work in 8.x FP 8 and FP 9 . The server runnig Windows 2003 SP1 on a member server. The db2 services are running under an domain admin account. Ok, no problem so...
by: nycjay | last post by:
i am trying to use the db2 health monitor alert to run a script. i set up the health monitor to run a script when the tablespace utailization is too high. in the end, the script should send a...
by: oddbande | last post by:
I get a lot of these messages in db2diag.log - seems like problems with health Monitor: 2007-01-05- I260759H331 LEVEL: Severe PID : 3860 TID : 4004...
by: Cage | last post by:
hi,all As we known,DB2 Health Center is a very good tool,but it made me a little confusion. I will be very appreciate if anyone give me some advices. The problem is that I want to monitor my...
by: Odd B Andersen | last post by:
I get a lot of these severe errors in db2diag.log: 2008-10-23- I1463721H437 LEVEL: Severe PID : 3960 TID : 4232 PROC : db2fmp.exe INSTANCE:...
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.