473,890 Members | 1,391 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

DB2 Health Monitor v9 fixpack 4 still can be a BIG PROBLEM

2 New Member
I have opened a PMR for this with IBM, and am not asking for advice from the DB2 DBA community. I am posting this as an FYI that DB2 Health Monitor, even at the latest version of DB2, still can cause huge problems with slow connect times and heavy resource locking in high concurrency / high transaction volume environments. I have an OLTP with 30-90 transactions per second activity, and start of Health Monitor every 2 hours was crashing our application. We only resolved the situation by setting DBM CFG parameter HEALTH_MON = OFF.

Note that other thescripts reports show problems with db2hmon going back to version 8.2.

My advice for high transaction volume databases is to not even think about leaving Health Monitor active. (I also see other autonomous features such as STMM giving undesirable effects, so we have deactivated self tuning memory manager as well.).

Here is more detail, from the doc I sent IBM on 2008.03.14:

======

Summary

We are a DB2 version 9 Fixpack 4 user, 64-bit, running on a Solaris 10 zone on a T2000. The production application is very high transaction volume, 20-90 transactions per second.

We have experienced three separate errors with DB2 Health Monitor that have forced us to deactivate it entirely. To summarize they are:

1. In high transaction environment, every time DB2 Health Monitor runs (every 2 hours) causes heavy new connection slowdown and heavy table locking.
2. In one server, DB2 Health Monitor crashes every time it runs with a sqlerRemoveAllI PCforRow violation. When it crashes it causes heavy connection slowdowns and heavy table locking which result in errors for the heavy concurrency user application.
3. DB2 Health Monitor CALL REORGCHK_IX_STA TS fails with a SQLCODE “-2306” even though the table exists.

Detail

Problem #1 Health Monitor causes heavy locking and new connections slowdown

We originally opened this PMR to request IBM’s help in determining why our high transaction volume application was experiencing strong connection slowdowns every 2 hours. IBM's engineer said that it did not seem to be a DB2 process that was causing the problem. I did decide to fix MAXLOCKS and that made a marginal improvement, but the problem continued.

Yesterday, realizing that DB2HMON runs every 2 hours, we deactivated DB2HMON ( UPDATE DBM CFG USING HEALTH_MON OFF ) and the connection slowdown/locking related application failures immediately ended. It is obvious that the problem was entirely caused by Health Monitor.

Problem #2 Health Monitor crashes with IPC Resource Error

On the staging server (NOT the production server), every time DB2HMON attempts to run, it crashes. As a result we have decided to completely prevent DB2HMON from starting, using IBM's advice ( http://www-1.ibm.com/support/docview...id=swg21259046 ) , by setting the DB2_FMP_COMM_HE APSZ to 0. Despite the fact that we are not using Health Monitor, we would like IBM to fix this crash behavior.

Here is an example from today’s db2diag.log on the staging database:


Expand|Select|Wrap|Line Numbers
  1. 2008-03-13-00.18.44.059066-300 E876A408           LEVEL: Warning
  2. PID     : 10499                TID  : 1           PROC : db2sysc 0
  3. INSTANCE: db2                  NODE : 000
  4. FUNCTION: DB2 UDB, routine_infrastructure, sqlerReturnFmpToPool, probe:999
  5. DATA #1 : String, 22 bytes
  6. Removing FMP from pool
  7. DATA #2 : Hexdump, 16 bytes
  8. 0xFFFFFFFF7FFFE160 : 0000 0000 0000 0000 0000 084F 0000 0000    ...........O....
  9.  
  10. 2008-03-13-00.18.44.076838-300 E1285A456          LEVEL: Error
  11. PID     : 10499                TID  : 1           PROC : db2sysc 0
  12. INSTANCE: db2                  NODE : 000
  13. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:10
  14. DATA #1 : String, 32 bytes
  15. Freeing IPC resource explicitly:
  16. DATA #2 : SQLO_PID, PD_TYPE_SQLO_PID, 4 bytes
  17. 2127
  18. DATA #3 : Hexdump, 4 bytes
  19. 0x0000000200447FE0 : 0000 0000                                  ....
  20.  
  21. 2008-03-13-00.18.44.077523-300 E1742A345          LEVEL: Error
  22. PID     : 10499                TID  : 1           PROC : db2sysc 0
  23. INSTANCE: db2                  NODE : 000
  24. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:20
  25. DATA #1 : String, 22 bytes
  26. IPC resources Address:
  27. DATA #2 : Pointer, 8 bytes
  28. 0x0000000210010080
  29.  
  30. 2008-03-13-00.18.44.078221-300 E2088A1046         LEVEL: Error
  31. PID     : 10499                TID  : 1           PROC : db2sysc 0
  32. INSTANCE: db2                  NODE : 000
  33. FUNCTION: DB2 UDB, routine_infrastructure, sqlerRemoveAllIPCforRow, probe:30
  34. DATA #1 : String, 29 bytes
  35. Number of IPC resource found:
  36. DATA #2 : signed integer, 4 bytes
  37. 1
  38. DATA #3 : String, 29 bytes
  39. Number of IPC resource freed:
  40. DATA #4 : signed integer, 4 bytes
  41. 1
  42. CALLSTCK:
  43.   [0] 0xFFFFFFFF7C88B804 __1cXsqlerRemoveAllIPCforRow6FpnLsqlerFmpRow_b_i_ + 0x89C
  44.   [1] 0xFFFFFFFF7C881A68 __1cXsqlerRemoveFmpFromTable6FpnLsqlerFmpRow_b_i_ + 0x220
  45.   [2] 0xFFFFFFFF7C884A6C __1cUsqlerReturnFmpToPool6FccpnOsqlerFmpHandle_pnNsqle_agent_cb__i_ + 0x15DC
  46.   [3] 0x000000010000AD9C __1cOsqleRunSysCtlr6F_i_ + 0x3D4
  47.   [4] 0x0000000100006A64 __1cLsqleSysCtlr6F_i_ + 0x384
  48.   [5] 0xFFFFFFFF7AF75CDC __1cHDBGTerm6F_i_ + 0x1F94
  49.   [6] 0xFFFFFFFF7AF76984 sqloRunInstance + 0x5E4
  50.   [7] 0x0000000100006524 main + 0x924
  51.   [8] 0x0000000100004E5C _start + 0x17C
  52.   [9] 0x0000000000000000 ?unknown + 0x0
  53.  
  54. 2008-03-13-00.18.44.091581-300 I3135A433          LEVEL: Severe
  55. PID     : 10499                TID  : 1           PROC : db2sysc 0
  56. INSTANCE: db2                  NODE : 000
  57. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:15
  58. DATA #1 : Hexdump, 31 bytes
  59. 0x000000010000DB20 : 4865 616C 7468 204D 6F6E 6974 6F72 2050    Health Monitor P
  60. 0x000000010000DB30 : 726F 6365 7373 2063 7261 7368 6564 2E      rocess crashed.
  61.  
  62. 2008-03-13-00.18.44.092207-300 I3569A340          LEVEL: Severe
  63. PID     : 10499                TID  : 1           PROC : db2sysc 0
  64. INSTANCE: db2                  NODE : 000
  65. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:16
  66. DATA #1 : Hexdump, 4 bytes
  67. 0xFFFFFFFF7FFFE170 : 0000 084F                                  ...O
  68.  
  69. 2008-03-13-00.18.44.092769-300 I3910A340          LEVEL: Severe
  70. PID     : 10499                TID  : 1           PROC : db2sysc 0
  71. INSTANCE: db2                  NODE : 000
  72. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:17
  73. DATA #1 : Hexdump, 4 bytes
  74. 0xFFFFFFFF7FFFE174 : 0000 0101                                  ....
  75.  
  76. 2008-03-13-00.18.44.093349-300 I4251A340          LEVEL: Severe
  77. PID     : 10499                TID  : 1           PROC : db2sysc 0
  78. INSTANCE: db2                  NODE : 000
  79. FUNCTION: DB2 UDB, base sys utilities, sqleChildCrashHandler, probe:18
  80. DATA #1 : Hexdump, 4 bytes
  81. 0xFFFFFFFF7FFFE178 : FFFF FFFF                                  ....
  82.  
  83. 2008-03-13-00.18.44.110896-300 I4592A282          LEVEL: Warning
  84. PID     : 10499                TID  : 1           PROC : db2sysc 0
  85. INSTANCE: db2                  NODE : 000
  86. FUNCTION: DB2 UDB, base sys utilities, sqleRunSysCtlr, probe:63
  87. MESSAGE : Health Monitor Process restarted.
Problem #3 Health Monitor fails incorrectly with SQLCODE “-2306”


When I was running an event monitor to diagnose “Problem #1” above-referenced in this document, I noticed that Health Monitor was running CALL REORGCHK_IX_STA TS with a valid table name, for example:

CALL REORGCHK_IX_STA TS ('T', '"NEAT "."LOCATION_SET S_LOG"')

And returning SQLCODE -2306, “The table or index name does not exist” even though the table NEAT.LOCATION_S ETS_LOG does exist.

Here is event monitor output illustrating the problem:

Expand|Select|Wrap|Line Numbers
  1. 89214) Statement Event ...
  2.   Appl Handle: 587
  3.   Appl Id: *LOCAL.db2.080311150558
  4.   Appl Seq number: 00006
  5.  
  6.   Record is the result of a flush: FALSE
  7.   -------------------------------------------
  8.   Type     : Dynamic
  9.   Operation: Execute
  10.   Section  : 6
  11.   Creator  : NULLID
  12.   Package  : SYSSH200
  13.   Consistency Token  : SYSLVL01
  14.   Package Version ID  :
  15.   Cursor   : SQL_CURSH200C6
  16.   Cursor was blocking: FALSE
  17.   Text     : CALL REORGCHK_IX_STATS ('T', '"NEAT    "."FLIGHT_FARE_LOG"')
  18.   -------------------------------------------
  19.   Start Time: 03/11/2008 10:05:57.563826
  20.   Stop Time:  03/11/2008 10:05:57.713239
  21.   Elapsed Execution Time:  0.149413 seconds
  22.   Number of Agents created: 1
  23.   User CPU: 0.000000 seconds
  24.   System CPU: 0.000000 seconds
  25.   Fetch Count: 0
  26.   Sorts: 1
  27.   Total sort time: 0
  28.   Sort overflows: 0
  29.   Rows read: 0
  30.   Rows written: 0
  31.   Internal rows deleted: 0
  32.   Internal rows updated: 0
  33.   Internal rows inserted: 0
  34.   Bufferpool data logical reads: 0
  35.   Bufferpool data physical reads: 0
  36.   Bufferpool temporary data logical reads: 0
  37.   Bufferpool temporary data physical reads: 0
  38.   Bufferpool index logical reads: 0
  39.   Bufferpool index physical reads: 0
  40.   Bufferpool temporary index logical reads: 0
  41.   Bufferpool temporary index physical reads: 0
  42.   Bufferpool xda logical page reads: 0
  43.   Bufferpool xda physical page reads: 0
  44.   Bufferpool temporary xda logical page reads: 0
  45.   Bufferpool temporary xda physical page reads: 0
  46.   SQLCA:
  47.    sqlcode: -2306
  48.    sqlstate:
  49.  
The matching db2diag.log output for this event was:

Expand|Select|Wrap|Line Numbers
  1. 2008-03-11-10.05.57.713912-300 I276926A452        LEVEL: Error
  2. PID     : 18727                TID  : 26          PROC : db2acd 0
  3. INSTANCE: db2                  NODE : 000
  4. APPID   : *LOCAL.db2.080311150558
  5. FUNCTION: DB2 UDB, Automatic Table Maintenance, db2HmonEvalReorg, probe:515
  6. MESSAGE : Failed to evaluate table "NEAT    "."FLIGHT_FARE_LOG". Probe 400
  7. DATA #1 : Hexdump, 4 bytes
  8. 0xFFFFFFFF6F53F470 : FFFF F6FE                                  ....
Mar 14 '08 #1
5 14827
docdiesel
297 Recognized Expert Contributor
Hi jbenner,

thank you very much for sharing your experiences with the health monitor. I'm sure this may help some people out there. I for myself am still looking forward to migrate to v9.1. ;-) By the way, I took the liberty to add some code tags (to be set with the # button in the editor) to the logs for better readability.

Regards,

Bernd
Mar 14 '08 #2
jbenner
2 New Member
You are welcome! Thanks for helping me with the post readability.
Mar 25 '08 #3
dbpete
1 New Member
Thanks for this information on Health Monitor!

We are currently migrating from DB2 v8 on Solaris 8 to DB2 v9.1 on Solaris 10 and at the same time exploit the virtualization features of Solaris 10 (Zone concept). We first wanted to directly use DB2 9.5 but we ran into several problems. Thus we decided to go for v9.1, Fixpack 4. Still quite a step!

What makes things a bit more tricky is that for development and education platforms, we intend to use Sun T5120 servers. For these ones, we ran into nasty problems including intermittent instance crashes. It took us much time, firmware updates and fixes from Sund and IBM until things started running smoother. Please be aware of such problems if you intend to use this new Sun hardware for DB2.

We encounter the same problems with health monitor to what concerns the db2HmonEvalReor g error on certain tables. We have no clue yet why health monitor can evaluate some of the tables and others not.

We haven't used health monitor up to now, but intend to do so for the new version. Therefore it's good to know that it may cause such heavy performance problems - we can keep an eye on this.

And of course I will appreciate any hint on which health indicators are critical in terms of resource consumption.

Thanks,
pete
Apr 1 '08 #4
SuperKuper
7 New Member
While we are on the subject ... When in DB2 V9 a database is explicitly activated, two connections show up by default. One is STMM and the other I do not remember what. Do you know how to turn off both. You already mentioned that you turned off STMM. How did you do that? Thx
Apr 5 '08 #5
Shashank1984
26 New Member
Superkuper,

Regarding the new db2 agents...i.e.db 2stmm and db2taskd..
New DB2 agents have been introduced in DB2 Version 9.1: db2stmm and db2taskd. The db2stmm agent is part of the new self-tuning memory feature added in DB2 Version 9.1. The db2taskd agent is an internal daemon that distributes background database tasks. These new agents remain connected to the database at all times and are not active if the database is activated in exclusive mode.
Starting with Fix Pack 3, agents for active event monitors are also connected to new databases by default.

Regarding disabling of STMM...
Self tuning can be disabled for the entire database by setting self_tuning_mem to OFF. When self_tuning_mem is set to OFF, the memory configuration parameters and buffer pools that are set to AUTOMATIC remain AUTOMATIC and the memory areas remain at their current size.

You can set self_tuning_mem to OFF using the UPDATE DATABASE CONFIGURATION command, the SQLFUPD API, or through the Change Database Configuration Parameter window in the Control Center.

Self tuning can also be effectively deactivated for the entire database if only a single memory consumer is enabled for self tuning. This is because memory cannot be redistributed when only one memory area is enabled.

Thanks,
Shashank Kharche
IBM.
DB2 for Linux, Unix & Windows -Information Management Software
Apr 15 '08 #6

Sign in to post your reply or Sign up for a free account.

Similar topics

7
2361
by: Bruce MacDonald | last post by:
I am unable to see the health monitor SQL table functions (e.g.: SQLPROC.HEALTH_CONT_HI). Have I missed a utility which registers the functions? DB2 8.1.3.132 WR21324. Another point about the health monitor - when I get notification messages all they say is that a particular instance/db has a warning or alarm. I have to open the health centre to see what the actual warning / alarm is. I would much prefer if the email had at least a...
3
5144
by: Debbie E | last post by:
I am trying to setup the DB2 Health Monitor and so far I am not getting any e-mail notifications. I've done the following: db2 update dbm cfg using health_mon on db2 update admin cfg using smtp_server lresa8a002.xxx.yyy.zzz db2 update notification list add contact debbie
2
2070
by: Debbie E | last post by:
Are there any issues with the Health Monitor in a partitioned environment? I have turned on the health monitor and can obtain the health snapshot for the DBM via the following command: get health snapshot for dbm But I can't seem to get any info regarding the database itself. I tried the following commands:
2
6241
by: Mike | last post by:
Hellos again, I seem to be having a weird issue. Whenever DB2 Health monitor seems to run, all connections to the database seem to just lock up. For example: (and this seems consistent) In list applications, DB2Hmon will show as "compiling statement", If I issue a "connect to database"it will hang there until DB2HMON is completed and no longer showing as a active application. Below is a lock snapshot that occurs when this is happening....
1
2407
by: zahni31 | last post by:
I had trouble to get the hmon work in 8.x FP 8 and FP 9 . The server runnig Windows 2003 SP1 on a member server. The db2 services are running under an domain admin account. Ok, no problem so far (HMON working) . Now i had made a special local group a the member server (contains both local and domain admin accounts and some local accounts without local
0
3501
by: nycjay | last post by:
i am trying to use the db2 health monitor alert to run a script. i set up the health monitor to run a script when the tablespace utailization is too high. in the end, the script should send a syslog message. for now, i am just trying to figure out what message db2 is sending, by printing out the parameters db2 is sending. set up: Indicator Name = ts.ts_util Default = No
1
2784
by: oddbande | last post by:
I get a lot of these messages in db2diag.log - seems like problems with health Monitor: 2007-01-05-15.05.06.380000+060 I260759H331 LEVEL: Severe PID : 3860 TID : 4004 PROC : db2fmp.exe INSTANCE: DB2 NODE : 000 APPID : *LOCAL.DB2.070104233148 FUNCTION: DB2 UDB, Health Monitor, hmonSQLConnect, probe:90 MESSAGE : SQLSetConnectAttr error sqlRC= sqlcode=
0
1688
by: Cage | last post by:
hi,all As we known,DB2 Health Center is a very good tool,but it made me a little confusion. I will be very appreciate if anyone give me some advices. The problem is that I want to monitor my tablespacle utilization,then I did some test work: 1.I create a tablespace "TESTDMS"managed by database, db2=>CREATE REGULAR TABLESPACE TESTDMS PAGESIZE 4 K MANAGED BY DATABASE USING ( FILE 'D:\test\test' 5120 ) EXTENTSIZE 16 OVERHEAD 10.5 PREFETCHSIZE...
1
2751
by: Odd B Andersen | last post by:
I get a lot of these severe errors in db2diag.log: 2008-10-23-14.11.33.865000+120 I1463721H437 LEVEL: Severe PID : 3960 TID : 4232 PROC : db2fmp.exe INSTANCE: DB2 NODE : 000 APPID : *LOCAL.DB2.081023121124 EDUID : 4232 FUNCTION: DB2 UDB, Health Monitor, sqlmhMonCollHistReadHeader, probe:20 MESSAGE : Invalid history version in header DATA #1 : Hexdump, 4 bytes
0
10829
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10468
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9640
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
8018
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
7171
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
6058
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4682
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4276
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
3282
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.