|
Doozie of a problem here - may open a ticket with support, but wanted
to see what others had to say first...
We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB
db. Reorgs and runstats are done weekly and the server is at least
decently tuned, although we're bumping up against the memory limits
with 32-bit. There is a series of 16 queries run in varying orders by
the users referred to as "mega-measures". Normally these queries
return their results (currently 9,126 rows) in a couple of minutes or
less each. Starting Sunday, some of them started to hang in a
non-uniform pattern. I am currently working with 3 queries on a test
system. Each of the queries runs successfully when run individually
(using db2 -tvf on a script). When all three are run in one of these
orders (query1, query2, query3) or (query3, query1, query2) in the
same script, the query that runs last hangs (see below for what I mean
by "hangs" in this case). I have user reports of these queries being
run individually hanging, but have not been able to reproduce that. If
I run the queries in this order (query3, query2, query1), they all
suceed. Also if I run the queries in any order from seperate
scripts(even within the same connection), they all succeed. These
results are re-creatable even after the db has been restarted, and
happen on both test and production (different hardware).
Changes to the system in the last week: APP_CTL_HEAP_SZ increased from
512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased
from 512 to 1024, TXNISOLATION cli config parameter changed from
default to 1(i.e CLI bound with UR). I cannot replicate the problem on
a system that does not have these 4 changes, but is identical in other
ways except it has a bit more data.
Definition of query hang in this situation: Query does not return,
even if left to run for 8 hours. No errors are returned or written to
the db2diag.log. List applications shows the application in an
"executing" status, but snapshots show no change in activity over the
course of time. No unusual CPU, Memory, or IO activity is aparent.
So - is it related to one of the 4 changes I made? If so, which one,
and why? If not, what the heck is going on here?
Thanks,
Ember Crooks
IBM Certified Database Administrator
DB2 UDB V8.1 for Linux, Unix, and Windows | |
Share:
|
It seems to be a lock contention issue...since the problem is
reproducable only for specific order of execution, I would suggest
taking snapshots for locks for the affected database and have a look
at the output... da****@hotmail.com (Ember) wrote in message news:<7c*************************@posting.google.c om>... Doozie of a problem here - may open a ticket with support, but wanted to see what others had to say first...
We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB db. Reorgs and runstats are done weekly and the server is at least decently tuned, although we're bumping up against the memory limits with 32-bit. There is a series of 16 queries run in varying orders by the users referred to as "mega-measures". Normally these queries return their results (currently 9,126 rows) in a couple of minutes or less each. Starting Sunday, some of them started to hang in a non-uniform pattern. I am currently working with 3 queries on a test system. Each of the queries runs successfully when run individually (using db2 -tvf on a script). When all three are run in one of these orders (query1, query2, query3) or (query3, query1, query2) in the same script, the query that runs last hangs (see below for what I mean by "hangs" in this case). I have user reports of these queries being run individually hanging, but have not been able to reproduce that. If I run the queries in this order (query3, query2, query1), they all suceed. Also if I run the queries in any order from seperate scripts(even within the same connection), they all succeed. These results are re-creatable even after the db has been restarted, and happen on both test and production (different hardware).
Changes to the system in the last week: APP_CTL_HEAP_SZ increased from 512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased from 512 to 1024, TXNISOLATION cli config parameter changed from default to 1(i.e CLI bound with UR). I cannot replicate the problem on a system that does not have these 4 changes, but is identical in other ways except it has a bit more data.
Definition of query hang in this situation: Query does not return, even if left to run for 8 hours. No errors are returned or written to the db2diag.log. List applications shows the application in an "executing" status, but snapshots show no change in activity over the course of time. No unusual CPU, Memory, or IO activity is aparent.
So - is it related to one of the 4 changes I made? If so, which one, and why? If not, what the heck is going on here?
Thanks,
Ember Crooks IBM Certified Database Administrator DB2 UDB V8.1 for Linux, Unix, and Windows | | |
That's what I thought at first too - that's why I tried adding commits
and adding "for read only with ur" to the queries, and neither helped.
When I took and looked at lock snapshots in between the queries when I
was the only one on the system, It reported 0 locks currently held,
and the problem still occured. Also locktimeout is set to 60, so I
know it's not a lock wait (plus the app is in 'executing' status).
I let it run for 12 hours last night and it still didn't return.
I did verify with the customer - they indicated this is not a new
problem - it just seemed intermittent when they were in system test,
since the order of the queries matters, and they run them in varying
orders. So the problem has probably been there all along - since the
system was built and the queries were first tried.
Thanks! sa**************@marlabs.com (Passcore) wrote in message news:<ac**************************@posting.google. com>... It seems to be a lock contention issue...since the problem is reproducable only for specific order of execution, I would suggest taking snapshots for locks for the affected database and have a look at the output...
da****@hotmail.com (Ember) wrote in message news:<7c*************************@posting.google.c om>... Doozie of a problem here - may open a ticket with support, but wanted to see what others had to say first...
We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB db. Reorgs and runstats are done weekly and the server is at least decently tuned, although we're bumping up against the memory limits with 32-bit. There is a series of 16 queries run in varying orders by the users referred to as "mega-measures". Normally these queries return their results (currently 9,126 rows) in a couple of minutes or less each. Starting Sunday, some of them started to hang in a non-uniform pattern. I am currently working with 3 queries on a test system. Each of the queries runs successfully when run individually (using db2 -tvf on a script). When all three are run in one of these orders (query1, query2, query3) or (query3, query1, query2) in the same script, the query that runs last hangs (see below for what I mean by "hangs" in this case). I have user reports of these queries being run individually hanging, but have not been able to reproduce that. If I run the queries in this order (query3, query2, query1), they all suceed. Also if I run the queries in any order from seperate scripts(even within the same connection), they all succeed. These results are re-creatable even after the db has been restarted, and happen on both test and production (different hardware).
Changes to the system in the last week: APP_CTL_HEAP_SZ increased from 512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased from 512 to 1024, TXNISOLATION cli config parameter changed from default to 1(i.e CLI bound with UR). I cannot replicate the problem on a system that does not have these 4 changes, but is identical in other ways except it has a bit more data.
Definition of query hang in this situation: Query does not return, even if left to run for 8 hours. No errors are returned or written to the db2diag.log. List applications shows the application in an "executing" status, but snapshots show no change in activity over the course of time. No unusual CPU, Memory, or IO activity is aparent.
So - is it related to one of the 4 changes I made? If so, which one, and why? If not, what the heck is going on here?
Thanks,
Ember Crooks IBM Certified Database Administrator DB2 UDB V8.1 for Linux, Unix, and Windows | | |
Turns out to be APAR IY54318 - a problem with the same (or in this
case very similar) queries executed one after another, and traces back
to a problem with communication between the subagents when
INTRA_PARALLEL is on. Turning INTRA_PARALELL off or using a degree of
parallelism of 1 for the queries is the short term workaround. The fix
is apparently an architectural one, and will be available in db2 8.2 -
db2 support has yet to get back to me on whether it is possible to get
this fix for 8.1.
Thanks for your input! da****@hotmail.com (Ember) wrote in message news:<7c*************************@posting.google.c om>... Doozie of a problem here - may open a ticket with support, but wanted to see what others had to say first...
We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB db. Reorgs and runstats are done weekly and the server is at least decently tuned, although we're bumping up against the memory limits with 32-bit. There is a series of 16 queries run in varying orders by the users referred to as "mega-measures". Normally these queries return their results (currently 9,126 rows) in a couple of minutes or less each. Starting Sunday, some of them started to hang in a non-uniform pattern. I am currently working with 3 queries on a test system. Each of the queries runs successfully when run individually (using db2 -tvf on a script). When all three are run in one of these orders (query1, query2, query3) or (query3, query1, query2) in the same script, the query that runs last hangs (see below for what I mean by "hangs" in this case). I have user reports of these queries being run individually hanging, but have not been able to reproduce that. If I run the queries in this order (query3, query2, query1), they all suceed. Also if I run the queries in any order from seperate scripts(even within the same connection), they all succeed. These results are re-creatable even after the db has been restarted, and happen on both test and production (different hardware).
Changes to the system in the last week: APP_CTL_HEAP_SZ increased from 512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased from 512 to 1024, TXNISOLATION cli config parameter changed from default to 1(i.e CLI bound with UR). I cannot replicate the problem on a system that does not have these 4 changes, but is identical in other ways except it has a bit more data.
Definition of query hang in this situation: Query does not return, even if left to run for 8 hours. No errors are returned or written to the db2diag.log. List applications shows the application in an "executing" status, but snapshots show no change in activity over the course of time. No unusual CPU, Memory, or IO activity is aparent.
So - is it related to one of the 4 changes I made? If so, which one, and why? If not, what the heck is going on here?
Thanks,
Ember Crooks IBM Certified Database Administrator DB2 UDB V8.1 for Linux, Unix, and Windows | | This discussion thread is closed Replies have been disabled for this discussion. Similar topics
1 post
views
Thread by Ryan Govostes |
last post: by
|
4 posts
views
Thread by Madhu Gopinathan |
last post: by
|
4 posts
views
Thread by darrel |
last post: by
|
1 post
views
Thread by Ron |
last post: by
|
5 posts
views
Thread by marcsirois |
last post: by
|
7 posts
views
Thread by Jlo |
last post: by
|
12 posts
views
Thread by bhipwell via AccessMonster.com |
last post: by
|
10 posts
views
Thread by Lloyd Harold |
last post: by
| | | | | | | | | | | |