By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,590 Members | 2,174 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,590 IT Pros & Developers. It's quick & easy.

Odd query hang

P: n/a
Doozie of a problem here - may open a ticket with support, but wanted
to see what others had to say first...

We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB
db. Reorgs and runstats are done weekly and the server is at least
decently tuned, although we're bumping up against the memory limits
with 32-bit. There is a series of 16 queries run in varying orders by
the users referred to as "mega-measures". Normally these queries
return their results (currently 9,126 rows) in a couple of minutes or
less each. Starting Sunday, some of them started to hang in a
non-uniform pattern. I am currently working with 3 queries on a test
system. Each of the queries runs successfully when run individually
(using db2 -tvf on a script). When all three are run in one of these
orders (query1, query2, query3) or (query3, query1, query2) in the
same script, the query that runs last hangs (see below for what I mean
by "hangs" in this case). I have user reports of these queries being
run individually hanging, but have not been able to reproduce that. If
I run the queries in this order (query3, query2, query1), they all
suceed. Also if I run the queries in any order from seperate
scripts(even within the same connection), they all succeed. These
results are re-creatable even after the db has been restarted, and
happen on both test and production (different hardware).

Changes to the system in the last week: APP_CTL_HEAP_SZ increased from
512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased
from 512 to 1024, TXNISOLATION cli config parameter changed from
default to 1(i.e CLI bound with UR). I cannot replicate the problem on
a system that does not have these 4 changes, but is identical in other
ways except it has a bit more data.

Definition of query hang in this situation: Query does not return,
even if left to run for 8 hours. No errors are returned or written to
the db2diag.log. List applications shows the application in an
"executing" status, but snapshots show no change in activity over the
course of time. No unusual CPU, Memory, or IO activity is aparent.

So - is it related to one of the 4 changes I made? If so, which one,
and why? If not, what the heck is going on here?

Thanks,

Ember Crooks
IBM Certified Database Administrator
DB2 UDB V8.1 for Linux, Unix, and Windows
Nov 12 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
It seems to be a lock contention issue...since the problem is
reproducable only for specific order of execution, I would suggest
taking snapshots for locks for the affected database and have a look
at the output...

da****@hotmail.com (Ember) wrote in message news:<7c*************************@posting.google.c om>...
Doozie of a problem here - may open a ticket with support, but wanted
to see what others had to say first...

We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB
db. Reorgs and runstats are done weekly and the server is at least
decently tuned, although we're bumping up against the memory limits
with 32-bit. There is a series of 16 queries run in varying orders by
the users referred to as "mega-measures". Normally these queries
return their results (currently 9,126 rows) in a couple of minutes or
less each. Starting Sunday, some of them started to hang in a
non-uniform pattern. I am currently working with 3 queries on a test
system. Each of the queries runs successfully when run individually
(using db2 -tvf on a script). When all three are run in one of these
orders (query1, query2, query3) or (query3, query1, query2) in the
same script, the query that runs last hangs (see below for what I mean
by "hangs" in this case). I have user reports of these queries being
run individually hanging, but have not been able to reproduce that. If
I run the queries in this order (query3, query2, query1), they all
suceed. Also if I run the queries in any order from seperate
scripts(even within the same connection), they all succeed. These
results are re-creatable even after the db has been restarted, and
happen on both test and production (different hardware).

Changes to the system in the last week: APP_CTL_HEAP_SZ increased from
512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased
from 512 to 1024, TXNISOLATION cli config parameter changed from
default to 1(i.e CLI bound with UR). I cannot replicate the problem on
a system that does not have these 4 changes, but is identical in other
ways except it has a bit more data.

Definition of query hang in this situation: Query does not return,
even if left to run for 8 hours. No errors are returned or written to
the db2diag.log. List applications shows the application in an
"executing" status, but snapshots show no change in activity over the
course of time. No unusual CPU, Memory, or IO activity is aparent.

So - is it related to one of the 4 changes I made? If so, which one,
and why? If not, what the heck is going on here?

Thanks,

Ember Crooks
IBM Certified Database Administrator
DB2 UDB V8.1 for Linux, Unix, and Windows

Nov 12 '05 #2

P: n/a
That's what I thought at first too - that's why I tried adding commits
and adding "for read only with ur" to the queries, and neither helped.
When I took and looked at lock snapshots in between the queries when I
was the only one on the system, It reported 0 locks currently held,
and the problem still occured. Also locktimeout is set to 60, so I
know it's not a lock wait (plus the app is in 'executing' status).

I let it run for 12 hours last night and it still didn't return.

I did verify with the customer - they indicated this is not a new
problem - it just seemed intermittent when they were in system test,
since the order of the queries matters, and they run them in varying
orders. So the problem has probably been there all along - since the
system was built and the queries were first tried.

Thanks!
sa**************@marlabs.com (Passcore) wrote in message news:<ac**************************@posting.google. com>...
It seems to be a lock contention issue...since the problem is
reproducable only for specific order of execution, I would suggest
taking snapshots for locks for the affected database and have a look
at the output...

da****@hotmail.com (Ember) wrote in message news:<7c*************************@posting.google.c om>...
Doozie of a problem here - may open a ticket with support, but wanted
to see what others had to say first...

We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB
db. Reorgs and runstats are done weekly and the server is at least
decently tuned, although we're bumping up against the memory limits
with 32-bit. There is a series of 16 queries run in varying orders by
the users referred to as "mega-measures". Normally these queries
return their results (currently 9,126 rows) in a couple of minutes or
less each. Starting Sunday, some of them started to hang in a
non-uniform pattern. I am currently working with 3 queries on a test
system. Each of the queries runs successfully when run individually
(using db2 -tvf on a script). When all three are run in one of these
orders (query1, query2, query3) or (query3, query1, query2) in the
same script, the query that runs last hangs (see below for what I mean
by "hangs" in this case). I have user reports of these queries being
run individually hanging, but have not been able to reproduce that. If
I run the queries in this order (query3, query2, query1), they all
suceed. Also if I run the queries in any order from seperate
scripts(even within the same connection), they all succeed. These
results are re-creatable even after the db has been restarted, and
happen on both test and production (different hardware).

Changes to the system in the last week: APP_CTL_HEAP_SZ increased from
512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased
from 512 to 1024, TXNISOLATION cli config parameter changed from
default to 1(i.e CLI bound with UR). I cannot replicate the problem on
a system that does not have these 4 changes, but is identical in other
ways except it has a bit more data.

Definition of query hang in this situation: Query does not return,
even if left to run for 8 hours. No errors are returned or written to
the db2diag.log. List applications shows the application in an
"executing" status, but snapshots show no change in activity over the
course of time. No unusual CPU, Memory, or IO activity is aparent.

So - is it related to one of the 4 changes I made? If so, which one,
and why? If not, what the heck is going on here?

Thanks,

Ember Crooks
IBM Certified Database Administrator
DB2 UDB V8.1 for Linux, Unix, and Windows

Nov 12 '05 #3

P: n/a
Turns out to be APAR IY54318 - a problem with the same (or in this
case very similar) queries executed one after another, and traces back
to a problem with communication between the subagents when
INTRA_PARALLEL is on. Turning INTRA_PARALELL off or using a degree of
parallelism of 1 for the queries is the short term workaround. The fix
is apparently an architectural one, and will be available in db2 8.2 -
db2 support has yet to get back to me on whether it is possible to get
this fix for 8.1.

Thanks for your input!
da****@hotmail.com (Ember) wrote in message news:<7c*************************@posting.google.c om>...
Doozie of a problem here - may open a ticket with support, but wanted
to see what others had to say first...

We have an AIX 5.2 server with DB2 V8 FP5 and 32-bit instance. ~45GB
db. Reorgs and runstats are done weekly and the server is at least
decently tuned, although we're bumping up against the memory limits
with 32-bit. There is a series of 16 queries run in varying orders by
the users referred to as "mega-measures". Normally these queries
return their results (currently 9,126 rows) in a couple of minutes or
less each. Starting Sunday, some of them started to hang in a
non-uniform pattern. I am currently working with 3 queries on a test
system. Each of the queries runs successfully when run individually
(using db2 -tvf on a script). When all three are run in one of these
orders (query1, query2, query3) or (query3, query1, query2) in the
same script, the query that runs last hangs (see below for what I mean
by "hangs" in this case). I have user reports of these queries being
run individually hanging, but have not been able to reproduce that. If
I run the queries in this order (query3, query2, query1), they all
suceed. Also if I run the queries in any order from seperate
scripts(even within the same connection), they all succeed. These
results are re-creatable even after the db has been restarted, and
happen on both test and production (different hardware).

Changes to the system in the last week: APP_CTL_HEAP_SZ increased from
512 to 4096, LOCKLIST increased from 4096 to 8192, MAXFILOP increased
from 512 to 1024, TXNISOLATION cli config parameter changed from
default to 1(i.e CLI bound with UR). I cannot replicate the problem on
a system that does not have these 4 changes, but is identical in other
ways except it has a bit more data.

Definition of query hang in this situation: Query does not return,
even if left to run for 8 hours. No errors are returned or written to
the db2diag.log. List applications shows the application in an
"executing" status, but snapshots show no change in activity over the
course of time. No unusual CPU, Memory, or IO activity is aparent.

So - is it related to one of the 4 changes I made? If so, which one,
and why? If not, what the heck is going on here?

Thanks,

Ember Crooks
IBM Certified Database Administrator
DB2 UDB V8.1 for Linux, Unix, and Windows

Nov 12 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.