Hi,
we have some major GC issues at present with a system we're trying to
put live. It's a live calculation engine that's distributed across
about 30 Java server processes. A set of processes called
objectservers sit between the core infra and a Sybase DB. Two of the
main objectservers are set up with 2Gb Heaps as they need to cache
large amounts of objects and accept 500-600 incoming RMI connections
when heavily loaded. We're running the core servers on an E4500 with
12/336Mhz CPUs and 12Gb of Mem and we're not really hammering the box,
even tho the Java processes appear to be squeezed when it comes to
these GCs we're still 60-70% idle (the two main ObjectServers use
between 6% and 15%)
What I'm seeing is available heap (calculated from
Runtime.getRuntime().totalMemory() and
Runtime.getRuntime().freeMemory()) around 1Gb out of an allocated 2Gb.
It drops over time to around 600-700Mb free before being released back
up to around 1Gb. However, some of the GCs lock the process completely
for 80secs+ and don't release any more memory than the intermediate
GCs that don't lock the process.
I realize that the FullGC will stop the world whilst it's dong it's
bit but I don't understand what's really happening here. How can we
reduce this impact ? Is this normal ? Is there something obvious from
the GC logs below that show the problem ?
I realise I'll get answers such as 2Gb, that's way to big, you need to
distribute more, of course it's going to take 80secs to FullGC etc
etc...but the JVM should be scalable to manage large heaps and my
issue is that half the time, the GC frees up enough memory to continue
without the need for this stop the world GC (if we can disable
it/reconfigure it correctly so that it would stay responsive for a
day, we'd have the opportunity to recycle it that evening). It doesn't
appear that we're running out of heap, since there's consistently at
least 500Mb free, so I don't think there's a memory leak here unless
anyone can point to one from the logs.
We've tried several tweeks to the GC settings but this is the current
set up :
-server -Xms512M -Xmx2176M -Xss256k -XX:MaxPermSize=24M
-XX:PermSize=24M -XX:+UseConcMarkSweepGC
-XX:+UseCMSCompactAtFullCollection
-XX:CMSInitiatingOccupancyFraction=68 -XX:NewRatio=5
-XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC
We're also setting ulimit :
ULIMIT=ulimit -c 10 ; ulimit -n 2024 ; sleep 2
(This is actually 1.3.1_04 compiled running under a 1.4.1_03 JRE due
to the need for the new GC options but legacy requirement to use 1.3.1
in production for another server that shares serialiable objects)
I've pasted a log extract of the GCs with some comments on what I see
from hitting the process..
Any ideas much appreciated.
thanks,
Ant.
44412.213: [Full GC {Heap before GC invocations=1609:
Heap
def new generation total 360448K, used 62330K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 16% used [0x70800000, 0x73ee0158, 0x85d60000)
from space 10880K, 56% used [0x86800000, 0x86dfe8a8, 0x872a0000)
to space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
concurrent mark-sweep generation total 1856896K, used 1092590K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11608K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
44412.215: [DefNew: 62330K->2043K(360448K), 1.6686023 secs]44413.884:
[CMS: 1097663K->1086938K(1856896K), 86.9419291 secs]
1154921K->1086938K(2217344K) Heap after GC invocations=1610:
Heap
def new generation total 360448K, used 0K [0x70800000, 0x872a0000,
0x872a0000)
eden space 349568K, 0% used [0x70800000, 0x70800000, 0x85d60000)
from space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
to space 10880K, 0% used [0x86800000, 0x86800000, 0x872a0000)
concurrent mark-sweep generation total 1856896K, used 1086938K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11602K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
} , 88.6141451 secs]
The whole process stops responding at the point where CMS: appears in
the log above. Once it returns there's 1054Mb of heap free (2165
Allocated).
Over the next few seconds, available heap drops to 975,864,834,790
then this GC happens....
44516.172: [GC {Heap before GC invocations=1610:
Heap
def new generation total 360448K, used 349568K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 100% used [0x70800000, 0x85d60000, 0x85d60000)
from space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
to space 10880K, 0% used [0x86800000, 0x86800000, 0x872a0000)
concurrent mark-sweep generation total 1856896K, used 1086938K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
44516.174: [DefNew: 349568K->9032K(360448K), 0.8516984 secs]
1436506K->1095971K(2217344K) Heap after GC invocations=1611:
Heap
def new generation total 360448K, used 9032K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 0% used [0x70800000, 0x70800000, 0x85d60000)
from space 10880K, 83% used [0x86800000, 0x870d21e0, 0x872a0000)
to space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
concurrent mark-sweep generation total 1856896K, used 1086938K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
} , 0.8547413 secs]
The Above GC doesn't stop the process, and available heap jumps back
to 1092Mb.
Over the next few seconds, available heap drops to
966,916,873,830,799, then this GC happens........
44537.642: [GC {Heap before GC invocations=1611:
Heap
def new generation total 360448K, used 358600K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 100% used [0x70800000, 0x85d60000, 0x85d60000)
from space 10880K, 83% used [0x86800000, 0x870d21e0, 0x872a0000)
to space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
concurrent mark-sweep generation total 1856896K, used 1086938K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
44537.643: [DefNew: 358600K->10880K(360448K), 1.7353075 secs]
1445539K->1106095K(2217344K) Heap after GC invocations=1612:
Heap
def new generation total 360448K, used 10880K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 0% used [0x70800000, 0x70800000, 0x85d60000)
from space 10880K, 100% used [0x85d60000, 0x86800000, 0x86800000)
to space 10880K, 0% used [0x86800000, 0x86800000, 0x872a0000)
concurrent mark-sweep generation total 1856896K, used 1095215K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
} , 1.7380387 secs]
Again, the above GC doesn't prevent the server responding, and
available heap jumps up to 1081Mb.
The available heap drops to 786 and the following GC happens....
44551.875: [GC {Heap before GC invocations=1612:
Heap
def new generation total 360448K, used 360448K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 100% used [0x70800000, 0x85d60000, 0x85d60000)
from space 10880K, 100% used [0x85d60000, 0x86800000, 0x86800000)
to space 10880K, 0% used [0x86800000, 0x86800000, 0x872a0000)
concurrent mark-sweep generation total 1856896K, used 1095215K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 58% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
44551.877: [DefNew: 360448K->9909K(360448K), 1.4227553 secs]
1455663K->1112567K(2217344K) Heap after GC invocations=1613:
Heap
def new generation total 360448K, used 9909K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 0% used [0x70800000, 0x70800000, 0x85d60000)
from space 10880K, 91% used [0x86800000, 0x871ad738, 0x872a0000)
to space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
concurrent mark-sweep generation total 1856896K, used 1102657K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 59% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
} , 1.4312674 secs]
The above GC doesn't prevent the server responding, and available heap
jumps to 1075Mb, then the following GC stops the world
again for 80 odd seconds and seemingly only brings available heap back
up to about 1124Mb free, which doesn't appear to be much
better than the non-stop the world GCs but has huge expense.
44560.866: [Full GC {Heap before GC invocations=1613:
Heap
def new generation total 360448K, used 216623K [0x70800000,
0x872a0000, 0x872a0000)
eden space 349568K, 59% used [0x70800000, 0x7d1de5e8, 0x85d60000)
from space 10880K, 91% used [0x86800000, 0x871ad738, 0x872a0000)
to space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
concurrent mark-sweep generation total 1856896K, used 1102657K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 59% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11607K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
44560.867: [DefNew: 216623K->3646K(360448K), 0.7875773 secs]44561.655:
[CMS: 1110868K->1064062K(1856896K), 81.7629623 secs]
1319280K->1064062K(2217344K) Heap after GC invocations=1614:
Heap
def new generation total 360448K, used 0K [0x70800000, 0x872a0000,
0x872a0000)
eden space 349568K, 0% used [0x70800000, 0x70800000, 0x85d60000)
from space 10880K, 0% used [0x85d60000, 0x85d60000, 0x86800000)
to space 10880K, 0% used [0x86800000, 0x86800000, 0x872a0000)
concurrent mark-sweep generation total 1856896K, used 1064062K
[0x872a0000, 0xf8800000, 0xf8800000)
CompactibleFreeListSpace space 1856896K, 57% used [0x872a0000,
0xf8800000)
concurrent-mark-sweep perm gen total 24576K, used 11601K [0xf8800000,
0xfa000000, 0xfa000000)
CompactibleFreeListSpace space 24576K, 47% used [0xf8800000,
0xfa000000)
} , 82.5531528 secs]