2013-07-10

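So, we use the G1 GC with an 18 GB heap. The young generation is about 3.5 GB, and maximum heap usage is around 12 GB. Memory is full of short-lived objects. The G1 GC processes references too slowly.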

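Also, and maybe this matters, a Couchbase instance runs on the same node as the JVM. When it persists changes to disk it often takes all the IOPS, but there is plenty of idle CPU time and memory.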

Enabled JVM options:

-Xmx18g -Xms18g -XX:MaxPermSize=512M -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled 

My pain point is young-generation GC pauses. The GC log for a long pause usually looks like this:

2013-07-10T15:06:25.963+0400: 9122,066: [GC pause (young) 
Desired survivor size 243269632 bytes, new threshold 5 (max 15) 
     - age 1: 69789280 bytes, 69789280 total 
- age 2: 58618240 bytes, 128407520 total 
- age 3: 54519720 bytes, 182927240 total 
- age 4: 51592728 bytes, 234519968 total 
- age 5: 45186520 bytes, 279706488 total 
9122,066: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 174,16 ms, remaining time: 25,84 ms, target pause time: 200,00 ms] 
     9122,066: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 426 regions, survivors: 34 regions, predicted young region time: 164,97 ms] 
     9122,066: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 426 regions, survivors: 34 regions, old: 0 regions, predicted pause time: 339,13 ms, target pause time: 200,00 ms] 
     9122,259: [SoftReference, 0 refs, 0,0063780 secs]9124,575: [WeakReference, 4 refs, 0,0031600 secs]9124,578: [FinalReference, 1640 refs, 0,0033730 secs]9124,581: [PhantomReference, 145 refs, 0,0032080 secs]9124,585: [JNI Weak Reference, 0,0000810 secs], 2,53669600 secs] 
     [Parallel Time: 190,5 ms] 
     [GC Worker Start (ms): 9122066,6 9122066,7 9122066,7 9122066,7 9122066,8 9122066,9 9122066,9 9122066,9 9122066,9 9122067,0 9122067,0 9122067,1 9122067,1 9122067,1 9122067,1 9122067,2 9122067,2 9122067,3 
Avg: 9122067,0, Min: 9122066,6, Max: 9122067,3, Diff: 0,7] 
     [Ext Root Scanning (ms): 4,7 6,0 4,8 4,5 4,2 4,3 4,2 4,3 4,6 3,4 13,5 5,2 4,2 5,6 4,2 4,1 4,3 4,0 
Avg: 5,0, Min: 3,4, Max: 13,5, Diff: 10,1] 
     [Update RS (ms): 20,9 19,6 21,1 21,3 21,2 21,2 21,3 21,2 21,7 21,5 12,1 20,2 21,1 19,4 21,0 21,1 20,7 21,2 
Avg: 20,4, Min: 12,1, Max: 21,7, Diff: 9,6] 
     [Processed Buffers : 27 23 25 29 31 22 25 34 28 14 36 23 24 22 28 24 25 24 
Sum: 464, Avg: 25, Min: 14, Max: 36, Diff: 22] 
     [Scan RS (ms): 9,0 9,2 8,7 8,8 9,1 9,1 8,9 9,1 8,3 9,2 9,0 9,1 9,2 9,2 9,1 9,0 9,0 9,1 
Avg: 9,0, Min: 8,3, Max: 9,2, Diff: 1,0] 
     [Object Copy (ms): 145,1 145,0 145,2 145,1 145,1 144,9 145,1 144,9 144,9 145,4 144,8 144,8 144,8 145,0 145,0 145,1 145,2 144,9 
Avg: 145,0, Min: 144,8, Max: 145,4, Diff: 0,6] 
     [Termination (ms): 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 
Avg: 0,0, Min: 0,0, Max: 0,0, Diff: 0,0] 
     [Termination Attempts : 5 8 2 11 5 6 6 5 5 7 4 7 2 9 8 5 7 8 
Sum: 110, Avg: 6, Min: 2, Max: 11, Diff: 9] 
     [GC Worker End (ms): 9122246,4 9122246,6 9122246,7 9122246,6 9122246,7 9122246,7 9122246,5 9122246,7 9122246,5 9122246,5 9122246,6 9122246,7 9122246,8 9122246,4 9122246,6 9122246,5 9122246,7 9122246,8 
Avg: 9122246,6, Min: 9122246,4, Max: 9122246,8, Diff: 0,3] 
     [GC Worker (ms): 179,8 179,9 180,0 179,8 179,9 179,9 179,6 179,8 179,6 179,5 179,6 179,6 179,7 179,3 179,5 179,4 179,4 179,5 
Avg: 179,7, Min: 179,3, Max: 180,0, Diff: 0,7] 
     [GC Worker Other (ms): 10,7 10,7 10,8 10,8 10,9 10,9 11,0 11,0 11,0 11,1 11,1 11,1 11,2 11,2 11,2 11,2 11,3 11,3 
Avg: 11,0, Min: 10,7, Max: 11,3, Diff: 0,6] 
     [Clear CT: 2,8 ms] 
     [Other: 2343,4 ms] 
     [Choose CSet: 0,1 ms] 
     [Ref Proc: 2327,7 ms] 
     [Ref Enq: 1,9 ms] 
     [Free CSet: 8,2 ms] 
     [Eden: 3408M(3408M)->0B(3400M) Survivors: 272M->280M Heap: 9998M(18432M)->6638M(18432M)] 
     [Times: user=3,26 sys=0,02, real=2,54 secs] 
Total time for which application threads were stopped: 2,5434370 seconds 

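The only GC phase that causes trouble is reference processing ("Ref Proc"). But the log looks strange: processing of soft, weak, final, and JNI references takes very little time, yet the total is 2.5 seconds. Worse, in the worst case it can reach 10 seconds.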
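One way to see where the 2.5 seconds go is to compare the uptime stamps on the PrintReferenceGC line of the slow pause with the per-type durations. The Python sketch below is a minimal log-parsing helper, not part of any JVM tooling; the names analyze, ENTRY, PAUSE, and SLOW_REF_LINE are hypothetical, and it assumes the line keeps exactly the "uptime: [Kind, N refs, secs secs]" shape shown above, with comma decimals.

```python
import re

# PrintReferenceGC output of the slow (2.5 s) pause above, wrapped line joined.
# HotSpot printed the numbers with a comma as the decimal separator.
SLOW_REF_LINE = (
    "9122,259: [SoftReference, 0 refs, 0,0063780 secs]"
    "9124,575: [WeakReference, 4 refs, 0,0031600 secs]"
    "9124,578: [FinalReference, 1640 refs, 0,0033730 secs]"
    "9124,581: [PhantomReference, 145 refs, 0,0032080 secs]"
    "9124,585: [JNI Weak Reference, 0,0000810 secs], 2,53669600 secs]"
)

# One "<uptime>: [<Kind>, <n> refs, <secs> secs]" entry; JNI has no ref count.
ENTRY = re.compile(r"(\d+,\d+): \[([A-Za-z ]+?), (?:(\d+) refs, )?(\d+,\d+) secs\]")
# The trailing ", <secs> secs]" closes the whole "[GC pause (young)" record.
PAUSE = re.compile(r"\], (\d+,\d+) secs\]$")


def to_float(text):
    """Convert a comma-decimal number such as '0,0063780' to a float."""
    return float(text.replace(",", "."))


def analyze(line):
    """Return (entries, per-type total, gaps between entries, whole pause)."""
    entries = [(kind, to_float(start), to_float(secs))
               for start, kind, _refs, secs in ENTRY.findall(line)]
    per_type_total = sum(secs for _, _, secs in entries)
    # Idle time between the end of one reference type and the start of the next.
    # Uptime stamps only have millisecond resolution, so tiny negative gaps are noise.
    gaps = [(k1, k2, s2 - (s1 + d1))
            for (k1, s1, d1), (k2, s2, _) in zip(entries, entries[1:])]
    pause = to_float(PAUSE.search(line).group(1))
    return entries, per_type_total, gaps, pause


if __name__ == "__main__":
    entries, per_type, gaps, pause = analyze(SLOW_REF_LINE)
    for kind, _, secs in entries:
        print(f"{kind:>20}: {secs * 1000:7.2f} ms")
    print(f"sum of reference types: {per_type * 1000:.1f} ms")
    worst = max(gaps, key=lambda g: g[2])
    print(f"largest gap: {worst[0]} -> {worst[1]}: {worst[2] * 1000:.1f} ms")
    print(f"whole pause: {pause:.3f} s")
```

For this pause the five reference types add up to about 16 ms and the whole pause is about 2.54 s, while roughly 2.31 s passes between the end of SoftReference processing and the start of WeakReference processing. The sketch only measures that gap; the log line itself does not say what the collector was doing during it.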

Another pause (a more comfortable one) can look like this:

2013-07-10T16:26:11.862+0400: 13907,965: [GC pause (young) 
    Desired survivor size 243269632 bytes, new threshold 4 (max 15) 
    - age 1: 69125832 bytes, 69125832 total 
    - age 2: 58756480 bytes, 127882312 total 
    - age 3: 52397376 bytes, 180279688 total 
    - age 4: 88850424 bytes, 269130112 total 
    13907,965: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 77,38 ms, remaining time: 122,62 ms, target pause time: 200,00 ms] 
    13907,965: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 427 regions, survivors: 33 regions, predicted young region time: 167,95 ms] 
    13907,965: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 427 regions, survivors: 33 regions, old: 0 regions, predicted pause time: 245,33 ms, target pause time: 200,00 ms] 
    13908,155: [SoftReference, 0 refs, 0,0041340 secs]13908,160: [WeakReference, 0 refs, 0,0023850 secs]13908,162: [FinalReference, 1393 refs, 0,0065970 secs]13908,169: [PhantomReference, 108 refs, 0,0018650 secs]13908,171: [JNI Weak Reference, 0,0000630 secs], 0,22008100 secs] 
    [Parallel Time: 188,4 ms] 
    [GC Worker Start (ms): 13907965,3 13907965,3 13907965,4 13907965,4 13907965,5 13907965,5 13907965,6 13907965,6 13907965,6 13907965,7 13907965,7 13907965,7 13907965,8 13907965,8 13907965,8 13907965,9 13907965,9 13907965,9 
    Avg: 13907965,6, Min: 13907965,3, Max: 13907965,9, Diff: 0,6] 
    [Ext Root Scanning (ms): 5,8 5,0 6,8 6,3 6,1 6,2 6,0 6,3 5,2 4,2 5,0 6,2 4,5 6,0 17,1 4,4 6,2 5,3 
    Avg: 6,3, Min: 4,2, Max: 17,1, Diff: 12,9] 
    [Update RS (ms): 24,8 26,0 23,9 24,1 24,1 24,1 24,2 23,9 25,0 25,2 25,1 24,1 26,0 24,3 13,7 25,7 24,2 24,7 
    Avg: 24,1, Min: 13,7, Max: 26,0, Diff: 12,2] 
    [Processed Buffers : 30 20 9 16 16 19 20 21 22 12 30 17 17 20 12 20 17 22 
    Sum: 340, Avg: 18, Min: 9, Max: 30, Diff: 21] 
    [Scan RS (ms): 7,5 7,1 7,2 7,5 7,6 7,5 7,5 7,6 7,1 7,4 7,6 7,2 7,2 7,4 7,2 7,5 7,0 7,7 
    Avg: 7,4, Min: 7,0, Max: 7,7, Diff: 0,7] 
    [Object Copy (ms): 133,1 133,1 133,2 133,1 133,2 133,1 133,2 133,1 133,5 134,0 133,0 133,2 133,0 132,9 132,6 133,1 133,2 132,9 
    Avg: 133,1, Min: 132,6, Max: 134,0, Diff: 1,3] 
    [Termination (ms): 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 
    Avg: 0,0, Min: 0,0, Max: 0,0, Diff: 0,0] 
    [Termination Attempts : 1 3 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1 
    Sum: 24, Avg: 1, Min: 1, Max: 3, Diff: 2] 
    [GC Worker End (ms): 13908136,6 13908136,9 13908136,5 13908136,7 13908136,7 13908136,8 13908136,7 13908136,7 13908136,8 13908136,8 13908136,5 13908136,6 13908136,5 13908136,5 13908136,5 13908136,5 13908136,8 13908136,6 
    Avg: 13908136,7, Min: 13908136,5, Max: 13908136,9, Diff: 0,4] 
    [GC Worker (ms): 171,3 171,6 171,1 171,2 171,2 171,3 171,1 171,1 171,1 171,2 170,8 170,9 170,7 170,7 170,7 170,6 171,0 170,7 
    Avg: 171,0, Min: 170,6, Max: 171,6, Diff: 0,9] 
    [GC Worker Other (ms): 17,2 17,2 17,3 17,3 17,4 17,4 17,5 17,5 17,5 17,5 17,6 17,6 17,7 17,7 17,7 17,7 17,8 17,8 
    Avg: 17,5, Min: 17,2, Max: 17,8, Diff: 0,6] 
    [Clear CT: 1,6 ms] 
    [Other: 30,1 ms] 
    [Choose CSet: 0,1 ms] 
    [Ref Proc: 17,1 ms] 
    [Ref Enq: 0,9 ms] 
    [Free CSet: 7,4 ms] 
    [Eden: 3416M(3416M)->0B(3456M) Survivors: 264M->224M Heap: 7289M(18432M)->3912M(18432M)] 
    [Times: user=3,16 sys=0,00, real=0,22 secs]  
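Both pauses open with a tenuring distribution ("Desired survivor size ... new threshold N (max 15)"). As a hedged reading of those lines, the sketch below assumes HotSpot picks the first object age whose cumulative survivor bytes exceed the desired survivor size, capped at the maximum of 15. Under that assumption it reproduces the logged thresholds (5 for the slow pause, 4 for the other one) from the per-age byte counts. The function name tenuring_threshold is hypothetical.

```python
# Per-age byte counts ("- age N: <bytes>") and the desired survivor size,
# copied from the tenuring distributions of the two pauses above.
SLOW_AGES = [69789280, 58618240, 54519720, 51592728, 45186520]
FAST_AGES = [69125832, 58756480, 52397376, 88850424]
DESIRED_SURVIVOR_SIZE = 243269632  # "Desired survivor size" in both logs


def tenuring_threshold(age_sizes, desired_survivor_size, max_threshold=15):
    """Assumed rule: the first age whose cumulative size exceeds the target,
    capped at the maximum threshold (the "(max 15)" in the log)."""
    total = 0
    for age, size in enumerate(age_sizes, start=1):
        total += size
        if total > desired_survivor_size:
            return min(age, max_threshold)
    # Everything fits under the survivor target: objects may age up to the maximum.
    return max_threshold


if __name__ == "__main__":
    print("slow pause:", tenuring_threshold(SLOW_AGES, DESIRED_SURVIVOR_SIZE))
    print("fast pause:", tenuring_threshold(FAST_AGES, DESIRED_SURVIVOR_SIZE))
```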

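Reference processing is still the biggest item under "Other", but it is much shorter. -XX:+ParallelRefProcEnabled doesn't solve my problem. I also tried changing the size of the young gen; that didn't help either. Setting -XX:MaxGCPauseMillis to different values, either looser (600 ms) or stricter (100 ms), still resulted in poor throughput.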
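The per-phase timings of the two pauses can be compared directly. This is a minimal sketch, assuming the "[Name: value ms]" lines keep the format shown in the logs above; phases and ref_proc_share are hypothetical helpers, and the snippets are copied from the two pauses.

```python
import re

# "Parallel Time" and the "Other" sub-phases, copied from the two pauses above.
SLOW_PAUSE = """
     [Parallel Time: 190,5 ms]
     [Other: 2343,4 ms]
     [Choose CSet: 0,1 ms]
     [Ref Proc: 2327,7 ms]
     [Ref Enq: 1,9 ms]
     [Free CSet: 8,2 ms]
"""
FAST_PAUSE = """
    [Parallel Time: 188,4 ms]
    [Other: 30,1 ms]
    [Choose CSet: 0,1 ms]
    [Ref Proc: 17,1 ms]
    [Ref Enq: 0,9 ms]
    [Free CSet: 7,4 ms]
"""

PHASE = re.compile(r"\[([A-Za-z ]+): (\d+,\d+) ms\]")


def phases(log_text):
    """Map each '[Name: value ms]' line to its value in milliseconds."""
    return {name: float(value.replace(",", "."))
            for name, value in PHASE.findall(log_text)}


def ref_proc_share(p):
    """Fraction of the 'Other' bucket spent in reference processing."""
    return p["Ref Proc"] / p["Other"]


if __name__ == "__main__":
    for label, text in (("slow", SLOW_PAUSE), ("fast", FAST_PAUSE)):
        p = phases(text)
        print(f"{label}: parallel {p['Parallel Time']:.1f} ms, "
              f"ref proc {p['Ref Proc']:.1f} ms "
              f"({ref_proc_share(p):.0%} of Other)")
```

With these numbers the parallel phase is almost identical in both pauses (190,5 ms vs 188,4 ms), while Ref Proc accounts for about 99% of "Other" in the slow pause and about 57% in the faster one.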

CMS performed even worse than G1, with these parameters:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 

Young-gen collections were longer and more frequent.

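I'm completely confused by the logs above. Tomorrow I'll try moving the Couchbase instance to another node to check whether it is stalling the GC threads. But if Couchbase is not the cause, maybe someone can explain what these logs mean. Or maybe there is some magic CMS parameter that solves this.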

I'd be grateful for any help!

Do you have GC logs for the CMS collector with the -XX:+PrintReferenceGC option enabled?

Yes, of course. This is the relevant part of the log: 13908,155: [SoftReference, 0 refs, 0,0041340 secs]13908,160: [WeakReference, 0 refs, 0,0023850 secs]13908,162: [FinalReference, 1393 refs, 0,0065970 secs]13908,169: [PhantomReference, 108 refs, 0,0018650 secs]13908,171: [JNI Weak Reference, 0,0000630 secs], 0,22008100 secs] [Parallel Time: 188,4 ms]

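In fact, we seem to have solved our problem by moving Couchbase to another machine. Now 99% of pauses are shorter than 0.25 seconds. But we still have about 2000 final references in every 1,1-0 second GC pause. Maybe I should do something to improve that.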

Answer

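We fixed the problem ourselves. We now have a new rule: never install Couchbase next to a JVM. We had run into this before, when a PostgreSQL instance conflicted with Couchbase: Couchbase likes to grab all disk operations, and PostgreSQL could not commit anything.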

So isolate Couchbase, and everything will be fine.