2016-01-03 29 views
3

我設置ParallelGCThreads=1和使用G1GC,所有其他JVM設置是默認設置。我在Spark-1.5.1上運行PageRank,每個EC2節點有100 GB堆。G1 GC單個,非常長的年輕GC發生與ParallelGCThreads = 1

我的堆使用圖如下(紅色區域:年輕一代,黑色區域:舊一代)。所有年輕的GC都很小,突然之間有一個年輕的GC收集了60 GB,然後年輕的GC再次變小。我的GC日誌顯示沒有混合的GC,沒有完整的GC,一個併發標記和數十個年輕GC。我想知道爲什麼這個巨大的年輕GC發生?

enter image description here

下面我GC日誌的一部分。巨大的年輕GC是一個與 「堆:84.1G」

2015-12-30T06:59:02.488+0000: 245.088: [GC pause (young) 245.089: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 1727, predicted base time: 24.64 ms, remaining time: 175.36 ms, target pause time: 200.00 ms] 
245.089: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 206 regions, survivors: 3 regions, predicted young region time: 148.87 ms] 
245.089: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 206 regions, survivors: 3 regions, old: 0 regions, predicted pause time: 173.51 ms, target pause time: 200.00 ms] 
2015-12-30T06:59:02.531+0000: 245.131: [SoftReference, 0 refs, 0.0000520 secs]2015-12-30T06:59:02.531+0000: 245.131: [WeakReference, 21 refs, 0.0000160 secs]2015-12-30T06:59:02.531+0000: 245.131: [FinalReference, 9759 refs, 0.0084720 secs]2015-12-30T06:59:02.539+0000: 245.140: [PhantomReference, 0 refs, 14 refs, 0.0000190 secs]2015-12-30T06:59:02.539+0000: 245.140: [JNI Weak Reference, 0.0000130 secs] 245.142: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 12.51 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] 
, 0.0534140 secs] 
    [Parallel Time: 42.3 ms, GC Workers: 1] 
     [GC Worker Start (ms): 245088.6] 
     [Ext Root Scanning (ms): 14.4] 
     [Update RS (ms): 1.9] 
     [Processed Buffers: 34] 
     [Scan RS (ms): 0.4] 
     [Code Root Scanning (ms): 0.0] 
     [Object Copy (ms): 25.5] 
     [Termination (ms): 0.0] 
     [GC Worker Other (ms): 0.0] 
     [GC Worker Total (ms): 42.3] 
     [GC Worker End (ms): 245130.9] 
    [Code Root Fixup: 0.0 ms] 
    [Code Root Migration: 0.0 ms] 
    [Clear CT: 1.6 ms] 
    [Other: 9.5 ms] 
     [Choose CSet: 0.0 ms] 
     [Ref Proc: 8.6 ms] 
     [Ref Enq: 0.2 ms] 
     [Free CSet: 0.4 ms] 
    [Eden: 6592.0M(6592.0M)->0.0B(58.8G) Survivors: 96.0M->128.0M Heap: 30.6G(100.0G)->24.2G(100.0G)] 
[Times: user=0.05 sys=0.00, real=0.06 secs] 
2015-12-30T06:59:43.451+0000: 286.051: [GC pause (young) 286.054: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 392599, predicted base time: 367.03 ms, remaining time: 0.00 ms, target pause time: 200.00 ms] 
286.054: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1884 regions, survivors: 4 regions, predicted young region time: 150.18 ms] 
286.054: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1884 regions, survivors: 4 regions, old: 0 regions, predicted pause time: 517.21 ms, target pause time: 200.00 ms] 
2015-12-30T06:59:47.767+0000: 290.368: [SoftReference, 0 refs, 0.0000570 secs]2015-12-30T06:59:47.768+0000: 290.368: [WeakReference, 350 refs, 0.0000640 secs]2015-12-30T06:59:47.768+0000: 290.368: [FinalReference, 99336 refs, 0.3781120 secs]2015-12-30T06:59:48.146+0000: 290.746: [PhantomReference, 0 refs, 1 refs, 0.0000290 secs]2015-12-30T06:59:48.146+0000: 290.746: [JNI Weak Reference, 0.0000140 secs] 290.767: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 11.74 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] 
, 4.7153740 secs] 
    [Parallel Time: 4313.9 ms, GC Workers: 1] 
     [GC Worker Start (ms): 286053.9] 
     [Ext Root Scanning (ms): 15.2] 
     [Update RS (ms): 86.3] 
     [Processed Buffers: 1557] 
     [Scan RS (ms): 4.1] 
     [Code Root Scanning (ms): 0.2] 
     [Object Copy (ms): 4208.1] 
     [Termination (ms): 0.0] 
     [GC Worker Other (ms): 0.0] 
     [GC Worker Total (ms): 4313.9] 
     [GC Worker End (ms): 290367.8] 
    [Code Root Fixup: 0.0 ms] 
    [Code Root Migration: 0.3 ms] 
    [Clear CT: 15.1 ms] 
    [Other: 386.0 ms] 
     [Choose CSet: 0.0 ms] 
     [Ref Proc: 378.4 ms] 
     [Ref Enq: 1.7 ms] 
     [Free CSet: 3.3 ms] 
    [Eden: 58.9G(58.8G)->0.0B(3456.0M) Survivors: 128.0M->1664.0M Heap: 84.1G(100.0G)->26.7G(100.0G)] 
[Times: user=0.78 sys=3.94, real=4.71 secs] 
+0

爲什麼你只有一個GC工作人員?你有多少個CPU? –

+0

你可以發佈你的GC選項嗎? – the8472

+0

@PeterLawrey謝謝!我只有一名GC工作人員,因爲我正在研究GC行爲。我有16個CPU,每個8個核心 – user2616566

回答

5

嘗試堆擴展,原因是:最近GC開銷比GC,GC最近開銷後閾值:11.74%,門檻:10.00%

我的猜測是,這是推動G1的決定。您可以通過設置-XX:GCTimeRatio=4來放鬆它,這將使其佔用相對於GCing應用程序時間的20%的CPU週期,而不是10%。

如果這是太多了,你要麼

  • 允許它使用多個CPU內核 - 這將使它更容易滿足其暫停時間目標,而這又意味着它可以推遲更長的集合,使其更容易以達到吞吐量目標。
    是的,這意味着使用更多內核實際上可以使用更少的CPU週期。
  • 放鬆暫停時間目標,以減少收集時間
+0

謝謝!我正在嘗試你所建議的選項。當我研究G1行爲時,你能明白爲什麼G1突然決定要花費伊甸園麼?所有其他所有的集合都很小,而且,你知道G1遵循什麼等式來決定GC的大小嗎? – user2616566

+0

我只有一個粗略的想法,它是如何工作的,並且必須自己google詳細說明AdaptiveSizePolicy-和(明顯的) 「PrintFlagsFinal」中與G1相關的標誌是一個很好的起點,它們中的一些確實會影響到進行大小調整的方式,但你可以清楚地看到它的吞吐量目標失敗,並且越來越接近於停止暫停時間目標,所以它顯而易見G1有difficulti與它給出的約束一起。 – the8472

+0

更正:暫停時間目標的預測接近於目標失敗,實際暫停時間似乎更小。所以預測因爲某種原因不符合標準。 – the8472