JVM crash in Hadoop reducer

I am running Java code on Hadoop, but I hit this error:

# 
# A fatal error has been detected by the Java Runtime Environment: 
# 
# SIGSEGV (0xb) at pc=0x00007f2ffe7e1904, pid=31718, tid=139843231057664 
# 
# JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15) 
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops) 
# Problematic frame: 
# V [libjvm.so+0x813904] PhaseIdealLoop::build_loop_late_post(Node*)+0x144 
# 
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again 
# 
# An error report file with more information is saved as: 
# /hadoop/nm-local-dir/usercache/ihradmin/appcache/application_1479451766852_3736/container_1479451766852_3736_01_000144/hs_err_pid31718.log 
# 
# Compiler replay data is saved as: 
# /hadoop/nm-local-dir/usercache/ihradmin/appcache/application_1479451766852_3736/container_1479451766852_3736_01_000144/replay_pid31718.log 
# 
# If you would like to submit a bug report, please visit: 
# http://bugreport.java.com/bugreport/crash.jsp

When I go to the node manager, all the logs have already been aggregated because yarn.log-aggregation-enable is true, and neither hs_err_pid31718.log nor replay_pid31718.log can be found.

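Typically 1) the JVM crashes a few minutes into the reducer, 2) sometimes the automatic retry of the reducer succeeds, and 3) some reducers succeed without failing at all.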

The Hadoop version is 2.6.0 and Java is Java 8. This is not a new environment; we have many jobs running on the cluster.

My questions:

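Can I find hs_err_pid31718.log anywhere after YARN has aggregated the logs and deleted the folder? Or is there a setting that keeps all local logs, so that I can inspect hs_err_pid31718.log while YARN still aggregates the logs?
What are the common steps for narrowing down the investigation? Because the JVM crashes, I cannot see any exception in my code. I have already tried the options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp, but no heap dump appeared on the hosts where the reduce tasks failed.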

Thanks for any suggestions.

Source

2017-07-11 Lipeng Yang

Answer

Use -XX:ErrorFile=<your preferred location>/hs_err_pid<pid>.log to set the location of the hs_err file to a directory of your choice.
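
As a rough illustration of how that flag could reach the reducer JVMs, the sketch below builds a value for the Hadoop property `mapreduce.reduce.java.opts`. The class name, the helper methods, and the directory `/var/log/hadoop-crash` are hypothetical, chosen only for this example; the directory would have to exist and be writable by the container user on every node. `%p` in the file name is expanded by the JVM to its process id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReducerCrashLogOpts {
    // Hypothetical directory outside the container working directory.
    static final String CRASH_DIR = "/var/log/hadoop-crash";

    /** Appends -XX:ErrorFile to any existing reducer JVM options. */
    static String withErrorFile(String existingOpts, String dir) {
        String flag = "-XX:ErrorFile=" + dir + "/hs_err_pid%p.log";
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return flag;
        }
        return existingOpts.trim() + " " + flag;
    }

    /** Job properties to set; in a real driver these would go into the job Configuration. */
    static Map<String, String> jobProperties(String existingReduceOpts) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.reduce.java.opts", withErrorFile(existingReduceOpts, CRASH_DIR));
        return props;
    }

    public static void main(String[] args) {
        // Prints the properties in -Dkey="value" form for a job submission command line.
        jobProperties("-Xmx2g").forEach((k, v) -> System.out.println("-D" + k + "=\"" + v + "\""));
    }
}
```

Passing the property with `-D` on the command line only takes effect if the job driver parses generic options (for example through `ToolRunner`); otherwise the same key and value can be set on the job configuration in the driver code. Separately, the NodeManager setting `yarn.nodemanager.delete.debug-delay-sec` in yarn-site.xml delays the cleanup of container directories after an application finishes, which may leave the default hs_err file in place long enough to copy it.
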
The crash is caused by JDK bug JDK-6675699, which has been fixed in JDK 9; backports are available from JDK 8 update 74 onward.

You are using JDK 8 update 72. Please upgrade to the latest version to avoid this crash.
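
To check which nodes still run an affected build, something along these lines could be run on each host. It is a minimal sketch that assumes the legacy `1.8.0_NN` format of the `java.version` system property; the class and method names are made up for this example, and the threshold of 74 is taken from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Jdk8UpdateCheck {
    // Matches legacy version strings such as "1.8.0", "1.8.0_72" or "1.8.0_72-b15".
    private static final Pattern JDK8 =
            Pattern.compile("^1\\.8\\.0(?:_(\\d+))?(?:-.*)?$");

    /** Returns the JDK 8 update number (0 for the GA release), or -1 if this is not JDK 8. */
    static int jdk8Update(String javaVersion) {
        Matcher m = JDK8.matcher(javaVersion);
        if (!m.matches()) {
            return -1;
        }
        return m.group(1) == null ? 0 : Integer.parseInt(m.group(1));
    }

    /** True only for JDK 8 builds older than update 74. */
    static boolean predatesBackport(String javaVersion) {
        int update = jdk8Update(javaVersion);
        return update >= 0 && update < 74;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " predates 8u74: " + predatesBackport(version));
    }
}
```

Versions that are not JDK 8 (for example JDK 9 and later) are reported as unaffected here, in line with the answer's statement that the fix is in JDK 9.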

Source

2017-07-11 16:42:54 Fairoz

Thanks, I will try it and update here.

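This worked: once we upgraded the JDK in our Hadoop environment, the JVM crash was resolved. I still wonder why the crash did not happen every time, since we use the same business code and input.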

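The crash happens while the compiler is building the ideal graph. The compiler optimizes and inlines at run time, so the crash is not always reproducible. I hope this makes it clear. – Fairoz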
