2011-10-31

I have a map-reduce job that was running fine until I started seeing some failed map tasks, like the ones below (similar to hadoop-streaming: reducer in pending state, doesn't start?):

attempt_201110302152_0003_m_000010_0 task_201110302152_0003_m_000010 worker1 FAILED 
Task attempt_201110302152_0003_m_000010_0 failed to report status for 602 seconds. Killing! 
------- 
Task attempt_201110302152_0003_m_000010_0 failed to report status for 607 seconds. Killing! 
attempt_201110302152_0003_m_000010_1 task_201110302152_0003_m_000010 master FAILED 
java.lang.RuntimeException: java.io.IOException: Spill failed 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:261) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) 
    at org.apache.hadoop.mapred.Child.main(Child.java:255) 
Caused by: java.io.IOException: Spill failed 
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029) 
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592) 
    at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:381) 
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill11.out 
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381) 
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146) 
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127) 
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121) 
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392) 
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853) 
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344) 

And now, whereas earlier the reducers used to start copying data even while the map tasks were still running, the reduce phase does not begin executing at all; all I see is this:

11/10/31 03:35:12 INFO streaming.StreamJob: map 95% reduce 0% 
11/10/31 03:44:01 INFO streaming.StreamJob: map 96% reduce 0% 
11/10/31 03:51:56 INFO streaming.StreamJob: map 97% reduce 0% 
11/10/31 03:55:41 INFO streaming.StreamJob: map 98% reduce 0% 
11/10/31 04:04:18 INFO streaming.StreamJob: map 99% reduce 0% 
11/10/31 04:20:32 INFO streaming.StreamJob: map 100% reduce 0% 

I am new to hadoop/mapreduce and don't really know what could be causing this same code to fail, which had earlier run successfully.

Please help.

Thanks

Answers


You should look at mapred.task.timeout. If you have a lot of data and few machines to process it, your tasks may simply be timing out. You can set this value to 0, which disables the timeout.
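As a sketch of how that might look on a streaming job submission (the jar path, input/output paths, and script names below are placeholders, not from the original post): mapred.task.timeout is given in milliseconds, and its default of 600000 matches the roughly 600-second kills in the log above.

```shell
# Placeholder streaming job; the -D mapred.task.timeout flag is the point.
# The value is in milliseconds; 0 disables the timeout entirely.
hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.task.timeout=0 \
  -input  /user/me/input \
  -output /user/me/output \
  -mapper mapper.py \
  -reducer reducer.py
```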

Alternatively, if you can, call context.progress() or some equivalent to say that something is happening, so that the job does not time out.
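In a streaming job there is no Java context object to call, but writing a line of the form reporter:status:&lt;message&gt; to stderr serves the same purpose: the framework treats it as a progress report and resets the timeout clock. A minimal sketch of an identity mapper doing this (the 1000-record interval is an arbitrary choice; the loop is wrapped in a function here purely so it is easy to exercise, while a real mapper script would run it directly against stdin):

```shell
#!/bin/sh
# Sketch of a Hadoop Streaming mapper that reports progress on stderr.
mapper() {
  n=0
  while IFS= read -r line; do
    printf '%s\n' "$line"    # identity map: emit the record unchanged
    n=$((n + 1))
    # Lines matching reporter:status:<msg> on stderr tell the framework
    # the task is still alive, so it is not killed for inactivity.
    if [ $((n % 1000)) -eq 0 ]; then
      printf 'reporter:status:processed %s records\n' "$n" >&2
    fi
  done
}
```

The same stderr convention (reporter:counter:&lt;group&gt;,&lt;counter&gt;,&lt;amount&gt;) can be used to bump counters from a streaming task.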

There seem to be 2 errors here: one is the timeout, and the other is "Caused by: java.io.IOException: Spill failed" –


I had this same problem, and there were two things I did to solve it:

The first was to compress your mapper's output, using mapred.output.compress=true. As your mapper runs, the output is spilled to disk (written to disk), and sometimes that output needs to be sent to a reducer on another machine. Compressing the output reduces network traffic, as well as the amount of disk needed on the machine running the mapper.
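As a sketch of where such flags go on a streaming submission (everything except the -D properties is a placeholder): note that in the old-style property names, mapred.compress.map.output controls the intermediate map output that gets spilled and shuffled, while mapred.output.compress controls the job's final output.

```shell
# Placeholder streaming job; the -D compression properties are the point.
hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.compress.map.output=true \
  -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapred.output.compress=true \
  -input  /user/me/input \
  -output /user/me/output \
  -mapper mapper.py \
  -reducer reducer.py
```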

The second thing I did was to increase the ulimits for the hdfs and mapred users. I added these lines to /etc/security/limits.conf:

mapred  soft nproc  16384 
mapred  soft nofile  16384 
hdfs  soft nproc  16384 
hdfs  soft nofile  16384 
hbase  soft nproc  16384 
hbase  soft nofile  16384 
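To check whether the new limits actually took effect (they apply only to new login sessions, so log in again or restart the daemons first), a quick sketch:

```shell
# Print the current limits for the logged-in user; after the
# limits.conf change and a fresh login these should show 16384.
ulimit -u    # max user processes (nproc)
ulimit -n    # max open files (nofile)
```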

This article has a more thorough explanation: http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/