I am trying to do Hadoop streaming using Python. I have written simple map and reduce scripts by taking help from here.

The map script is as follows:
#!/usr/bin/env python
import sys, urllib, re

# Extract the contents of the <title> tag, across lines and case-insensitively
title_re = re.compile("<title>(.*?)</title>", re.MULTILINE | re.DOTALL | re.IGNORECASE)

for line in sys.stdin:
    url = line.strip()
    # Fetch the page and emit "url <TAB> title" (Python 2 urllib)
    match = title_re.search(urllib.urlopen(url).read())
    if match:
        print url, "\t", match.group(1).strip()
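For reference, this is how the regex above behaves on a small HTML snippet (a standalone sketch in Python 3 print syntax, using a made-up page):

```python
import re

# Same pattern as in the mapper
title_re = re.compile("<title>(.*?)</title>", re.MULTILINE | re.DOTALL | re.IGNORECASE)

html = "<html><head><TITLE>\n Example Page \n</TITLE></head></html>"
match = title_re.search(html)
if match:
    # DOTALL lets .*? cross newlines; IGNORECASE matches <TITLE>
    print(match.group(1).strip())  # Example Page
```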
and the reduce script is as follows:
#!/usr/bin/env python
from operator import itemgetter
import sys

# Identity reducer: pass each line through unchanged
for line in sys.stdin:
    line = line.strip()
    print line
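Before running on Hadoop, it can help to sanity-check the pair locally by piping a file of URLs through the scripts, e.g. cat urls.txt | ./MultiFetch.py | sort | ./reducer.py (urls.txt standing for whatever input file is used). A self-contained simulation of that pipeline, with canned url/title lines in place of live fetches and inline Python 3 as the identity reducer:

```shell
# Simulate map output -> sort -> identity reduce, with no Hadoop or network needed.
# The printf lines stand in for what MultiFetch.py would emit.
printf 'http://b.example\tB Title\nhttp://a.example\tA Title\n' \
  | sort \
  | python3 -c 'import sys
for line in sys.stdin:
    print(line.strip())'
```

If the real scripts fail when run this way, the Python traceback appears directly in the terminal instead of being buried in the task logs.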
After running these scripts with the Hadoop streaming jar, the map tasks finish and I can see that they are 100% complete, but the reduce job gets stuck at 22%. After a long time it fails with the error ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

I am not able to figure out the exact reason behind this.

My terminal window looks as follows:
[email protected]:/host/Shekhar/Softwares/hadoop-1.0.0$ hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -mapper /host/Shekhar/HadoopWorld/MultiFetch.py -reducer /host/Shekhar/HadoopWorld/reducer.py -input /host/Shekhar/HadoopWorld/urls/* -output /host/Shekhar/HadoopWorld/titles3
Warning: $HADOOP_HOME is deprecated.
packageJobJar: [/tmp/hadoop-shekhar/hadoop-unjar2709939812732871143/] [] /tmp/streamjob1176812134999992997.jar tmpDir=null
12/05/27 11:27:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/27 11:27:46 INFO mapred.FileInputFormat: Total input paths to process : 3
12/05/27 11:27:46 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-shekhar/mapred/local]
12/05/27 11:27:46 INFO streaming.StreamJob: Running job: job_201205271050_0006
12/05/27 11:27:46 INFO streaming.StreamJob: To kill this job, run:
12/05/27 11:27:46 INFO streaming.StreamJob: /host/Shekhar/Softwares/hadoop-1.0.0/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201205271050_0006
12/05/27 11:27:46 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006
12/05/27 11:27:47 INFO streaming.StreamJob: map 0% reduce 0%
12/05/27 11:28:07 INFO streaming.StreamJob: map 67% reduce 0%
12/05/27 11:28:37 INFO streaming.StreamJob: map 100% reduce 0%
12/05/27 11:28:40 INFO streaming.StreamJob: map 100% reduce 11%
12/05/27 11:28:49 INFO streaming.StreamJob: map 100% reduce 22%
12/05/27 11:31:35 INFO streaming.StreamJob: map 67% reduce 22%
12/05/27 11:31:44 INFO streaming.StreamJob: map 100% reduce 22%
12/05/27 11:34:52 INFO streaming.StreamJob: map 67% reduce 22%
12/05/27 11:35:01 INFO streaming.StreamJob: map 100% reduce 22%
12/05/27 11:38:11 INFO streaming.StreamJob: map 67% reduce 22%
12/05/27 11:38:20 INFO streaming.StreamJob: map 100% reduce 22%
12/05/27 11:41:29 INFO streaming.StreamJob: map 67% reduce 22%
12/05/27 11:41:35 INFO streaming.StreamJob: map 100% reduce 100%
12/05/27 11:41:35 INFO streaming.StreamJob: To kill this job, run:
12/05/27 11:41:35 INFO streaming.StreamJob: /host/Shekhar/Softwares/hadoop-1.0.0/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201205271050_0006
12/05/27 11:41:35 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006
12/05/27 11:41:35 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201205271050_0006_m_000001
12/05/27 11:41:35 INFO streaming.StreamJob: killJob...
Streaming Job Failed!
Can anyone please help me?
EDIT
The job tracker details are as follows:
Hadoop job_201205271050_0006 on localhost
User: shekhar
Job Name: streamjob1176812134999992997.jar
Job File: file:/tmp/hadoop-shekhar/mapred/staging/shekhar/.staging/job_201205271050_0006/job.xml
Submit Host: ubuntu
Submit Host Address: 127.0.1.1
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Failed
Failure Info:# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201205271050_0006_m_000001
Started at: Sun May 27 11:27:46 IST 2012
Failed at: Sun May 27 11:41:35 IST 2012
Failed in: 13mins, 48sec
Job Cleanup: Successful
Black-listed TaskTrackers: 1
Kind     % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map      100.00%     3          0        0        2         1       4/0
reduce   100.00%     1          0        0        0         1       0/1
Go to the tracking URL http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006 to find out the actual error –
@Raze2dust, I opened that URL but it shows the same error... – Shekhar
Have you checked the stdout/stderr logs of the individual tasks that failed? –