Asked 2011-07-21

Streaming Command Failed! Error when using Elastic MapReduce/S3 and R

I am following the example at https://gist.github.com/406824, hoping to get something running that uses EC2/S3/EMR/R.

The job fails on the streaming step. Here are the error logs:

Controller:

2011-07-21T19:14:27.711Z INFO Fetching jar file. 
2011-07-21T19:14:30.380Z INFO Working dir /mnt/var/lib/hadoop/steps/1 
2011-07-21T19:14:30.380Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf: /usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/1 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/1/tmp -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32 org.apache.hadoop.util.RunJar /home/hadoop/contrib/streaming/hadoop-streaming.jar -cacheFile s3n://emrexample21/calculatePiFunction.R#calculatePiFunction.R -input s3n://emrexample21/numberList.txt -output s3n://emrout/ -mapper s3n://emrexample21/mapper.R -reducer s3n://emrexample21/reducer.R 
2011-07-21T19:16:12.057Z INFO Execution ended with ret val 1 
2011-07-21T19:16:12.057Z WARN Step failed with bad retval 
2011-07-21T19:16:14.185Z INFO Step created jobs: job_201107211913_0001 

stderr:

Streaming Command Failed! 

stdout:

packageJobJar: [/mnt/var/lib/hadoop/tmp/hadoop-unjar2368654264051498521/] [] /mnt/var/lib/hadoop/steps/2/tmp/streamjob1658200878131882888.jar tmpDir=null 

syslog:

2011-07-21 19:50:29,539 INFO org.apache.hadoop.mapred.JobClient (main): Default number of map tasks: 2 
2011-07-21 19:50:29,539 INFO org.apache.hadoop.mapred.JobClient (main): Default number of reduce tasks: 15 
2011-07-21 19:50:31,988 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader (main): Loaded native gpl library 
2011-07-21 19:50:31,999 INFO com.hadoop.compression.lzo.LzoCodec (main): Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2334756312e0012cac793f12f4151bdaa1b4b1bb] 
2011-07-21 19:50:33,040 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process : 1 
2011-07-21 19:50:35,375 INFO org.apache.hadoop.streaming.StreamJob (main): getLocalDirs(): [/mnt/var/lib/hadoop/mapred] 
2011-07-21 19:50:35,375 INFO org.apache.hadoop.streaming.StreamJob (main): Running job: job_201107211948_0001 
2011-07-21 19:50:35,375 INFO org.apache.hadoop.streaming.StreamJob (main): To kill this job, run: 
2011-07-21 19:50:35,375 INFO org.apache.hadoop.streaming.StreamJob (main): UNDEF/bin/hadoop job -Dmapred.job.tracker=ip-10-203-50-161.ec2.internal:9001 -kill job_201107211948_0001 
2011-07-21 19:50:35,376 INFO org.apache.hadoop.streaming.StreamJob (main): Tracking URL: http://ip-10-203-50-161.ec2.internal:9100/jobdetails.jsp?jobid=job_201107211948_0001 
2011-07-21 19:50:36,566 INFO org.apache.hadoop.streaming.StreamJob (main): map 0% reduce 0% 
2011-07-21 19:50:57,778 INFO org.apache.hadoop.streaming.StreamJob (main): map 50% reduce 0% 
2011-07-21 19:51:09,839 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 0% 
2011-07-21 19:51:12,852 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 1% 
2011-07-21 19:51:15,864 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 3% 
2011-07-21 19:51:18,875 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 0% 
2011-07-21 19:52:12,454 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 100% 
2011-07-21 19:52:12,455 INFO org.apache.hadoop.streaming.StreamJob (main): To kill this job, run: 
2011-07-21 19:52:12,455 INFO org.apache.hadoop.streaming.StreamJob (main): UNDEF/bin/hadoop job -Dmapred.job.tracker=ip-10-203-50-161.ec2.internal:9001 -kill job_201107211948_0001 
2011-07-21 19:52:12,456 INFO org.apache.hadoop.streaming.StreamJob (main): Tracking URL: http://ip-10-203-50-161.ec2.internal:9100/jobdetails.jsp?jobid=job_201107211948_0001 
2011-07-21 19:52:12,456 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful! 
2011-07-21 19:52:12,456 INFO org.apache.hadoop.streaming.StreamJob (main): killJob... 

Hey, that's some pretty-looking code you linked there. No way it has errors! :) –

Answer


I'm the author of the code you're trying to run. It was written as a proof of concept for R on EMR. Writing genuinely useful code with that approach is very difficult: all the manual steps needed to submit R code to EMR this way make it a tedious, painful exercise.

To address that, I later wrote the Segue package, which handles loading all the bits into S3 and updating the R version on the Hadoop nodes. Jeffrey Breen wrote a blog post about using Segue. Take a look at that and see whether it's easier to use.

Edit:

I should at least give some tips on debugging R code inside EMR/Hadoop streaming:

1) Debugging R code from the Hadoop logs is damn near impossible. In my experience, I literally have to fire up an EMR cluster, log in, and run the code manually from R. That requires starting the cluster with a defined key pair. I usually debug on a single-node cluster with a very small data set; there is no point in running multiple nodes just to debug.
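For reference, a single-node debug cluster like this could be launched with the era-appropriate `elastic-mapreduce` Ruby CLI. The sketch below only prints the command (a dry run), and the key-pair name is a placeholder, not one from the original post:

```shell
# Dry-run sketch of launching a single-node EMR cluster for interactive
# debugging with the elastic-mapreduce Ruby CLI of that era.
# --alive keeps the cluster running after steps finish so you can log in.
KEY_PAIR="my-ec2-keypair"   # placeholder: substitute your own EC2 key pair
CMD="elastic-mapreduce --create --alive --name r-debug --num-instances 1 --key-pair $KEY_PAIR"
echo "$CMD"                 # dry run: print the command instead of executing it
```

With `--num-instances 1` the master node is the only node, which matches the single-node debugging workflow described above.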

2) Running the job interactively in R on an EMR node requires any input files to be in the /home/hadoop/ directory on the node. The easiest way to do that is to scp all the files you need to the cluster.
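The scp step might look like the sketch below; the master hostname and key file path are placeholders (shown as a dry run), while the file names and the /home/hadoop/ target come from the answer itself:

```shell
# Sketch of copying the job files to the EMR master node. MASTER and KEY
# are placeholders; hadoop@ matches the /home/hadoop/ home directory that
# interactive debugging on the node expects.
MASTER="ec2-xx-xx-xx-xx.compute-1.amazonaws.com"   # placeholder hostname
KEY="$HOME/my-ec2-keypair.pem"                     # placeholder key file
SCP_CMD="scp -i $KEY mapper.R reducer.R numberList.txt hadoop@$MASTER:/home/hadoop/"
echo "$SCP_CMD"                                    # dry run: print, don't execute
```
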

3) Before doing 1 & 2, test the code locally using the same method.

4) Once you think the R code is working, you should be able to do this on your Hadoop machine:

cat numberList.txt | ./mapper.R | sort | ./reducer.R 

and it should run. If you are not using a mapper or a reducer, you can replace either one with cat. I used numberList.txt in this example because that is the input file name in the code on my github.
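That local smoke test can be reproduced end to end with stand-in scripts. The sketch below uses tiny shell scripts in place of the R mapper and reducer from the gist, purely to illustrate the pipe contract: the mapper reads lines from stdin and emits one value per line, sort groups the stream, and the reducer folds it to a single result.

```shell
# Local simulation of the streaming pipeline using shell stand-ins for
# mapper.R / reducer.R. The mapper squares each input number; the reducer
# sums the stream. The file name mirrors the example's numberList.txt.
printf '1\n2\n3\n' > numberList.txt

cat > mapper.sh <<'EOF'
#!/bin/sh
while read n; do echo $((n * n)); done   # emit n^2 for each input line
EOF

cat > reducer.sh <<'EOF'
#!/bin/sh
total=0
while read v; do total=$((total + v)); done
echo "$total"                            # single summed value
EOF

chmod +x mapper.sh reducer.sh
RESULT=$(cat numberList.txt | ./mapper.sh | sort -n | ./reducer.sh)
echo "$RESULT"   # 1 + 4 + 9 = 14
```

If this pipeline runs cleanly on your laptop but the EMR step still fails, the problem is almost always environmental (missing R packages on the node, file permissions, or shebang lines), not the map/reduce logic itself.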


Thanks so much! I'm going to check out Segue and the blog post. The debugging info will definitely help too. – tcash21
