2014-04-06 26 views

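Python Hadoop streaming with imported packages not installed on the data nodes

I am trying to import scikit-image in a Python Hadoop streaming job. I have already tried the existing Stack Overflow posts (here and here), but neither of them solved my problem.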

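The real question is: even if I use -file to distribute a ZIP/MOD file containing the packaged scikit-image folder, how does the Python script running on a data node know how to extract those packages and import them in the code? Note that scikit-image is installed for Python on my name node, and I am able to run experiments locally.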
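To make the question concrete, here is a minimal sketch of what "extract and import" could look like inside the mapper. It is not a confirmed fix. It assumes the archive shipped with -file ends up in the task's working directory, that it is a plain zip with the skimage/ package directory at its root, and that the helper name add_shipped_archive and the directory name shipped_packages are purely illustrative. The archive is unpacked rather than imported directly because scikit-image contains compiled extension modules, which Python cannot load from inside a zip file. Even unpacked, the copy would only work if it was built for the data nodes' platform and Python version and if its dependencies (NumPy, SciPy) are importable there.

```python
import os
import sys
import zipfile


def add_shipped_archive(archive_path, extract_dir="shipped_packages"):
    """Unpack a zip shipped with the job and put its contents on sys.path.

    Returns the absolute directory that was added to sys.path.
    """
    target = os.path.abspath(extract_dir)
    if not os.path.isdir(target):
        os.makedirs(target)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Put the directory first so the shipped copy wins over any other copy.
    if target not in sys.path:
        sys.path.insert(0, target)
    return target


# Hypothetical use at the top of mapper.py, before the import that fails
# (assumes the archive is called skimage.zip and sits in the working directory):
# add_shipped_archive("skimage.zip")
# import skimage
```

The helper has to run before the first import of the shipped package, since Python resolves imports against sys.path at the moment the import statement executes.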

My script is simple: the classic word count example for Python streaming, with an extra "import skimage" in mapper.py. http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python


My command:

hadoop jar hadoop-streaming.jar \
-file mapper.py -mapper mapper.py \
-file reducer.py -reducer reducer.py \
-file ./skimage.mod \
-input /user/text/* \
-output /user/textoutput/

Screen output:

packageJobJar: [mapper.py, reducer.py, ./skimage.zip] [/usr/lib/gphd/hadoop-mapreduce-2.0.2_alpha_gphd_2_0_1_0/hadoop-streaming-2.0.2-alpha-gphd-2.0.1.0.jar] /tmp/streamjob6159562120374599467.jar tmpDir=null 
14/04/04 18:00:02 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 
14/04/04 18:00:02 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. 
14/04/04 18:00:03 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 
14/04/04 18:00:03 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. 
14/04/04 18:00:03 WARN snappy.LoadSnappy: Snappy native library not loaded 
14/04/04 18:00:03 INFO mapred.FileInputFormat: Total input paths to process : 1 
14/04/04 18:00:03 INFO mapreduce.JobSubmitter: number of splits:2 
14/04/04 18:00:03 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar 
14/04/04 18:00:03 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files 
14/04/04 18:00:03 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class 
14/04/04 18:00:03 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class 
14/04/04 18:00:03 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name 
14/04/04 18:00:03 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
14/04/04 18:00:03 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 
14/04/04 18:00:03 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 
14/04/04 18:00:03 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps 
14/04/04 18:00:03 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class 
14/04/04 18:00:03 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class 
14/04/04 18:00:03 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir 
14/04/04 18:00:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1384839777050_0106 
14/04/04 18:00:04 INFO client.YarnClientImpl: Submitted application application_1384839777050_0106 to ResourceManager at hdm3.gphd.local/172.28.9.252:8032 
14/04/04 18:00:04 INFO mapreduce.Job: The url to track the job: http://hdm3.gphd.local:8088/proxy/application_1384839777050_0106/ 
14/04/04 18:00:04 INFO mapreduce.Job: Running job: job_1384839777050_0106 
14/04/04 18:00:08 INFO mapreduce.Job: Job job_1384839777050_0106 running in uber mode : false 
14/04/04 18:00:08 INFO mapreduce.Job: map 0% reduce 0% 
14/04/04 18:00:12 INFO mapreduce.Job: Task Id : attempt_1384839777050_0106_m_000001_0, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 

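I checked the error log of the Hadoop job, and it complains that "import skimage" cannot be resolved, which means the package was not picked up on the data node.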
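For narrowing down a failure like this, one option is to have the mapper report its own environment when the import fails. The sketch below is only an illustration: try_import is a hypothetical helper name, and it relies on the assumption that whatever a streaming task writes to stderr is captured in that task attempt's log.

```python
import importlib
import sys
import traceback


def try_import(module_name, log=sys.stderr):
    """Import a module by name; on failure, log the environment and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Record which interpreter ran and where it looked for modules.
        log.write("could not import %r\n" % module_name)
        log.write("python executable: %s\n" % sys.executable)
        log.write("sys.path: %r\n" % (sys.path,))
        traceback.print_exc(file=log)
        return None


# Hypothetical use in mapper.py:
# skimage = try_import("skimage")
# if skimage is None:
#     sys.exit(1)
```

Comparing the logged sys.path and interpreter path with those on the name node would show whether the data node is using a different Python or simply has no copy of the package on its search path.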

Answers