
Adding extra arguments to HadoopJarStepConfig fails

I'm trying to run this command through the AWS SDK:

hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input hdfs:///logs/ -output hdfs:///no_dups -mapper dedup_mapper.py -reducer dedup_reducer.py -file deduplication.py dedup_mapper.py dedup_reducer.py timber.py signature_v4.py 

My Java code:

HadoopJarStepConfig config = new StreamingStep() 
     .withInputs("hdfs:///logs") 
     .withOutput("hdfs:///no_dups") 
     .withMapper("dedup_mapper.py") 
     .withReducer("dedup_reducer.py") 
     .toHadoopJarStepConfig(); 

Collection<String> aggs = config.getArgs(); 
aggs.add("-file deduplication.py timber.py dedup_mapper.py dedup_reducer.py signature_v4.py"); 
config.setArgs(aggs); 

This produces the following AddJobFlowStepsRequest (as printed by toString()):

{JobFlowId: j-3TDECOMCOO8HE, Steps: [{Name: DeDup, ActionOnFailure: CONTINUE, HadoopJarStep: {Properties: [], Jar: /home/hadoop/contrib/streaming/hadoop-streaming.jar, Args: [-input, hdfs:///logs, -output, hdfs:///no_dups, -mapper, dedup_mapper.py, -reducer, dedup_reducer.py, -file deduplication.py timber.py dedup_mapper.py dedup_reducer.py signature_v4.py], }, }], } 

Finally, this is the error I see on the master node:

2013-04-26 16:43:48,116 ERROR org.apache.hadoop.streaming.StreamJob (main): Unrecognized option: -file deduplication.py timber.py dedup_mapper.py dedup_reducer.py signature_v4.p 

Oddly, the error log then lists the available options, and -file is among them. Has anyone else run into this?

More of the log:

2013-04-26T16:43:46.105Z INFO Fetching jar file. 

2013-04-26T16:43:47.609Z INFO Working dir /mnt/var/lib/hadoop/steps/9 

2013-04-26T16:43:47.609Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-core-1.0.3.jar:/home/hadoop/hadoop-tools.jar:/home/hadoop/hadoop-tools-1.0.3.jar:/home/hadoop/hadoop-core.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/9 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/9/tmp -Djava.library.path=/home/hadoop/native/Linux-amd64-64 org.apache.hadoop.util.RunJar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input hdfs:///logs -output hdfs:///no_dups -mapper dedup_mapper.py -reducer dedup_reducer.py -file deduplication.py timber.py dedup_mapper.py dedup_reducer.py signature_v4.py 

2013-04-26T16:43:48.611Z INFO Execution ended with ret val 1 

2013-04-26T16:43:48.612Z WARN Step failed with bad retval 

Answer


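The error occurs because the whole string is passed as one argument, so it is interpreted as a single option name ("-file deduplication.py timber.py ..."), which matches none of the known options.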
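To make this concrete, here is a toy model of how a parser sees the argument list. It is a simplified sketch, not Hadoop's actual StreamJob parsing: it assumes each list element starting with "-" is compared as a whole against a set of option names, and the class name and option set are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (NOT Hadoop's real parser): each element that starts with "-"
// must exactly match a known option name; the next element is its value.
public class OptionMatchDemo {
    // Hypothetical subset of streaming option names, for illustration only.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "-input", "-output", "-mapper", "-reducer", "-file"));

    // Returns the first option-looking element that is not a known option,
    // or null if every option element matched.
    static String firstUnrecognized(List<String> argv) {
        for (int i = 0; i < argv.size(); i++) {
            String token = argv.get(i);
            if (token.startsWith("-")) {
                if (!KNOWN.contains(token)) {
                    return token;
                }
                i++; // skip this option's single value
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // One element containing spaces: compared as a whole, so it matches nothing.
        List<String> combined = Arrays.asList("-mapper", "dedup_mapper.py",
                "-file deduplication.py timber.py");
        // Option and value as separate elements: "-file" matches.
        List<String> split = Arrays.asList("-mapper", "dedup_mapper.py",
                "-file", "deduplication.py", "-file", "timber.py");
        System.out.println(firstUnrecognized(combined)); // prints "-file deduplication.py timber.py"
        System.out.println(firstUnrecognized(split));    // prints "null"
    }
}
```

Note that the shell-style command line in the step log looks identical either way, because it is printed with spaces between elements; the difference is only in how the elements are split.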

The fix is to add the option and its argument as separate list elements, like this:

args.add("-file"); 
args.add("myfile.txt"); 

To add multiple files, repeat the option for each one:

args.add("-file"); 
args.add("myfile.txt"); 
args.add("-file"); 
args.add("myfile2.txt"); 

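If you pass the files as a list inside a single argument, the whole string is interpreted as one file name, which will most likely cause an error.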
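For many files, the repeated pairs can be generated in a loop. The sketch below uses a hypothetical helper, addFileArgs, on a plain java.util.List so it runs without the AWS SDK; the starting list is a stand-in for what config.getArgs() returned in the question. In the real code you would call the helper on that list instead of the single aggs.add(...) line and then pass it back with config.setArgs(...), as the question already does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingFileArgs {
    // Hypothetical helper: appends "-file" and the file name as two separate
    // elements for every file, so each option and value is its own argument.
    static void addFileArgs(List<String> args, String... files) {
        for (String file : files) {
            args.add("-file");
            args.add(file);
        }
    }

    public static void main(String[] unused) {
        // Stand-in for the list returned by config.getArgs() in the question.
        List<String> args = new ArrayList<>(Arrays.asList(
                "-input", "hdfs:///logs", "-output", "hdfs:///no_dups",
                "-mapper", "dedup_mapper.py", "-reducer", "dedup_reducer.py"));
        addFileArgs(args, "deduplication.py", "timber.py", "dedup_mapper.py",
                "dedup_reducer.py", "signature_v4.py");
        System.out.println(args.size()); // prints 18
        System.out.println(args);
    }
}
```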