2015-04-23

Hadoop streaming: including jar files with -libjars

I'm learning Hadoop and have written a map/reduce step to process some of my Avro files. I suspect the problem I'm running into is caused by my Hadoop installation: I'm testing in standalone mode on my laptop, not on a distributed cluster.

Here is the bash script I use to launch the job:

#!/bin/bash 

reducer=/home/hduser/python-hadoop/test/reducer.py 
mapper=/home/hduser/python-hadoop/test/mapper.py 
avrohdjar=/home/hduser/python-hadoop/test/avro-mapred-1.7.4-hadoop1.jar 
avrojar=/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar 


hadoop jar ~/hadoop/share/hadoop/tools/lib/hadoop-streaming* \ 
    -D mapreduce.job.name="hd1" \ 
    -libjars ${avrojar},${avrohdjar} \ 
    -files ${avrojar},${avrohdjar},${mapper},${reducer} \ 
    -input ~/tmp/data/* \ 
    -output ~/tmp/data-output \ 
    -mapper ${mapper} \ 
    -reducer ${reducer} \ 
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat 

And here is the output:

15/04/23 11:02:54 INFO Configuration.deprecation: session.id is 
deprecated. Instead, use dfs.metrics.session-id 
15/04/23 11:02:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
15/04/23 11:02:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
15/04/23 11:02:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/home/hduser/tmp/mapred/staging/hduser1337717111/.staging/job_local1337717111_0001 
15/04/23 11:02:54 ERROR streaming.StreamJob: Error launching job , bad input path : File does not exist: hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar 
Streaming Command Failed! 

I've tried a number of different fixes but don't know what to try next. For some reason Hadoop can't find the jar files specified with -libjars. I have, however, successfully run the wordcount example posted here, so my Hadoop installation and configuration work for that case. Thanks!

Edit: here are my changes to hdfs-site.xml:

<property> 
    <name>dfs.replication</name> 
    <value>1</value> 
    <description>Default block replication. 
    The actual number of replications can be specified when the file is created. 
    The default is used if replication is not specified in create time. 
    </description> 
</property> 

And here is core-site.xml:

<property> 
    <name>hadoop.tmp.dir</name> 
    <value>/home/hduser/tmp</value> 
    <description>A base for other temporary directories.</description> 
</property> 

<property> 
    <name>fs.default.name</name> 
    <value>hdfs://localhost:54310</value> 
    <description>The name of the default file system. A URI whose 
    scheme and authority determine the FileSystem implementation. The 
    uri's scheme determines the config property (fs.SCHEME.impl) naming 
    the FileSystem implementation class. The uri's authority is used to 
    determine the host, port, etc. for a filesystem.</description> 
</property> 

Answer

3

Your cluster is running in distributed mode. It is trying to find the file at the path below, and that path does not exist:

hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar 
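The error makes more sense once you see how Hadoop qualifies paths: a path with no filesystem scheme (such as the local jar path passed to -libjars) is resolved against the default filesystem from fs.default.name, which here is HDFS. As a rough illustration only — `qualify_path` is a hypothetical shell helper, not a Hadoop command:

```shell
# Hypothetical sketch of how Hadoop qualifies a schemeless path
# against the configured default filesystem (fs.default.name).
default_fs="hdfs://localhost:54310"

qualify_path() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;                       # already has a scheme: left alone
    *)     printf '%s/%s\n' "$default_fs" "${1#/}" ;;  # schemeless: prefixed with default FS
  esac
}

qualify_path /home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar
# → hdfs://localhost:54310/home/hduser/hadoop/share/hadoop/tools/lib/avro-1.7.4.jar
```

That prefixed path is exactly the one in the error message, even though the jar only exists on the local disk.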
+0

Ah, I see. I followed this tutorial (top answer): http://askubuntu.com/questions/144433/how-to-install-hadoop. How do I change it to non-distributed mode? Also, is there a way to check which mode I'm running in? –

+1

You can check your core-site.xml and hdfs-site.xml and post them here. They will show the filesystem you are pointing to, currently hdfs://. If you want to run standalone, it should be file:///... Check that. http://stackoverflow.com/questions/29721623/how-to-find-installation-mode-of-hadoop-2-x/29721955#29721955 –
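For example, a standalone setup typically points fs.default.name at the local filesystem. A minimal core-site.xml sketch of that change (mirroring the property shown in the question) might look like the following; you can also confirm the active setting with `hdfs getconf -confKey fs.defaultFS`:

```xml
<property>
    <name>fs.default.name</name>
    <!-- local filesystem: jobs resolve paths on local disk, not HDFS -->
    <value>file:///</value>
</property>
```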

+0

Changing hdfs://... to file:/// fixed it. Now I have new errors to work through. Thanks! –