2013-10-22 55 views

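Python mrjob module not found on CDH virtual machine

I am using mrjob to run Python code on Hadoop, on a single-node cluster set up with the CDH package in a virtual machine. My mrjob job runs correctly when I test the code locally, but when I run it on the Hadoop cluster it throws an error: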

No module named mrjob

When I remove "sudo" from in front of the python command, I get the following output instead:

no configs found; falling back on auto-configuration 
no configs found; falling back on auto-configuration 
creating tmp directory /tmp/main_mrjob.cloudera.20131022.180113.820659 
writing wrapper script to /tmp/main_mrjob.cloudera.20131022.180113.820659/setup-wrapper.sh 
STDERR: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName 
STDERR: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName 
STDERR:  at java.net.URLClassLoader$1.run(URLClassLoader.java:202) 
STDERR:  at java.security.AccessController.doPrivileged(Native Method) 
STDERR:  at java.net.URLClassLoader.findClass(URLClassLoader.java:190) 
STDERR:  at java.lang.ClassLoader.loadClass(ClassLoader.java:306) 
STDERR:  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) 
STDERR:  at java.lang.ClassLoader.loadClass(ClassLoader.java:247) 
STDERR: Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit. 
STDERR: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FsShell 
STDERR: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FsShell 
STDERR:  at java.net.URLClassLoader$1.run(URLClassLoader.java:202) 
STDERR:  at java.security.AccessController.doPrivileged(Native Method) 
STDERR:  at java.net.URLClassLoader.findClass(URLClassLoader.java:190) 
STDERR:  at java.lang.ClassLoader.loadClass(ClassLoader.java:306) 
STDERR:  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) 
STDERR:  at java.lang.ClassLoader.loadClass(ClassLoader.java:247) 
STDERR: Could not find the main class: org.apache.hadoop.fs.FsShell. Program will exit. 
Traceback (most recent call last): 
    File "main_mrjob.py", line 17, in <module> 
    MRWordFrequencyCount.run() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/job.py", line 500, in run 
    mr_job.execute() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute 
    super(MRJob, self).execute() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute 
    self.run_job() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/launch.py", line 207, in run_job 
    runner.run() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run 
    self._run() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/hadoop.py", line 236, in _run 
    self._upload_local_files_to_hdfs() 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs 
    self._mkdir_on_hdfs(self._upload_mgr.prefix) 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/hadoop.py", line 271, in _mkdir_on_hdfs 
    self.invoke_hadoop(['fs', '-mkdir', path]) 
    File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/fs/hadoop.py", line 104, in invoke_hadoop 
    raise CalledProcessError(proc.returncode, args) 
subprocess.CalledProcessError: Command '['/usr/lib/hadoop-0.20-mapreduce/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/cloudera/tmp/mrjob/main_mrjob.cloudera.20131022.180113.820659/files/']' returned non-zero exit status 1 

It seems that without sudo it cannot run "mkdir" on HDFS, but with sudo it cannot find mrjob. I am really confused.
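
One way to narrow down the "No module named mrjob" half of the problem is to check which interpreter actually runs in each case. The traceback above shows mrjob installed under the Canopy Python in /home/cloudera. On many systems sudo resets PATH, so "python" under sudo may resolve to a different interpreter that has no mrjob; whether that is what happens on this VM is an assumption, not something the output above confirms. A minimal sketch follows (the helper name describe_interpreter and the file name check_env.py are made up for illustration):

```python
import sys


def describe_interpreter(module_name="mrjob"):
    """Report which interpreter is running and whether module_name imports."""
    try:
        module = __import__(module_name)
        found = True
        # Built-in modules have no __file__, so fall back to None.
        location = getattr(module, "__file__", None)
    except ImportError:
        found = False
        location = None
    return {
        "executable": sys.executable,
        "module_found": found,
        "module_location": location,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in ("executable", "module_found", "module_location"):
        print("%s: %s" % (key, info[key]))
```

Saving this as check_env.py and running it once as "python check_env.py" and once as "sudo python check_env.py" shows whether the two commands use the same interpreter. If the executable paths differ, that would explain why mrjob is visible in one case and not the other.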

Thanks a lot!

Answers


I ran into the same problem when using the Cloudera QuickStart VM.

The solution is:

  1. Set HADOOP_HOME to "/usr/lib/hadoop":

    export HADOOP_HOME=/usr/lib/hadoop 
    
  2. Create a symbolic link to hadoop-streaming.jar inside that directory:

    sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop 
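
After applying both steps, a quick sanity check can confirm that the environment looks the way they intend. The sketch below only tests that HADOOP_HOME is set and that a file whose name looks like a streaming jar sits directly inside it. It does not reproduce how mrjob itself searches for the jar, which may differ between versions. The function names and the file-name pattern are assumptions chosen for illustration.

```python
import glob
import os


def find_streaming_jars(hadoop_home):
    """Return sorted paths directly under hadoop_home that look like a streaming jar."""
    pattern = os.path.join(hadoop_home, "hadoop*streaming*.jar")
    return sorted(glob.glob(pattern))


def check_hadoop_home(environ=None):
    """Describe whether HADOOP_HOME is set and contains a streaming jar."""
    environ = os.environ if environ is None else environ
    hadoop_home = environ.get("HADOOP_HOME")
    if not hadoop_home:
        return "HADOOP_HOME is not set"
    jars = find_streaming_jars(hadoop_home)
    if not jars:
        return "no streaming jar found directly under %s" % hadoop_home
    return "found: %s" % ", ".join(jars)


if __name__ == "__main__":
    print(check_hadoop_home())
```

Note that "export" only affects the current shell session, so the check should be run from the same shell that will launch the job.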
    