2013-12-15 39 views

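Change the MapReduce intermediate output location using MRJob

I am trying to run a Python script that uses MRJob on a cluster where I do not have admin permissions, and I get the error pasted below. What I think is happening is that the job tries to write its intermediate files to the default /tmp/... directory, and since that is a protected directory I cannot write to, the job hits an error and exits. I would like to know how to change this tmp output directory to some location on my local filesystem, for example /home/myusername/some_path_in_my_local_filesystem_on_the_cluster. Basically, I want to know which additional parameters I have to pass to move the intermediate output location from /tmp/... to a place where I have write permission.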

I invoke my script like this:

python myscript.py input.txt -r hadoop > output.txt 

Error:

no configs found; falling back on auto-configuration 
    no configs found; falling back on auto-configuration 
    creating tmp directory /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232 
    writing wrapper script to /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232/setup-wrapper.sh 
    STDERR: mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=myusername, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x 
    Traceback (most recent call last): 
     File "/home/myusername/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module> 
     MRWordFreqCount.run() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run 
     mr_job.execute() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute 
     super(MRJob, self).execute() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute 
     self.run_job() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 207, in run_job 
     runner.run() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run 
     self._run() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 236, in _run 
     self._upload_local_files_to_hdfs() 
     File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs 
     self._mkdir_on_hdfs(self._upload_mgr.prefix) 

Are you sure you are invoking a Hadoop job? Because afaik, this is not how you invoke a Hadoop streaming job. – aa8y

Answer


Are you running mrjob as a "local" job, or are you trying to run it on your Hadoop cluster?

If you are actually trying to run it on Hadoop, you can control the "scratch" HDFS location (where mrjob stores its intermediate files) with the --base-tmp-dir flag:

python mr.py -r hadoop -o hdfs:///user/you/output_dir --base-tmp-dir hdfs:///user/you/tmp hdfs:///user/you/data.txt
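
If you launch the job from another script, the same command line can be assembled programmatically. The following is only a minimal sketch: the helper name build_mrjob_command is hypothetical, and the flags are simply the ones used in the command above. Option names have changed between mrjob releases, so it is worth confirming with `python mr.py --help` that your installed version accepts --base-tmp-dir before relying on it.

```python
import shlex


def build_mrjob_command(script, input_path, output_dir, tmp_dir):
    """Return the argv list for running an mrjob script on Hadoop.

    Hypothetical helper: it only mirrors the flags from the command
    above (-r, -o, --base-tmp-dir) and does not validate them against
    the installed mrjob version.
    """
    return [
        "python", script,
        "-r", "hadoop",              # run on the Hadoop cluster
        "-o", output_dir,            # where the final output goes
        "--base-tmp-dir", tmp_dir,   # a directory you can write to
        input_path,
    ]


if __name__ == "__main__":
    argv = build_mrjob_command(
        "mr.py",
        "hdfs:///user/you/data.txt",
        "hdfs:///user/you/output_dir",
        "hdfs:///user/you/tmp",
    )
    # Print the command as a single shell-safe string for inspection.
    print(shlex.join(argv))
```

The resulting list can be passed straight to subprocess.run(argv) once the paths point at directories your user is allowed to write to.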