2017-02-10 57 views

How to add a module folder/tar.gz to nodes in PySpark

I am running pyspark in an IPython notebook after the following configuration:

export PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter 
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
export PYSPARK_PYTHON=/usr/bin/python 

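I have a custom UDF that uses a module called mzgeohash. However, I get a "module not found" error, and my guess is that the module is missing on the workers/nodes. I have tried sc.addPyFile and similar approaches. What is an effective way to add a cloned folder or a tar.gz Python module in this case from IPython?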

Answers


Here is how I do it. The basic idea is to create a zip of all the files in your module and pass it to sc.addPyFile():

import os
import random
import string
import zipfile

def rand_str(n):
    # helper the snippet relies on; assumed here to return n random lowercase letters
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(n))

def ziplib():
    libpath = os.path.dirname(__file__)               # this should point to your package's directory
    zippath = '/tmp/mylib-' + rand_str(6) + '.zip'    # some random filename in a writable directory
    zf = zipfile.PyZipFile(zippath, mode='w')
    try:
        zf.debug = 3                                  # verbose output, good for debugging
        zf.writepy(libpath)                           # add the package's Python files to the archive
        return zippath                                # return path to the generated zip archive
    finally:
        zf.close()

...
zip_path = ziplib()                                   # generate zip archive containing your lib
sc.addPyFile(zip_path)                                # add the entire archive to the SparkContext
...
os.remove(zip_path)                                   # don't forget to remove the temporary file, preferably in a "finally" clause
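
For a cloned folder, one thing worth checking is the layout inside the archive: the package directory itself has to sit at the root of the zip, otherwise `import` cannot find it by name. Below is a minimal sketch of that idea using only the standard library. The helper name `zip_package` and the throwaway package `demo_geo_pkg` are made up for illustration and stand in for the real cloned module; nothing here touches Spark, it only confirms locally that the zip is importable.

```python
import importlib
import os
import shutil
import sys
import tempfile

def zip_package(package_dir, out_dir=None):
    """Zip a package folder so the folder itself sits at the root of the archive."""
    package_dir = os.path.abspath(package_dir)
    parent, name = os.path.split(package_dir)
    out_dir = out_dir or tempfile.mkdtemp()
    # root_dir=parent and base_dir=name keep the "name/" prefix inside the zip,
    # which is what lets "import name" resolve from the archive.
    return shutil.make_archive(os.path.join(out_dir, name), "zip",
                               root_dir=parent, base_dir=name)

# Local check with a throwaway package standing in for the cloned repository.
work = tempfile.mkdtemp()
pkg = os.path.join(work, "demo_geo_pkg")   # hypothetical package name
os.mkdir(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("def encode(x):\n    return 'cell-' + str(x)\n")

zip_path = zip_package(pkg)
shutil.rmtree(pkg)             # remove the source so the import can only come from the zip
sys.path.insert(0, zip_path)   # Python can import packages directly from a zip on sys.path
demo = importlib.import_module("demo_geo_pkg")
result = demo.encode(7)
print(result)
```

In the notebook, the resulting path would then be handed to sc.addPyFile(zip_path) before the UDF runs. sc.addPyFile accepts .py, .zip and .egg files, so a tar.gz would presumably need to be unpacked and re-zipped this way first. Whether the cloned mzgeohash repository keeps its package folder at the top level is an assumption to verify.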

Dharms - I tried this, but I still get the same error: No module named mzgeohash.
