Numpy and static linking

I am running Spark programs on a large cluster (for which I do not have administrative privileges). numpy is not installed on the worker nodes. Hence, I bundled numpy with my program, but I get the following error:
Traceback (most recent call last):
File "/home/user/spark-script.py", line 12, in <module>
import numpy
File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 170, in <module>
File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in <module>
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 8, in <module>
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 11, in <module>
File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray
The script itself is pretty simple:
from pyspark import SparkConf, SparkContext
sc = SparkContext()
sc.addPyFile('numpy.zip')
import numpy
a = sc.parallelize(numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]))
print a.collect()
As far as I understand, the error occurs because numpy dynamically loads its multiarray.so dependency, and even though my numpy.zip file does include multiarray.so, the dynamic loading somehow does not work with Apache Spark. Why is that? And how do you build a standalone numpy module with static linking?
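For what it's worth, the same failure can be reproduced outside Spark, because Python's zipimport mechanism cannot load C extension modules such as multiarray.so out of a zip archive. A minimal sketch, assuming numpy.zip has the numpy package directory at its top level (the sys.path manipulation here only approximates what addPyFile does on the workers):

import sys
sys.path.insert(0, 'numpy.zip')   # roughly what sc.addPyFile('numpy.zip') arranges on each worker
import numpy                      # pure-Python modules load from the zip, but numpy.core fails with
                                  # "ImportError: cannot import name multiarray", since the .so inside
                                  # the zip cannot be dlopen()-ed by the interpreter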
Thanks.
Could you show how you created the zip file? – zero323
@zero323: 'zip -r ~/numpy.zip /usr/local/lib/python2.7/dist-packages/numpy' – abhinavkulkarni
So you tried to copy an existing installation? – zero323