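How do I install numpy and pandas for Python 3.5 in Spark?
I want to run linear regression in Spark using Python 3.5 instead of Python 2.7, so first I exported PYSPARK_PYTHON=python3. Then I got the error "No module named numpy". I tried "pip install numpy", but pip does not pick up the PYSPARK_PYTHON setting. How do I ask pip to install numpy for 3.5? Thanks!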
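Since the question is really "which Python will Spark launch, and which pip belongs to it?", a quick check of what PYSPARK_PYTHON resolves to can help. This is a minimal sketch; the helper name is hypothetical, and the fallback to `python` when the variable is unset is an assumption matching this Spark 2.1 setup.

```python
import os
import shutil


def resolve_pyspark_python(default="python"):
    """Return (name, absolute path or None) for the interpreter PySpark would launch.

    Assumption: when PYSPARK_PYTHON is unset we fall back to `default`
    ("python", which on this machine is Python 2.7).
    """
    name = os.environ.get("PYSPARK_PYTHON", default)
    # shutil.which searches PATH for bare names and checks explicit paths directly.
    return name, shutil.which(name)


name, path = resolve_pyspark_python()
print("PYSPARK_PYTHON ->", name, "->", path)
```

Whatever path this prints is the interpreter that numpy has to be installed for.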
$ export PYSPARK_PYTHON=python3
$ spark-submit linreg.py
....
Traceback (most recent call last):
  File "/home/yoda/Code/idenlink-examples/test22-spark-linreg/linreg.py", line 115, in <module>
    from pyspark.ml.linalg import Vectors
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in <module>
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 21, in <module>
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
ImportError: No module named 'numpy'
$ pip install numpy
Requirement already satisfied: numpy in /home/yoda/.local/lib/python2.7/site-packages
$ pyspark
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
17/02/09 20:29:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/09 20:29:20 WARN Utils: Your hostname, yoda-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
17/02/09 20:29:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/02/09 20:29:31 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Python version 3.5.2 (default, Nov 17 2016 17:05:23)
SparkSession available as 'spark'.
>>> import site; site.getsitepackages()
['/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.5/dist-packages']
>>>
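Inside the Python 3.5 shell above, a check along these lines shows whether numpy and pandas are importable for *that* interpreter and prints a `-m pip` command tied to it. A minimal sketch; the helper names are hypothetical, and it assumes pip is available for the target interpreter (on Ubuntu that may mean installing the `python3-pip` package). It avoids f-strings so it also runs on 3.5.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that the *running* interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_install_argv(names, python=sys.executable):
    """Build an argv that installs `names` for `python` specifically.

    `<python> -m pip` runs the pip belonging to that interpreter, so packages
    land in its site-packages instead of wherever the bare `pip` script points.
    """
    return [python, "-m", "pip", "install"] + list(names)


missing = missing_modules(["numpy", "pandas"])
if missing:
    print("Missing for Python", sys.version.split()[0], ":", missing)
    print("Run:", " ".join(pip_install_argv(missing)))
```

In this case the bare `pip` reported `python2.7/site-packages`, so something like `python3 -m pip install numpy pandas` is what targets the 3.5 directories listed by `site.getsitepackages()`.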
Hint: Spark can (and usually does) do its work on a *cluster* of machines. – 2017-02-10 19:30:22
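You will have to install the numpy lib on every machine in the cluster you use. That is, if you are only using it on your local machine, just download and add the library correctly there. Spark shouldn't care, as long as numpy or any other lib is linked correctly. –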
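On a real cluster, one way to find machines where numpy is missing is to run a tiny probe as Spark tasks. This is a sketch assuming a live SparkContext (in the shell above, `spark.sparkContext`); `probe` and `check_cluster` are hypothetical helpers, and Spark does not guarantee that a given number of tasks touches every executor.

```python
import importlib.util
import socket
import sys


def probe(_):
    """Runs inside a task: report host, worker interpreter, and numpy availability."""
    return (socket.gethostname(), sys.executable,
            importlib.util.find_spec("numpy") is not None)


def check_cluster(sc, partitions=8):
    """Run `probe` in `partitions` tasks and return the distinct results, sorted.

    More partitions make it more likely that every executor runs at least one task.
    """
    rdd = sc.parallelize(range(partitions), partitions)
    return sorted(set(rdd.map(probe).collect()))
```

For example, `check_cluster(spark.sparkContext)` in the pyspark shell lists each (host, interpreter, has_numpy) combination seen by the tasks.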
@JackManey It looks like local mode. The OP is just using the wrong pip :) Joshua, using virtualenv, Anaconda or another environment management tool is a good idea. – zero323
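Following the environment-manager suggestion, here is a sketch using the standard-library `venv` module (added in Python 3.3; `with_pip` since 3.4) to create an isolated Python 3 environment whose interpreter can then be handed to Spark. The function name is hypothetical; on Debian/Ubuntu, `with_pip=True` may additionally require the `python3-venv` package.

```python
import os
import venv


def make_spark_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path of its interpreter.

    That path is what PYSPARK_PYTHON should point to.
    """
    # Symlink the interpreter on POSIX; copy it on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)
```

With a hypothetical location such as `~/spark-py3`, you would then run `<returned path> -m pip install numpy pandas` and `export PYSPARK_PYTHON=<returned path>` before `spark-submit`. On a cluster, the same interpreter path would need to exist on every worker.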