2016-11-30

Dataproc: Jupyter PySpark notebook cannot import the graphframes package

On a Dataproc Spark cluster, the graphframes package is available in spark-shell but not in the Jupyter PySpark notebook.

PySpark kernel configuration:

PACKAGES_ARG='--packages graphframes:graphframes:0.2.0-spark2.0-s_2.11' 

The cluster was created with the following command:

gcloud dataproc clusters create my-dataproc-cluster --properties spark.jars.packages=com.databricks:graphframes:graphframes:0.2.0-spark2.0-s_2.11 --metadata "JUPYTER_PORT=8124,INIT_ACTIONS_REPO=https://github.com/{xyz}/dataproc-initialization-actions.git" --initialization-actions gs://dataproc-initialization-actions/jupyter/jupyter.sh --num-workers 2 --properties spark:spark.executorEnv.PYTHONHASHSEED=0,spark:spark.yarn.am.memory=1024m  --worker-machine-type=n1-standard-4 --master-machine-type=n1-standard-4 

Answers

3

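This is an old bug with Spark shells and YARN. I thought it had been fixed in SPARK-15782, but apparently this case was missed.

The suggested workaround is to add the following before your import: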

import os 
sc.addPyFile(os.path.expanduser('~/.ivy2/jars/graphframes_graphframes-0.2.0-spark2.0-s_2.11.jar')) 
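
The workaround above can be wrapped in a small helper so that the jar path is not typed by hand. This is only a sketch: `ivy_jar_path` and `add_jar_if_present` are hypothetical names, and the `<group>_<artifact>-<version>.jar` layout under `~/.ivy2/jars` is an assumption inferred from the single path shown above, not a documented guarantee. `sc` stands for the SparkContext that the PySpark kernel provides.

```python
import os


def ivy_jar_path(group, artifact, version, ivy_home="~/.ivy2"):
    """Return the path where the resolved jar is expected to be cached.

    Assumption: the file is named <group>_<artifact>-<version>.jar and
    lives in <ivy_home>/jars, matching the path used in the answer.
    """
    name = "{}_{}-{}.jar".format(group, artifact, version)
    return os.path.join(os.path.expanduser(ivy_home), "jars", name)


def add_jar_if_present(sc, jar_path):
    """Call sc.addPyFile(jar_path) only if the path exists.

    Returns True when the jar was added and False when it was not found,
    so the caller can report a missing download instead of failing later
    on the import.
    """
    if not os.path.exists(jar_path):
        return False
    sc.addPyFile(jar_path)
    return True
```

In the notebook this would be called as `add_jar_if_present(sc, ivy_jar_path("graphframes", "graphframes", "0.2.0-spark2.0-s_2.11"))` before `import graphframes`.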

0

I found another way to add the package that works in a Jupyter notebook:

from pyspark.sql import SparkSession

# Resolve the graphframes package when the session is created.
spark = SparkSession.builder \
    .appName("Python Spark SQL") \
    .config("spark.jars.packages", "graphframes:graphframes:0.5.0-spark2.1-s_2.11") \
    .getOrCreate()
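
Spark's configuration documentation describes `spark.jars.packages` as a comma-separated list of Maven coordinates in `groupId:artifactId:version` form. A minimal sketch of a check that could be run on the value before passing it to `.config(...)` follows; `parse_coordinate` and `packages_value` are hypothetical helper names, not part of PySpark.

```python
def parse_coordinate(coord):
    """Split a 'groupId:artifactId:version' string into its three fields."""
    parts = coord.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected groupId:artifactId:version, got %r" % coord)
    return tuple(parts)


def packages_value(*coords):
    """Build a comma-separated value for spark.jars.packages."""
    for coord in coords:
        parse_coordinate(coord)  # raises ValueError on malformed input
    return ",".join(coords)
```

With this check, `graphframes:graphframes:0.5.0-spark2.1-s_2.11` passes, while the four-field value `com.databricks:graphframes:graphframes:0.2.0-spark2.0-s_2.11` from the cluster creation command in the question is rejected.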