2016-07-15

The following PySpark program throws the error `NameError: name 'spark' is not defined`:

Traceback (most recent call last):
  File "pgm_latest.py", line 232, in <module>
    sconf = SparkConf().set(spark.dynamicAllocation.enabled, true) \
        .set(spark.dynamicAllocation.maxExecutors, 300) \
        .set(spark.shuffle.service.enabled, true) \
        .set(spark.shuffle.spill.compress, true)
NameError: name 'spark' is not defined

The job is submitted with:

spark-submit --driver-memory 12g --master yarn-cluster --executor-memory 6g --executor-cores 3 pgm_latest.py

Code:

#!/usr/bin/python
import sys
import os
from datetime import *
from time import *
from pyspark.sql import *
from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext() 
sqlCtx= HiveContext(sc) 

sqlCtx.sql('SET spark.sql.autoBroadcastJoinThreshold=104857600') 
sqlCtx.sql('SET Tungsten=true') 
sqlCtx.sql('SET spark.sql.shuffle.partitions=500') 
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.compressed=true') 
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.batchSize=12000') 
sqlCtx.sql('SET spark.sql.parquet.cacheMetadata=true') 
sqlCtx.sql('SET spark.sql.parquet.filterPushdown=true') 
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true') 
sqlCtx.sql('SET spark.sql.parquet.binaryAsString=true') 
sqlCtx.sql('SET spark.sql.parquet.compression.codec=snappy') 
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true') 

## Main functionality
def main(sc):
    pass

if __name__ == '__main__':

    # Configure OPTIONS
    sconf = SparkConf() \
        .set("spark.dynamicAllocation.enabled", "true") \
        .set("spark.dynamicAllocation.maxExecutors", 300) \
        .set("spark.shuffle.service.enabled", "true") \
        .set("spark.shuffle.spill.compress", "true")

    sc = SparkContext(conf=sconf)

    # Execute Main functionality
    main(sc)
    sc.stop()

Your traceback does not match your code... you did not quote `'spark.dynamicAllocation.enabled'`, for example, so `spark` was never defined as a Python variable.
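The comment above can be illustrated with a minimal pure-Python sketch (no Spark required); `set_option` is a hypothetical stand-in for `SparkConf.set()`. An unquoted key like `spark.dynamicAllocation.enabled` is evaluated as an attribute lookup on the name `spark`, which was never defined, so Python raises `NameError` before the setter is even called:

```python
def set_option(key, value):
    """Hypothetical stand-in for SparkConf.set(): just records the pair."""
    return (key, value)

# Quoted key: it is an ordinary string, so the call succeeds.
ok = set_option("spark.dynamicAllocation.enabled", "true")

# Unquoted key: Python must resolve the name `spark` first, and fails.
try:
    set_option(spark.dynamicAllocation.enabled, true)  # noqa: F821
except NameError as exc:
    failure = str(exc)

print(ok)       # ('spark.dynamicAllocation.enabled', 'true')
print(failure)  # name 'spark' is not defined
```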

Answer


I think you are using a Spark version older than 2.x; the `spark` (SparkSession) entry point was only introduced in Spark 2.0.

Instead of this

spark.createDataFrame(..) 

use the following

df = sqlContext.createDataFrame(...) 