
I am trying to read a set of Parquet files in PySpark with a custom schema, but it fails with: AttributeError: 'StructField' object has no attribute '_get_object_id'.

Here is my sample code:

import pyspark 
from pyspark.sql import SQLContext, SparkSession 
from pyspark.sql import Row 
import pyspark.sql.functions as func 
from pyspark.sql.types import * 

sc = pyspark.SparkContext() 
spark = SparkSession(sc) 
sqlContext = SQLContext(sc) 

l = [('1',31200,'Execute',140,'ABC'),('2',31201,'Execute',140,'ABC'),('3',31202,'Execute',142,'ABC'), 
    ('4',31103,'Execute',149,'DEF'),('5',31204,'Execute',145,'DEF'),('6',31205,'Execute',149,'DEF')] 
rdd = sc.parallelize(l) 
trades = rdd.map(lambda x: Row(global_order_id=int(x[0]), nanos=int(x[1]),message_type=x[2], price=int(x[3]),symbol=x[4])) 
trades_df = sqlContext.createDataFrame(trades) 
trades_df.printSchema() 
trades_df.write.parquet('trades_parquet') 

trades_df_Parquet = sqlContext.read.parquet('trades_parquet') 
trades_df_Parquet.printSchema() 

# The schema is encoded in a string. 
schemaString = "global_order_id message_type nanos price symbol" 

fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()] 
schema = StructType(fields) 

trades_df_Parquet_n = spark.read.format('parquet').load('trades_parquet',schema,inferSchema=False)  # this line raises the AttributeError
#trades_df_Parquet_n = spark.read.parquet('trades_parquet',schema)
trades_df_Parquet_n.printSchema() 

Can anyone please help me out with suggestions?

Answer


Pass schema by keyword, so it is not taken as the format argument of load:

Signature: spark.read.load(path=None, format=None, schema=None, **options)

which gives you:

trades_df_Parquet_n = spark.read.format('parquet').load('trades_parquet',schema=schema, inferSchema=False) 
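
For reference, a minimal self-contained sketch of the fixed read (assuming the 'trades_parquet' directory written above already exists; the column names and all-StringType fields are taken from the question):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# build the custom schema from the question's column names
fields = [StructField(name, StringType(), True)
          for name in "global_order_id message_type nanos price symbol".split()]
schema = StructType(fields)

# pass schema by keyword so it is not bound to the positional format parameter
trades_df_Parquet_n = spark.read.format('parquet').load('trades_parquet', schema=schema)
trades_df_Parquet_n.printSchema()  # shows the supplied schema without scanning the data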

Thanks Marie, it works. – Srikant
