2017-03-16

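PySpark: Cannot create a DataFrame from a list

Hello, I have a list of tuples containing strings and numpy float64 values. I want to convert it into a Spark DataFrame, but I get an error. The list and the error were shown in a screenshot in the original post.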

Here is my code:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([StructField("key", StringType(), True), StructField("value", DoubleType(), True)])

coef_df = spark.createDataFrame(coef_list, schema)

Answer


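As @user6910411 pointed out, Spark SQL does not support NumPy types (yet).

Here is a slightly simpler workaround, which also covers what was suggested in the comments: convert each NumPy value to a built-in Python type before creating the DataFrame.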

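import numpy as np

# np.unicode is no longer available in current NumPy releases;
# np.str_ is NumPy's string scalar type
data = [
    (np.str_('100912strategy_id'), np.float64(-2.1412)),
    (np.str_('10exchange_ud'), np.float64(-1.2412))]

# sc is the SparkContext (predefined in the PySpark shell)
# Cast each NumPy scalar to a plain Python str / float before calling toDF
df = (sc.parallelize(data)
    .map(lambda x: (str(x[0]), float(x[1])))
    .toDF(["key", "value"]))
df.show()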
+-----------------+-------+
|              key|  value|
+-----------------+-------+
|100912strategy_id|-2.1412|
|    10exchange_ud|-1.2412|
+-----------------+-------+
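
If the list is small enough to stay on the driver, the same conversion can be done in plain Python instead of going through an RDD. The following is only a sketch under that assumption: `to_python_rows` is a hypothetical helper name, not part of PySpark or NumPy, and the sample `coef_list` simply mirrors the data above.

```python
import numpy as np


def to_python_rows(rows):
    """Return a copy of rows with every NumPy scalar replaced by a built-in value.

    Hypothetical helper for illustration; np.generic is the base class of all
    NumPy scalar types, and .item() returns the matching Python object.
    """
    return [
        tuple(v.item() if isinstance(v, np.generic) else v for v in row)
        for row in rows
    ]


# Sample data mirroring the list in the answer (assumed shape: (key, value))
coef_list = [
    (np.str_("100912strategy_id"), np.float64(-2.1412)),
    (np.str_("10exchange_ud"), np.float64(-1.2412)),
]

clean_rows = to_python_rows(coef_list)
```

With an active SparkSession named `spark` and the `schema` from the question, `spark.createDataFrame(clean_rows, schema)` should then accept the rows, since they contain only `str` and `float` values.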