2016-12-15 83 views

Saving a 2D list to a DataFrame in Scala Spark

I have a 2D list named tuppleSlides in the following format:

List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7)) 

I have created the following schema:

val schema = StructType(
      Array(
      StructField("1", IntegerType, true), 
      StructField("2", IntegerType, true), 
      StructField("3", IntegerType, true), 
      StructField("4", IntegerType, true), 
      StructField("5", IntegerType, true), 
      StructField("6", IntegerType, true), 
      StructField("7", IntegerType, true), 
      StructField("8", IntegerType, true), 
      StructField("9", IntegerType, true), 
      StructField("10", IntegerType, true)) 
     ) 

and I create a DataFrame like this:

val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema) 

But it doesn't even compile. How do I do this correctly?

Thanks.

Answer


You need to convert the 2D list into an RDD[Row] object before creating the DataFrame:

import org.apache.spark.sql._ 
import org.apache.spark.sql.types._ 

// Turn each inner list into a Row (tuppleSlides is the list from the question)
val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_)) 

sqlContext.createDataFrame(rdd, schema) 
// res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int] 

Also note that in Spark 2.x, sqlContext is replaced by spark:

spark.createDataFrame(rdd, schema) 
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields] 
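
For completeness, here is a minimal self-contained sketch of the same approach for Spark 2.x. The object name TuppleSlidesToDF and the helpers intSchema and toDataFrame are made up for illustration, the local[*] master is only an assumption for running it outside spark-shell, and it assumes every inner list has the same length. It also builds the ten columns with a range instead of writing each StructField by hand.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object TuppleSlidesToDF {

  // Hypothetical helper: nullable integer columns named "1", "2", ..., width.
  def intSchema(width: Int): StructType =
    StructType((1 to width).map(i => StructField(i.toString, IntegerType, nullable = true)))

  // Hypothetical helper: converts a 2D list to a DataFrame via RDD[Row].
  // Assumes all inner sequences have the same length as the first one.
  def toDataFrame(spark: SparkSession, data: Seq[Seq[Int]]): DataFrame = {
    val width = data.headOption.map(_.size).getOrElse(0)
    // Row.fromSeq takes a Seq[Any], so each inner Seq[Int] maps to one Row.
    val rows = spark.sparkContext.parallelize(data).map(Row.fromSeq(_))
    spark.createDataFrame(rows, intSchema(width))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tuppleSlides-to-df")
      .getOrCreate()

    val tuppleSlides = List.fill(4)(List(10, 4, 2, 4, 5, 2, 6, 2, 5, 7))

    val tuppleSlidesDF = toDataFrame(spark, tuppleSlides)
    tuppleSlidesDF.printSchema()
    tuppleSlidesDF.show()

    spark.stop()
  }
}
```

The original call in the question fails to compile because createDataFrame has no overload that accepts a List[List[Int]] together with a StructType; the schema-taking overloads expect rows (for example an RDD[Row]), which is why the Row.fromSeq step is needed.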

Ha, here I was writing 'toTuple10'; 'fromSeq' is the way to go.