Spark: convert RDD[Row] to a DataFrame where one of the columns in the row is a list
I have an RDD[Row] with the following data in each row:
[guid, List(peopleObjects)]
["123", List(peopleObjects1, peopleObjects2, peopleObjects3)]
I want to convert it to a DataFrame.
I am using the following code:
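import org.apache.spark.sql.types._

// Both columns are declared as StringType, including the list column
val personStructureType = new StructType()
  .add(StructField("guid", StringType, true))
  .add(StructField("personList", StringType, true))
val personDF = hiveContext.createDataFrame(personRDD, personStructureType)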
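Should I use a different data type in my schema instead of StringType?
If the second column is just a string it works, but when it is a list I get the following error: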
scala.MatchError: List(personObject1, personObject2, personObject3) (of class scala.collection.immutable.$colon$colon)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:294)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:260)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)
at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:219)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
What type is 'peopleObject'? If it is a 'case class', could you include its definition? Even better would be some sample code that creates your 'RDD'.
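The MatchError in the trace comes from StringConverter being handed a List, so one possible direction is to declare the list column as an ArrayType of structs rather than StringType. The following is only a sketch under assumptions: the real type of peopleObject is not shown in the question, so `Person` with the fields `name` and `age` is a hypothetical stand-in, and the sample uses a local `SparkSession` (Spark 2+) purely to be self-contained. `createDataFrame(rowRDD, schema)` is also available on the `hiveContext` used in the question.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical stand-in for the question's peopleObject; the real fields are unknown.
case class Person(name: String, age: Int)

object ListColumnExample {
  // Schema of a single list element (assumed fields).
  val personType: StructType = new StructType()
    .add(StructField("name", StringType, true))
    .add(StructField("age", IntegerType, true))

  // The list column is declared as an array of structs instead of StringType.
  val personStructureType: StructType = new StructType()
    .add(StructField("guid", StringType, true))
    .add(StructField("personList", ArrayType(personType, true), true))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("list-column-example")
      .getOrCreate()

    // Sample data in the shape described in the question: (guid, List(people)).
    val raw = spark.sparkContext.parallelize(Seq(
      ("123", List(Person("Ann", 30), Person("Bob", 25), Person("Cid", 41)))
    ))

    // Turn every list element into a Row so it matches the nested StructType.
    val personRDD = raw.map { case (guid, people) =>
      Row(guid, people.map(p => Row(p.name, p.age)))
    }

    val personDF = spark.createDataFrame(personRDD, personStructureType)
    personDF.printSchema()
    personDF.show(false)

    spark.stop()
  }
}
```

If the list elements are plain strings rather than objects, `ArrayType(StringType, true)` would be the corresponding column type, and the inner conversion to `Row` is not needed.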