Converting a list of lists or an RDD to a DataFrame in Spark-Scala

So basically what I am trying to achieve is this: I have a table with, say, 4 columns and expose it as a DataFrame, DF1. Now I want to store every row of DF1 into another Hive table (basically DF2, whose schema is Column1, Column2, Column3), where the value of Column3 is the '-' separated row of DataFrame DF1.
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Column

val df = hiveContext.sql("FROM hive_table SELECT *")
val writeToHiveDf = df.filter(new Column("id").isNotNull)

var builder: List[(String, String, String)] = Nil
// finalOne ends up as a ListBuffer[List[(String, String, String)]] -- a nested structure
val finalOne = new ListBuffer[List[(String, String, String)]]()

writeToHiveDf.rdd.collect().foreach { row =>
  val item = row.mkString("-")  // '-' separated representation of the DF1 row (not yet used below)
  builder = List(("dummy", "NEVER_NULL_CONSTRAINT", "some alpha"))
  finalOne += builder
}
Now I have finalOne, the list of lists, which I want to convert into a DataFrame, either directly or by going through an RDD.
var listRDD = sc.parallelize(finalOne) // Converts to an RDD -- this works
val dataFrameForHive: DataFrame = listRDD.toDF("table_name", "constraint_applied", "data") // Doesn't work
Error:
java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:414)
at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:94)
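For context on the error: every element of finalOne is itself a List[(String, String, String)], so Spark infers a single column of array type rather than a struct with three fields, and toDF with three column names then fails with the cast above. A minimal sketch of one way around this (assuming Spark 1.x with the HiveContext implicits in scope, and that one output row per tuple is what is wanted) is to flatten the nested list first:

import hiveContext.implicits._

// Flatten ListBuffer[List[(String, String, String)]] into a flat sequence of
// 3-tuples; an RDD of Tuple3 maps cleanly onto three DataFrame columns.
val flatRDD = sc.parallelize(finalOne.flatten)
val dataFrameForHive = flatRDD.toDF("table_name", "constraint_applied", "data")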
Could someone help me understand the right way to convert this into a DataFrame? Thanks in advance for your support.
What schema do you want the DataFrame to have: three columns of String type, or a single column of array type whose elements are structs of three Strings? –
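If the single array-of-structs schema mentioned in this comment is what is wanted instead (one output row per inner list), toDF's schema inference will not produce it; a rough sketch using an explicit schema and createDataFrame (an assumption about intent, not the asker's confirmed requirement) could look like this:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// One struct per (table_name, constraint_applied, data) tuple, wrapped in an array column.
val elementType = StructType(Seq(
  StructField("table_name", StringType),
  StructField("constraint_applied", StringType),
  StructField("data", StringType)))
val schema = StructType(Seq(StructField("rows", ArrayType(elementType))))

val rowRDD = sc.parallelize(finalOne.toList).map { inner =>
  Row(inner.map { case (a, b, c) => Row(a, b, c) })
}
val nestedDF = hiveContext.createDataFrame(rowRDD, schema)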