
I want to programmatically specify a number of fields, and for each of those fields, select the column itself and also pass it to another function that returns a case class of two Strings, so that mapping over the column list yields two columns for each matched field (Spark SQL, Scala). So far I have:

val myList = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc", "dd"))

val df = myList.toDF("col1", "col2", "col3", "col4")

val fields = "col1,col2"

val myDF = df.select(df.columns.map(c =>
  if (fields.contains(c)) {
    df.col(s"$c") && someUDFThatReturnsAStructTypeOfStringAndString(df.col(s"$c")).alias(s"${c}_processed")
  } else {
    df.col(s"$c")
  }): _*)

Right now this gives me the exception:

org.apache.spark.sql.AnalysisException: cannot resolve '(col1 AND UDF(col1))' due to data type mismatch: differing types in '(col1 AND UDF(col1))' (string and struct<STRING1:string,STRING2:string>)

What I want to select is:

col1 | <col1.String1, col1.String2> | col2 | <col2.String1, col2.String2> | col3 | col4

"a" | <"a1", "a2"> | "b" | <"b1", "b2"> | "c" | "d"
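For context on the error: the `&&` in the attempted select is Spark's boolean AND on `Column`, which is why the analyzer complains about `(col1 AND UDF(col1))`; and since `map` can only produce one element per input column, emitting two columns per matched field calls for `flatMap`. Below is a minimal sketch of the expression-building side in plain Scala (the column names and UDF name are the ones from the question; actually evaluating the expressions would need a SparkSession with that UDF registered):

```scala
// Sketch: flatMap lets each matched column contribute two select expressions.
// Plain Scala strings here, so this runs without Spark; in a real job you
// would pass the result to df.selectExpr(selectExprs: _*).
val columns = Seq("col1", "col2", "col3", "col4")
val fields = "col1,col2".split(",").toSet

val selectExprs: Seq[String] = columns.flatMap { c =>
  if (fields.contains(c))
    Seq(c, s"someUDFThatReturnsAStructTypeOfStringAndString($c) as ${c}_processed")
  else
    Seq(c)
}
// Six expressions: col1, its processed struct, col2, its struct, col3, col4.
```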


Why the downvote? Would you mind saying why? –

Answer


I ended up using df.selectExpr and stringing together a bunch of expressions.

import spark.implicits._
import org.apache.spark.sql.functions.expr

val fields = "col1,col2".split(",")

// One "udf(col) as col_parsed" expression per selected field, plus every original column.
val exprToSelect = df.columns.filter(c => fields.contains(c))
  .map(c => s"someUDFThatReturnsAStructTypeOfStringAndString($c) as ${c}_parsed") ++ df.columns

// Error rows: any parsed String1 longer than one character.
val exprToFilter = df.columns.filter(c => fields.contains(c))
  .map(c => s"length(${c}_parsed.String1) > 1").reduce(_ + " OR " + _)

// Valid rows: every parsed String1 shorter than one character (i.e. empty).
val exprToFilter2 = df.columns.filter(c => fields.contains(c))
  .map(c => s"(length(${c}_parsed.String1) < 1)").reduce(_ + " AND " + _)

// For valid rows, replace each selected column with its parsed String2.
val exprToSelectValid = df.columns.filter(c => fields.contains(c))
  .map(c => s"${c}_parsed.String2 as ${c}") ++ df.columns.filterNot(c => fields.contains(c))

// For error rows, concatenate the String1 error messages into one column.
val exprToSelectInValid = Array("concat(" + df.columns.filter(c => fields.contains(c))
  .map(c => s"${c}_parsed.String1").mkString(", ") + ") as String1") ++ df.columns

val parsedDF = df.select(exprToSelect.map(c => expr(c)): _*)

val validDF = parsedDF.filter(exprToFilter2)
  .select(exprToSelectValid.map(c => expr(c)): _*)

val errorDF = parsedDF.filter(exprToFilter)
  .select(exprToSelectInValid.map(c => expr(c)): _*)
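To make the expression-building step above concrete, here is a sketch of the strings it produces, run as plain Scala with a hard-coded array standing in for df.columns (so no SparkSession is needed); these strings are exactly what goes into selectExpr and filter:

```scala
// Sketch of the SQL expression strings built above, with a plain Array
// standing in for df.columns so it runs without Spark.
val columns = Array("col1", "col2", "col3", "col4")
val fields = "col1,col2".split(",")

val exprToSelect = columns.filter(c => fields.contains(c))
  .map(c => s"someUDFThatReturnsAStructTypeOfStringAndString($c) as ${c}_parsed") ++ columns

val exprToFilter2 = columns.filter(c => fields.contains(c))
  .map(c => s"(length(${c}_parsed.String1) < 1)").reduce(_ + " AND " + _)

// exprToSelect holds 6 entries: the two "... as colN_parsed" expressions
// followed by the four original column names.
// exprToFilter2 is:
// (length(col1_parsed.String1) < 1) AND (length(col2_parsed.String1) < 1)
```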