-1
UDF模式匹配下面給出df
:火花多列和集合元素
val df = spark.createDataFrame(Seq(
(1, 2, 3),
(3, 2, 1)
)).toDF("One", "Two", "Three")
我想編寫一個udf
是需要Three columns
作爲進出;並返回基於新列的最高輸入值類似如下:
import org.apache.spark.sql.functions.udf
def udfScoreToCategory=udf((One: Int, Two: Int, Three: Int): Int => {
cols match {
case cols if One > Two && One > Three => 1
case cols if Two > One && Two > Three => 2
case _ => 0
}}
這將是有趣的,看看如何與vector type
作爲輸入做類似:
import org.apache.spark.ml.linalg.Vector
def udfVectorToCategory=udf((cols:org.apache.spark.ml.linalg.Vector): Int => {
cols match {
case cols if cols(0) > cols(1) && cols(0) > cols(2) => 1,
case cols if cols(1) > cols(0) && cols(1) > cols(2) => 2
case _ => 0
}})
問題是如何將多列傳遞給'udf'並根據'invalid syntax'示例執行模式匹配 –