Schema for type not supported in a UDF on Spark 2.1.0

I am working with a data type called Point(x: Double, y: Double). I am trying to use the columns _c1 and _c2 as inputs to Point() and then create a new column of Point values, as follows:
val toPoint = udf{(x: Double, y: Double) => Point(x,y)}
Then I call the function:
val point = data.withColumn("Point", toPoint(wanted("c1"), wanted("c2")))
However, when I declare the UDF I get the following error:
java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Point is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:733)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:729)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:728)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:728)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
at org.apache.spark.sql.functions$.udf(functions.scala:3084)
... 48 elided
I have imported this data type correctly and used it many times before. Now I am trying to include it in my UDF's schema, but it is not recognized. What is the way to include types other than the standard Int, String, Array, etc.?
I am using Spark 2.1.0 on Amazon EMR.
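For what it is worth, one workaround I have sketched (assuming the JTS Point itself is not needed downstream) is to have the UDF return a plain case class of Doubles, for which Spark can derive a struct schema; PointStruct and the wanted DataFrame below are just placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical stand-in for the JTS Point: a case class of Doubles,
// for which Spark can derive a struct<x: double, y: double> schema.
case class PointStruct(x: Double, y: Double)

object PointUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("point-udf-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder for the "wanted" DataFrame with columns c1 and c2.
    val wanted = Seq((1.0, 2.0), (3.0, 4.0)).toDF("c1", "c2")

    // Returning a case class works where returning com.vividsolutions.jts.geom.Point fails,
    // because the return type's schema can be derived by ScalaReflection.
    val toPoint = udf { (x: Double, y: Double) => PointStruct(x, y) }

    val point = wanted.withColumn("Point", toPoint(wanted("c1"), wanted("c2")))
    point.show()
  }
}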
Here are some related questions I have looked at:
How to define schema for custom type in Spark SQL?
Spark UDF error - Schema for type Any is not supported
What is 'wanted()' in your example? – himanshuIIITian
@himanshuIIITian Sorry, those are the columns c1, c2, c3, etc. of the database. – user306603
Have you considered my answer? –