2016-06-09 12 views

Spark 1.5.2: Accessing a DataFrame inside another DataFrame

I have a Spark SQL DataFrame df1 with the following contents:

id value
10 100
20 200

I also have another DataFrame, df2, which looks like this:

id old_value 
10 800 
20 200 

Now I want to update df2 based on the contents of df1, like this:

val df3 = df2.withColumn("new_value", udf_function(col("id"), col("old_value")))

where udf_function is defined as:

val udf_function = udf((id: Integer, value:Integer) => { 
         df1[id] - value // pseudo code 
}) 

How can I perform the df1[id] lookup inside the UDF above? I expect df3 to come out as:

id old_value new_value
10 800 700
20 200 0

Answer


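You cannot use one DataFrame inside a transformation of another DataFrame. Your only option is to join the two on id to create a new DataFrame; after that you can call your UDF. The example below does not even need a UDF and just applies a simple operation to the joined columns: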

scala> val df1 = Seq((10, 100), (20, 200)).toDF("id", "value") 
// df1: org.apache.spark.sql.DataFrame = [id: int, value: int] 

scala> val df2 = Seq((10, 800), (20, 200)).toDF("id", "old_value") 
// df2: org.apache.spark.sql.DataFrame = [id: int, old_value: int] 

scala> val df3 = df2.join(df1, df1("id") === df2("id")).drop(df1("id")).withColumn("new_value", $"value" - $"old_value") 
// df3: org.apache.spark.sql.DataFrame = [id: int, old_value: int, value: int, new_value: int] 

scala> df3.show() 
// +---+---------+-----+---------+
// | id|old_value|value|new_value|
// +---+---------+-----+---------+
// | 10|      800|  100|     -700|
// | 20|      200|  200|        0|
// +---+---------+-----+---------+

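// Swapping the operands gives old_value - value, which matches the result expected in the question:
scala> val df3 = df2.join(df1, df1("id") === df2("id")).drop(df1("id")).withColumn("new_value", $"old_value" - $"value")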
// df3: org.apache.spark.sql.DataFrame = [id: int, old_value: int, value: int, new_value: int] 

scala> df3.show() 
// +---+---------+-----+---------+
// | id|old_value|value|new_value|
// +---+---------+-----+---------+
// | 10|      800|  100|      700|
// | 20|      200|  200|        0|
// +---+---------+-----+---------+
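
The answer mentions calling a UDF after the join, while the transcript above uses plain column arithmetic. For logic that cannot be written as a column expression, a minimal sketch of the UDF variant might look like the following. It assumes a spark-shell session on Spark 1.5.x (so that `toDF` and the `$"..."` syntax are available through the shell's implicit imports); the name `diff` is hypothetical and only used for illustration.

```scala
import org.apache.spark.sql.functions.udf

// Same sample data as in the question (assumes spark-shell, where
// sqlContext.implicits._ is already in scope).
val df1 = Seq((10, 100), (20, 200)).toDF("id", "value")
val df2 = Seq((10, 800), (20, 200)).toDF("id", "old_value")

// Hypothetical UDF: receives both values as ordinary columns of the
// joined row, so it never needs to touch df1 itself.
val diff = udf((oldValue: Int, value: Int) => oldValue - value)

val df3 = df2
  .join(df1, df1("id") === df2("id"))  // bring df1's value next to each df2 row
  .drop(df1("id"))                     // keep a single id column
  .withColumn("new_value", diff($"old_value", $"value"))
  .drop("value")                       // leave only id, old_value, new_value
```

Because this is an inner join, rows of df2 whose id has no match in df1 are dropped. If those rows must be kept, a left outer join would be needed, and the UDF would then have to handle a missing value.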