2017-06-15

Answer


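This can be done in Spark with the lag window function. The sample script below shows how. Note that the dates must be strings in yyyy-MM-dd format for datediff to parse them.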

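import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
// In spark-shell, spark.implicits._ (needed for toDF and $"...") is already in scope

val df = Seq((1000, "2016-01-19"), (1000, "2016-02-12"), (1000, "2016-02-18"), (1000, "2016-02-04")).toDF("product_id", "date")

// One window per product, rows ordered by shipment date
val w = Window.partitionBy($"product_id").orderBy($"date")
// last_date: previous shipment date of the same product (null for the first row)
// daysToShipMent: days between this shipment and the previous one
val result = df.withColumn("last_date", lag("date", 1).over(w)).withColumn("daysToShipMent", datediff($"date", $"last_date"))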

scala> result.select("product_id", "date", "daysToShipMent").show()
+----------+----------+--------------+
|product_id|      date|daysToShipMent|
+----------+----------+--------------+
|      1000|2016-01-19|          null|
|      1000|2016-02-04|            16|
|      1000|2016-02-12|             8|
|      1000|2016-02-18|             6|
+----------+----------+--------------+
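To sanity-check the numbers above without a Spark session, the same partition / order / lag / diff steps can be sketched in plain Scala with java.time. This is only an illustration, not Spark code and not part of the original answer; the object DaysToShipment and the method daysToShipment are hypothetical names, and the sketch assumes yyyy-MM-dd strings, which LocalDate.parse accepts by default.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper mirroring the Spark window logic on an in-memory Seq.
object DaysToShipment {
  // Input rows: (product_id, date as yyyy-MM-dd). Output: (product_id, date, days since
  // previous shipment of the same product), with None where Spark would produce null.
  def daysToShipment(rows: Seq[(Int, String)]): Seq[(Int, String, Option[Long])] =
    rows
      .groupBy(_._1)                 // like Window.partitionBy($"product_id")
      .toSeq
      .sortBy(_._1)                  // emit products in ascending id order
      .flatMap { case (productId, group) =>
        // like orderBy($"date"); LocalDate.parse assumes ISO yyyy-MM-dd strings
        val dates = group.map(r => LocalDate.parse(r._2)).sortBy(_.toEpochDay)
        // like lag("date", 1): each date paired with the one before it (None for the first)
        val previous = None +: dates.init.map(Some(_))
        dates.zip(previous).map { case (date, prev) =>
          // like datediff($"date", $"last_date"): whole days from prev to date
          (productId, date.toString, prev.map(p => ChronoUnit.DAYS.between(p, date)))
        }
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq((1000, "2016-01-19"), (1000, "2016-02-12"), (1000, "2016-02-18"), (1000, "2016-02-04"))
    daysToShipment(rows).foreach(println)
  }
}
```

Run on the sample rows, it yields the same gaps as the Spark output (None, 16, 8, 6). Sorting on toEpochDay rather than on the raw strings keeps the ordering correct even if the strings were later parsed with a different DateTimeFormatter.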

Thank you very much – joesek