1
我試圖實現以下解決方案:window functionSpark,Hive SQL - 實現窗口函數?
我有以下DF:
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax| eventdate|
+------------+----------------------+-------------------+
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 1570.0000|2015-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
我試圖運行一個窗口功能:
WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("eventdate").desc());
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
.filter("rank <= 2")
.show();
我想要獲得每個用戶的最後兩個條目(最後一個是最後一個日期,但由於它按日期降序排列,前兩行):
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax| eventdate|
+------------+----------------------+-------------------+
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2016-06-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
,但我得到這個:
+------------+----------------------+-------------------+----+
|increment_id|base_subtotal_incl_tax| eventdate|rank|
+------------+----------------------+-------------------+----+
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 1086| 14470.0000|2016-06-14 09:54:12| 1|
| 1086| 14470.0000|2016-06-14 09:54:12| 1|
+------------+----------------------+-------------------+----+
我缺少什麼?
謝謝!這完美的作品!所以它也似乎是最初的一個也適用於現實生活中的數據,因爲時間戳會有所不同。 :) –