Spark Hive - 帶窗口函數的UDFArgumentTypeException？

我有以下DF：Spark Hive - 帶窗口函數的UDFArgumentTypeException？

+------------+----------------------+-------------------+         
|increment_id|base_subtotal_incl_tax|   eventdate|         
+------------+----------------------+-------------------+         
|  1086|   14470.0000|2016-06-14 09:54:12|         
|  1086|   14470.0000|2016-06-14 09:54:12|         
|  1086|   14470.0000|2015-07-14 09:54:12|         
|  1086|   14470.0000|2015-07-14 09:54:12|         
|  1086|   14470.0000|2015-07-14 09:54:12|         
|  1086|   14470.0000|2015-07-14 09:54:12|         
|  1086|    1570.0000|2015-07-14 09:54:12|         
|  5555|   14470.0000|2014-07-14 09:54:12|         
|  5555|   14470.0000|2014-07-14 09:54:12|         
|  5555|   14470.0000|2014-07-14 09:54:12|         
|  5555|   14470.0000|2014-07-14 09:54:12|         
+------------+----------------------+-------------------+

我想運行一個窗口功能：

WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("eventdate").desc()); 
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line 
     .filter("rank <= 2") 
     .show();

我想要得到的是最後兩個條目（最後爲最新的日期，但因爲它是由下降，前兩行）爲每個用戶下令：

+------------+----------------------+-------------------+         
|increment_id|base_subtotal_incl_tax|   eventdate|         
+------------+----------------------+-------------------+         
|  1086|   14470.0000|2016-06-14 09:54:12|         
|  1086|   14470.0000|2016-06-14 09:54:12| 
|  5555|   14470.0000|2014-07-14 09:54:12|         
|  5555|   14470.0000|2014-07-14 09:54:12|          
+------------+----------------------+-------------------+

，但我得到這個：

+------------+----------------------+-------------------+----+ 
|increment_id|base_subtotal_incl_tax|   eventdate|rank|        
+------------+----------------------+-------------------+----+        
|  5555|   14470.0000|2014-07-14 09:54:12| 1|        
|  5555|   14470.0000|2014-07-14 09:54:12| 1|        
|  5555|   14470.0000|2014-07-14 09:54:12| 1|        
|  5555|   14470.0000|2014-07-14 09:54:12| 1|        
|  1086|   14470.0000|2016-06-14 09:54:12| 1|        
|  1086|   14470.0000|2016-06-14 09:54:12| 1|        
+------------+----------------------+-------------------+----+

我錯過了什麼？

[老] - 原來，我有一個錯誤，這是目前解決：

WindowSpec window = Window.partitionBy(df.col("id")); 
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line 
     .filter("rank <= 2") 
     .show();

然而這會返回一個錯誤Exception in thread "main" org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.對於上面標有註釋的行。我錯過了什麼？這個錯誤是什麼意思？謝謝！

來源

2016-07-20 lte__

rank窗函數需要與orderBy例如一個窗口，子句：

WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("payment"));

如果沒有一個順序是根本沒有意義的，因此，該錯誤。

來源

2016-07-20 08:17:18 zero323

謝謝！我會接受你的回答，但更新了我的問題。如果你也可以幫助我，我會非常感激。 –

Spark Hive - 帶窗口函數的UDFArgumentTypeException？

回答

相關問題