2017-07-18

Apache Spark: numbering rows by value

I'm trying to create a table with a column that numbers the occurrences of each value.

id  name   date
1   Wendy  2017-01-01
2   Alex   2017-01-01
3   Wendy  2017-01-01
4   Alex   2016-12-31

I need to add a column that gives the occurrence number of each name on a particular date.

id  name   date        Event
1   Wendy  2017-01-01  1
2   Alex   2017-01-01  1
3   Wendy  2017-01-01  2
4   Alex   2016-12-31  1

Answer


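Use selectExpr with row_number() in SQL syntax. Partitioning by name and date restarts the count for each (name, date) pair, and ordering by id numbers the rows within each pair in id order: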

df.selectExpr("id", "name", "date", "row_number() over (partition by name, date order by id) as Event").orderBy("id").show() 

+---+-----+----------+-----+
| id| name|      date|Event|
+---+-----+----------+-----+
|  1|Wendy|2017-01-01|    1|
|  2| Alex|2017-01-01|    1|
|  3|Wendy|2017-01-01|    2|
|  4| Alex|2016-12-31|    1|
+---+-----+----------+-----+
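
To make it clearer what the window expression computes, here is a minimal plain-Python sketch that mimics it without Spark. It is only an illustration of the semantics, not how Spark evaluates the query; the helper name number_occurrences and the rows list are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Sample rows from the question: (id, name, date)
rows = [
    (1, "Wendy", "2017-01-01"),
    (2, "Alex", "2017-01-01"),
    (3, "Wendy", "2017-01-01"),
    (4, "Alex", "2016-12-31"),
]

def number_occurrences(rows):
    """Hypothetical helper: mimics
    row_number() over (partition by name, date order by id)."""
    counters = defaultdict(int)  # one running counter per (name, date) partition
    result = []
    # "order by id": visit rows in ascending id order
    for row_id, name, date in sorted(rows, key=lambda r: r[0]):
        counters[(name, date)] += 1  # next occurrence within this partition
        result.append((row_id, name, date, counters[(name, date)]))
    return result

for row in number_occurrences(rows):
    print(row)
```

Each (name, date) pair keeps its own counter, which is why the second Wendy row on 2017-01-01 gets Event 2 while Alex on 2016-12-31 starts again at 1.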