I have the following data in HDFS. How can I find the number of repeated users in Pig?
user_id, category_id
1, 12344
1, 12344
1, 12345
2, 12345
2, 12345
3, 12344
3, 12344
And so on. I want to find the number of repeated users in each category.
For the data above, for example, the result would be:
12344, 2 (because user_id 1 and 3 are repeated users)
12345, 1 (user_id 2 is a repeated user; user_id 1 is not, as that user visited this category just once)
How can I do this in Pig?
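
One possible approach, offered as a sketch rather than a definitive answer: group by (category_id, user_id) to count visits per user per category, keep the pairs seen more than once, then count those pairs per category. The path 'visits.csv' and all relation names below are hypothetical, and the script assumes the file has no header row and no spaces after the commas (otherwise the int casts may produce nulls).

```pig
-- Hypothetical input path; assumes comma-separated rows with no header line.
visits = LOAD 'visits.csv' USING PigStorage(',')
         AS (user_id:int, category_id:int);

-- Count how many times each user appears in each category.
by_pair     = GROUP visits BY (category_id, user_id);
pair_counts = FOREACH by_pair GENERATE
                  FLATTEN(group) AS (category_id, user_id),
                  COUNT(visits)  AS visit_count;

-- A "repeated" user is one with more than one row in that category.
repeated = FILTER pair_counts BY visit_count > 1;

-- Count the repeated users per category.
by_category = GROUP repeated BY category_id;
result      = FOREACH by_category GENERATE
                  group           AS category_id,
                  COUNT(repeated) AS repeated_users;

DUMP result;
```

For the sample rows above this should give 2 for category 12344 and 1 for category 12345, though the order of the dumped tuples is not guaranteed.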