Filtering out duplicate rows based on a subset of columns
I have some data that looks like this:
ID,DateTime,Category,SubCategory
X01,2014-02-13T12:36:14,Clothes,Tshirts
X01,2014-02-13T12:37:16,Clothes,Tshirts
X01,2014-02-13T12:38:33,Shoes,Running
X02,2014-02-13T12:39:23,Shoes,Running
X02,2014-02-13T12:40:42,Books,Fiction
X02,2014-02-13T12:41:04,Books,Fiction
What I would like to do is keep only one instance of each data point, like so (I don't care which instance in time is kept):
ID,DateTime,Category,SubCategory
X01,2014-02-13T12:36:14,Clothes,Tshirts
X02,2014-02-13T12:39:23,Shoes,Running
X02,2014-02-13T12:40:42,Books,Fiction
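Unfortunately, according to the Hive Language Manual, Hive's DISTINCT applies to the entire row rather than to a subset of columns, so something like this is not an option: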
SELECT DISTINCT(ID, SubCategory),
DateTime,
Category
FROM sometable
How would I go about getting the second table above? Thanks in advance!
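For reference, one possible approach is sketched below. It assumes a Hive version that supports windowing functions, and it assumes the deduplication key is (ID, SubCategory), as the attempted DISTINCT query suggests. The alias names rn and ranked are made up for illustration. With that key, the sample data would keep one row per (ID, SubCategory) pair, so the X01 Running row would also survive; the key would need adjusting if that is not intended.

```sql
-- Sketch only: keep one row per (ID, SubCategory) pair.
-- Assumes windowing functions are available; rn and ranked are illustrative names.
SELECT ID, DateTime, Category, SubCategory
FROM (
  SELECT ID, DateTime, Category, SubCategory,
         -- Number the rows within each (ID, SubCategory) group, earliest first
         ROW_NUMBER() OVER (PARTITION BY ID, SubCategory ORDER BY DateTime) AS rn
  FROM sometable
) ranked
WHERE rn = 1;
```

Ordering by DateTime makes the choice deterministic (the earliest row wins). Since it does not matter which instance is kept, any other ordering column would work equally well.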