2016-11-16

How do I extract distinct values from a bag of tuples?

I have the following data structure in Pig, shown after a describe:

-------------------------------------------------------------------------------------------------------------------------------------------------------- 
| summed_hours_and_miles_by_driver  | group:int  | :bag{:tuple(driver_name:chararray)}    | total_hours:long  | total_miles:long  | 
-------------------------------------------------------------------------------------------------------------------------------------------------------- 
|          | 27   | {(Mark Lochbihler), ..., (Mark Lochbihler)}  | 220     | 11006    | 
-------------------------------------------------------------------------------------------------------------------------------------------------------- 

The problem is that the bag holds the same driver name tuple (Mark Lochbihler) duplicated many times. How can I reduce it to a single name, the way DISTINCT does in SQL?

Answer

Use Distinct. Assuming A is your relation, it would look like this:

summed_hours_and_miles_by_driver = FOREACH grp GENERATE
    group,
    org.apache.pig.builtin.Distinct(A.driver_name),
    total_hours,
    total_miles;
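
Another way to get the same result is the DISTINCT operator inside a nested FOREACH block, which removes duplicates from each group's bag before the GENERATE. The following is only a minimal sketch: the input file `drivers.csv`, the relation name `drivers`, and the fields `driver_id`, `hours`, and `miles` are assumptions made for illustration and do not come from the question.

```pig
-- Hypothetical input: file name and schema are assumed, not taken from the question.
drivers = LOAD 'drivers.csv' USING PigStorage(',')
          AS (driver_id:int, driver_name:chararray, hours:long, miles:long);

-- One group per driver id; each group carries a bag of that driver's rows.
grp = GROUP drivers BY driver_id;

summed_hours_and_miles_by_driver = FOREACH grp {
    -- Project the names out of the bag and de-duplicate them per group.
    names = DISTINCT drivers.driver_name;
    GENERATE group               AS driver_id,
             names               AS driver_names,
             SUM(drivers.hours)  AS total_hours,
             SUM(drivers.miles)  AS total_miles;
};
```

If each group is known to hold exactly one name, replacing `names AS driver_names` with `FLATTEN(names) AS driver_name` should give a plain chararray column instead of a one-element bag.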