2016-02-03 80 views

How to find the maximum value and its associated values from a grouped relation in Pig?

Below is my input:

$ cat people.csv 
Steve,US,M,football,6.5 
Alex,US,M,football,5.5 
Ted,UK,M,football,6.0 
Mary,UK,F,baseball,5.5 
Ellen,UK,F,football,5.0 

I need to group my data by country.

people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray,country:chararray,gender:chararray, sport:chararray,height:float); 
grouped = GROUP people BY country; 

Now I have to find, within each group, the person with the maximum height along with his details.

So I tried the following:

a = FOREACH grouped GENERATE group AS country, MAX(people.height) as height, people.name as name; 

This gives the output:

(UK,6.0,{(Ellen),(Mary),(Ted)}) 
(US,6.5,{(Alex),(Steve)}) 

But I need my output to be:

(UK,6.0,Ted) 
(US,6.5,Steve) 

Could someone please help me achieve this?

Answer


This code should help you.

With this code, if two players from the same country share the maximum height, you will get the details of both players.

records = LOAD '/home/user/footbal.txt' USING PigStorage(',') AS(name:chararray,country:chararray,gender:chararray,sport:chararray,height:double); 

records_grp = GROUP records BY (country); 

records_each = foreach records_grp generate group as temp_country, MAX(records.height) as max_height; 

records_join = join records by (country,height), records_each by (temp_country,max_height); 

records_output = foreach records_join generate country, max_height, name; 

dump records_output; 

Output:

(UK,6.0,Ted) 
(US,6.5,Steve) 
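
As an alternative to the join, the same result can probably be reached with a nested FOREACH that sorts each group's bag and keeps only the first row. This is only a sketch, not part of the original answer: it reuses the people.csv schema from the question, the alias names sorted, top1 and tallest are made up for illustration, and unlike the join above it keeps just one person per country when several share the maximum height.

```pig
-- Load the same five-column CSV shown in the question.
people = LOAD 'people.csv' USING PigStorage(',')
         AS (name:chararray, country:chararray, gender:chararray,
             sport:chararray, height:float);

-- One row per country, each carrying a bag of that country's people.
grouped = GROUP people BY country;

-- Inside each group: sort the bag by height (tallest first),
-- keep a single tuple, then flatten it into plain fields.
tallest = FOREACH grouped {
    sorted = ORDER people BY height DESC;
    top1   = LIMIT sorted 1;
    GENERATE group AS country, FLATTEN(top1.(height, name));
};

DUMP tallest;
```

If ties need to be preserved, the join-based version is the safer choice, since LIMIT 1 picks an arbitrary row among equal heights.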

It worked. Thanks a lot! – Sathyaraj