2014-10-10 21 views

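Pig: extract records that are not distinct by a column

I want to extract the records whose value in a given column is not distinct (that is, the value appears in more than one record). How can I do this?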

Example input:

(user1, value1, value2) 
(user1, value3, value4) 
(user2, value5, value6) 
(user3, value7, value8) 
(user4, value9, value10) 
(user4, value11, value12) 

After extracting the records whose first-column value is repeated, the output would be:

(user1, value1, value2) 
(user1, value3, value4) 
(user4, value9, value10) 
(user4, value11, value12) 

Thanks a lot in advance!

Answers


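Let me know if this works for you. For testing purposes I declared value1 and value2 as chararray; in your actual code, change them to int or long as appropriate.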

input.txt 
user1,value1,value2 
user1,value3,value4 
user2,value5,value6 
user3,value7,value8 
user4,value9,value10 
user4,value11,value12 

PigScript 
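-- Load the comma-separated records (use int or long for value1/value2 with real data).
A = LOAD 'input.txt' USING PigStorage(',') AS (user:chararray,value1:chararray,value2:chararray);
-- Group by user so that each group holds every record for that user.
B = GROUP A BY user;
-- Flatten each group back into rows, attaching the group's size to every row.
C = FOREACH B GENERATE FLATTEN(A),COUNT(A) AS cnt;
-- Keep only the rows whose user appears more than once.
D = FILTER C BY cnt > 1;
-- Drop the helper count column.
E = FOREACH D GENERATE A::user,A::value1,A::value2;
DUMP E;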

Output: 
(user1,value1,value2) 
(user1,value3,value4) 
(user4,value9,value10) 
(user4,value11,value12) 
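
To sanity-check the expected result outside Pig, the same group, count, and filter logic can be sketched in plain Python. This is only an illustration, not part of the Pig solution: the function name `non_distinct_by_first_column` and the in-memory `records` list are assumptions made for the example, and unlike Pig (which does not guarantee output order) this sketch preserves input order.

```python
from collections import Counter


def non_distinct_by_first_column(records):
    """Return the records whose first field occurs in more than one record."""
    # Count occurrences of each first-column value (the GROUP BY + COUNT step).
    counts = Counter(rec[0] for rec in records)
    # Keep only records whose key was seen more than once (the FILTER step).
    return [rec for rec in records if counts[rec[0]] > 1]


# Sample data copied from the question's input.
records = [
    ("user1", "value1", "value2"),
    ("user1", "value3", "value4"),
    ("user2", "value5", "value6"),
    ("user3", "value7", "value8"),
    ("user4", "value9", "value10"),
    ("user4", "value11", "value12"),
]

result = non_distinct_by_first_column(records)
for rec in result:
    print(rec)
```

The two passes (count first, then filter) correspond to the GROUP and FILTER statements in the Pig script; the count is never stored on the rows, so there is no helper column to drop afterwards.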

@DanZhu, if my answer helped, please mark it as the accepted answer. – 2014-10-15 13:32:53