2017-03-31

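Apache Pig: permutations of values, grouped by a field

Using Apache Pig, I need all permutations of one field, grouped by an id field ('title' in this example). The input data looks like this:

The schema is {chararray, chararray}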

(title1, name1) 
(title1, name2) 
(title1, name3) 
(title2, name4) 
(title2, name5) 
(title2, name6) 

I need all permutations of the names under title1 and of the names under title2, together in one list. The desired output is:

(name1, name2) 
(name1, name3) 
(name2, name3) 
(name4, name5) 
(name4, name6) 
(name5, name6) 

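I found a related answer, "How To Find All Possible Permutations From A Bag under apache pig", but I am having difficulty extending that solution so that the permutations are restricted to each title field.

Answer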

After some more searching, these two posts, "How To Find All Possible Permutations From A Bag under apache pig" and "PIG: Get all tuples out of a grouped bag", led me to this solution:

The input schema is {chararray, chararray}

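-- Name the two positional fields: the grouping id and the value to pair up.
inpt = foreach input generate $0 as (id:chararray), $1 as (val);
-- Collect the values that share an id into one bag per id.
grp = group inpt by (id);
id_grp = foreach grp generate group as id, inpt.val as value_bag;
-- Flattening the same bag twice yields every (v1, v2) pair within each id.
result = foreach id_grp generate FLATTEN(value_bag) as v1, FLATTEN(value_bag) as v2;
-- Keep one ordering of each pair and drop self-pairs (same as v1 <= v2 followed by v1 != v2).
result = filter result by v1 < v2;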
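
To make the logic of the script easier to check, here is a minimal Python sketch of the same three steps: group by id, cross each group's values with themselves, and keep only the pairs where the first value sorts before the second. It is not part of the original answer and is not Pig code. The function name `pairs_per_group` and the sample rows are assumptions taken from the example data in the question.

```python
from collections import defaultdict


def pairs_per_group(rows):
    """Model of the Pig script: group by id, self-cross each bag, keep v1 < v2."""
    # Equivalent of "group inpt by id": collect each id's values into one bag.
    bags = defaultdict(list)
    for group_id, val in rows:
        bags[group_id].append(val)

    result = []
    # Sorting the ids only makes the output order deterministic in this sketch;
    # Pig itself does not guarantee any particular output order.
    for group_id in sorted(bags):
        bag = bags[group_id]
        # Equivalent of the two FLATTENs: every (v1, v2) pair within the bag.
        for v1 in bag:
            for v2 in bag:
                # Equivalent of the filter: drop self-pairs and mirrored pairs.
                if v1 < v2:
                    result.append((v1, v2))
    return result


# Sample rows copied from the question (hypothetical stand-in for the input relation).
rows = [
    ("title1", "name1"),
    ("title1", "name2"),
    ("title1", "name3"),
    ("title2", "name4"),
    ("title2", "name5"),
    ("title2", "name6"),
]

for pair in pairs_per_group(rows):
    print(pair)
```

Because the filter compares values, two identical names under the same title would not be paired with each other. That matches the behaviour of the original pair of filters.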