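Unable to avoid duplicate removal in Apache Pig

I am new to Apache Pig. I want to split and flatten the input below into my required output, that is, a list of which users viewed each product.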
My input (user ID, product IDs):
12345 123456,23456,987653
23456 23456,123456,234567
34567 234567,765678,987653
My required output (product ID, user ID):
123456 12345
123456 23456
23456 12345
23456 23456
987653 12345
987653 34567
234567 23456
234567 34567
765678 34567
My Pig script:
a = load '/home/hadoopuser/ips' using PigStorage('\t') as (key:chararray, val:chararray);
b = foreach a generate key as ky1, FLATTEN(TOKENIZE(val)) as vl1;
c = group b by vl1;
d = foreach c generate group as vl2, $1 as ky2;
e = foreach d generate vl2, BagToString(ky2) as kyy;
f = foreach e generate vl2 as vl3,FLATTEN(STRSPLIT(kyy,'_')) as ky3;
g = foreach f generate vl3, FLATTEN(TOKENIZE(ky3)) as kk1;
dump g;
I get the following output, in which the repeated values have been eliminated:
(23456,12345)
(123456,12345)
(234567,23456)
(765678,34567)
(987653,12345)
I don't know how to solve this. Can anyone help me with it? Also, is there a simpler way to do this?
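One possible simplification, offered as a sketch rather than a confirmed fix: relation b in the script above already holds one (user ID, product ID) pair per row, because FLATTEN over the bag returned by TOKENIZE emits a row for every token. The grouping and the BagToString/STRSPLIT steps may therefore be unnecessary, and it may be enough to swap the two columns. The aliases user_id, product_id, pairs and result are illustrative names, not part of the original script.

```pig
-- Load the tab-separated input: user ID, then a comma-separated list of product IDs.
a = load '/home/hadoopuser/ips' using PigStorage('\t') as (key:chararray, val:chararray);

-- TOKENIZE splits val into a bag of tokens (its default delimiters include the comma);
-- FLATTEN then produces one row per (user ID, product ID) pair.
pairs = foreach a generate key as user_id, FLATTEN(TOKENIZE(val)) as product_id;

-- Swap the columns so each row reads (product ID, user ID).
result = foreach pairs generate product_id, user_id;

dump result;
```

The row order produced by dump is not guaranteed; if the rows need to be sorted by product ID, an order ... by step can be added before the dump.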
Hi Balduz, thank you for your reply. It works fine, and the problem is clearly explained.