How can I compute the combinations of two fields with a Pig script? I have input like this:
(1, (a, b, c))
(2, (e, f, g))
My expected output looks like this:
(1, a)
(1, b)
(1, c)
(2, e)
(2, f)
(2, g)
Maybe this will help you:
A = LOAD 'data' AS (a:int, t1:tuple(t1a:chararray, t1b:chararray, t1c:chararray));
B = FOREACH A GENERATE a,t1.$0,t1.$1,t1.$2;
C = group B by a;
X = COGROUP C BY a, C BY $0;
DUMP X;
Can you try this?
A = LOAD 'input.txt' USING PigStorage() AS (f1:int,T:tuple(f2:chararray,f3:chararray,f4:chararray));
B = FOREACH A GENERATE f1,FLATTEN(TOBAG(T.f2,T.f3,T.f4));
DUMP B;
Step 1: Load the input file (tab-separated, the PigStorage() default)
1	(A,B,C)
2	(E,F,G)
as
crude_input = LOAD '' USING PigStorage() AS (id:int, ip_tuple:tuple(val1:chararray, val2:chararray, val3:chararray));
DUMP crude_input;
(1,(A,B,C))
(2,(E,F,G))
Step 2: Flatten the tuple into top-level fields
crude_flatened = FOREACH crude_input GENERATE id, FLATTEN($1);
This generates
(1,A,B,C)
(2,E,F,G)
Step 3: Put the three values into a bag and flatten it, so each value becomes its own row
output_data = FOREACH crude_flatened GENERATE id, FLATTEN(TOBAG(ip_tuple::val1, ip_tuple::val2, ip_tuple::val3));
(1,A)
(1,B)
(1,C)
(2,E)
(2,F)
(2,G)
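Please format the code in your answer properly. – Tom
It doesn't seem to work; the output is still: '(f1,T.f2,T.f3,...)'. –
I found the solution [here](http://stackoverflow.com/questions/11213567/pivot-table-with-apache-pig). Only a single 'FLATTEN' is needed, thanks. –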
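For readers wondering what a single FLATTEN might look like: the sketch below is not taken from the linked answer. It assumes the second field can be loaded as a bag rather than a tuple. The file name 'input_bag.txt', the alias names, and the inner field name f2 are hypothetical.

```pig
-- Hypothetical tab-separated input file 'input_bag.txt', with the second
-- field written as a bag of single-field tuples:
-- 1	{(a),(b),(c)}
-- 2	{(e),(f),(g)}
A = LOAD 'input_bag.txt' USING PigStorage() AS (f1:int, B:bag{t:tuple(f2:chararray)});

-- FLATTEN on a bag emits one output row per tuple in the bag,
-- pairing each element with f1.
R = FOREACH A GENERATE f1, FLATTEN(B);
DUMP R;
```

FLATTEN behaves differently on tuples and bags: on a tuple it only widens the row, as in Step 2 above, while on a bag it multiplies rows. That is why the tuple-based answers wrap the values in TOBAG before flattening.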