Data normalization in a Pig script
I have the following dataset:
1,11,ab;cd;200
2,22,pq;rs
I want this as the output:
1,11,ab
1,11,cd
1,11,200
2,22,pq
2,22,rs
How can this be done in Pig without using any UDFs?
You can do something like this:
A = load '....' using PigStorage(',') as (x,y,data : chararray);
SPLT = foreach A generate x, y, FLATTEN(STRSPLIT(data,';'));
X_tmp = foreach SPLT generate $0 as x, $1 as y, FLATTEN(TOBAG($2..$20)) as term; -- pivots the row
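X = filter X_tmp by term is not null; -- removes the extra null rows produced when data splits into fewer terms than the $2..$20 range holds
This assumes that no data string has more than 19 elements (fields $2 through $20). If yours can have more, raise the upper bound of the range.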
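If the number of ';'-separated values has no known upper bound, one way to avoid the fixed $2..$20 range is to let the built-in TOKENIZE build the bag directly. This is a minimal sketch, assuming a Pig release whose TOKENIZE accepts the optional delimiter argument (check the documentation for your version); 'input.csv' is a hypothetical path.

```pig
-- Sketch: assumes TOKENIZE supports the optional delimiter argument in your Pig version.
-- 'input.csv' is a hypothetical input path.
A = load 'input.csv' using PigStorage(',') as (x:chararray, y:chararray, data:chararray);
-- TOKENIZE returns a bag holding one single-field tuple per ';'-separated token;
-- FLATTEN emits one output row per tuple in that bag, repeating x and y.
X = foreach A generate x, y, FLATTEN(TOKENIZE(data, ';')) as term;
dump X;
```

Because the bag holds exactly one tuple per token, there are no padding nulls to filter out. TOKENIZE does not emit empty tokens for consecutive delimiters, so an input like ab;;cd yields two rows rather than three.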
Try this:
A = load 'data' using PigStorage(',') as (x,y,data:chararray);
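SPLT = foreach A generate x, y, FLATTEN(STRSPLIT(data,';',3)) as (a,b,c); -- handles at most 3 parts
grp = group SPLT by (x,y);
res = foreach grp generate group, FLATTEN(SPLT);
out = foreach res generate FLATTEN(group), FLATTEN(TOBAG(SPLT::a, SPLT::b, SPLT::c)) as val;
-- drop the null row produced when data has fewer than 3 parts (e.g. pq;rs)
result = filter out by val is not null;
dump result;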