0
我正在嘗試生成聚合輸出。問題是所有的數據都會被放入一個reducer中(Filter和Count會產生一個問題)。我如何優化下面的腳本?優化豬腳本
預期輸出: 組,10,2,12,34 ...
data = LOAD '/input/useragents' USING PigStorage('\t') AS (Col1:chararray,Col2:chararray,Col3:chararray,col4:chararray,col5:chararray);
grp1 = GROUP data BY UA PARALLEL 50;
fr1 = FOREACH grp1 {
fltrCol1 = FILTER data BY Col1 == 'Other';
fltrCol2 = FILTER data BY Col2 == 'Other';
fltrCol3 = FILTER data BY Col3 == 'Other';
fltrCol4 = FILTER data BY col4 == 'Other';
fltrCol5 = FILTER data BY col5 == 'Other';
cnt_fltrCol1 = COUNT(fltrCol1);
cnt_fltrCol2 = COUNT(fltrCol2);
cnt_fltrCol3 = COUNT(fltrCol3);
cnt_fltrCol4 = COUNT(fltrCol4);
cnt_fltrCol5 = COUNT(fltrCol5);
GENERATE group,cnt_fltrCol1,cnt_fltrCol2,cnt_fltrCol3,cnt_fltrCol4,cnt_fltrCol5;
}
謝謝亞歷克斯。有一些數據問題,現在它工作正常。我會實現你的想法來優化。 – Arun