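I have bags of tuples, and I need to zero one field per bag. I take the MIN of that field within the bag and subtract that min from every tuple. Can this bag math be done without flattening?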
The real situation is a bit more complicated, because I only want the min over the subset of tuples that satisfy a certain condition.
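Stated as a formula, for one bag B (all tuples sharing the same x), the target transformation is the following. Here the condition is taken to be y = 0, as in the example code in the question:

$$
z'_i = z_i - \min_{j \in B,\; y_j = 0} z_j \qquad \text{for every } (x_i, y_i, z_i) \in B
$$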
Here is some example code that doesn't work:
data = LOAD 'data.csv' USING PigStorage(',')
AS (x:int, y:int, z:int);
data_grouped = GROUP data BY x;
data_normal = FOREACH data_grouped {
    good_data = FILTER data BY y == 0;
    smallest_good_z = MIN(good_data.z);
    GENERATE data.(x, y, z-smallest_good_z);
}
DESCRIBE data_normal;
rmf data_normal
STORE data_normal INTO 'data_normal' USING PigStorage(',');
And a sample data.csv:
0,0,1
0,0,2
0,0,3
0,1,0
0,2,-1
1,2,3
1,3,4
1,4,5
1,0,5
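Please tell me I don't have to group, take the MIN, flatten, subtract, and regroup! Here is the approach I'm currently using and would like to get rid of: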
data = LOAD 'data.csv' USING PigStorage(',') AS
(x:int, y:int, z:int);
data_grouped = GROUP data BY x;
data_n0 = FOREACH data_grouped {
    good_data = FILTER data BY y == 0;
    smallest_good_z = MIN(good_data.z);
    GENERATE FLATTEN(data.(x, y, z)), smallest_good_z AS smz:int;
}
data_n1 = FOREACH data_n0 GENERATE x,y,z-smz;
data_normal = GROUP data_n1 BY x;
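One route sometimes taken in Pig to avoid the flatten/regroup round trip is a UDF that receives the whole bag and returns the normalized bag. The sketch below is plain Python modelling the per-bag logic such a UDF could run; it is not the asker's method. It assumes the bag arrives as a list of (x, y, z) tuples, which is how Pig hands bags to Jython UDFs, and the names `normalize_bag` and `group_by_x` are hypothetical. Actually registering it with Pig would also need an `@outputSchema(...)` decorator and a `REGISTER '...' USING jython AS ...;` statement, which are not shown or tested here.

```python
from collections import defaultdict

# The sample data.csv from the question, as (x, y, z) tuples.
SAMPLE = [
    (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 1, 0), (0, 2, -1),
    (1, 2, 3), (1, 3, 4), (1, 4, 5), (1, 0, 5),
]


def normalize_bag(bag):
    """Hypothetical UDF body: subtract the min z over tuples with y == 0
    from every tuple's z, keeping the bag's tuple order.

    Mimics Pig's null handling: MIN skips nulls and yields null when
    there is nothing to take the min of, and z - null is null.
    """
    good = [z for (_, y, z) in bag if y == 0 and z is not None]
    smallest = min(good) if good else None
    result = []
    for x, y, z in bag:
        if smallest is None or z is None:
            new_z = None
        else:
            new_z = z - smallest
        result.append((x, y, new_z))
    return result


def group_by_x(rows):
    """Stand-in for Pig's GROUP ... BY x: maps each x to its bag of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)


if __name__ == "__main__":
    for x, bag in sorted(group_by_x(SAMPLE).items()):
        print(x, normalize_bag(bag))
```

On the sample, the x = 0 bag has good z values 1, 2, 3 (min 1) and the x = 1 bag has only z = 5 (min 5), so every z is shifted by that amount. A bag with no y == 0 tuple comes back with null z values, matching what the Pig expression would produce.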
Oh, there's a cat pun lurking somewhere in the question title... :D – TC1