2017-04-13 25 views
2, cornflakes, Regular,General Mills, 12  
3, cornflakes, Mixed Nuts, Post, 14 
4, chocolate syrup, Regular, Hersheys, 5 
5, chocolate syrup, No High Fructose, Hersheys, 8 
6, chocolate syrup, Regular, Ghirardeli, 6 
7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7 

A script that groups by multiple columns and counts each group, from which I want to get back the original data set:

data_grp = GROUP data BY (item, type); 
data_cnt = FOREACH data_grp GENERATE FLATTEN (group) AS(item, type), count(data) as total; 
filter_data = FILTER data_cnt BY total < 2; 

I now need the original data with the filter applied. My desired output is:

4, chocolate syrup, Regular, Hersheys, 5 
6, chocolate syrup, Regular, Ghirardeli, 6 

Man, state your question clearly... –

Answer


filter_data will give you (chocolate syrup, Regular). Join filter_data with the original data set on item and type to get the desired result.

data_grp = GROUP data BY (item, type); 
data_cnt = FOREACH data_grp GENERATE FLATTEN (group) AS(item, type), COUNT(data) as total; 
filter_data = FILTER data_cnt BY total < 2; 
o_data = JOIN data BY (item,type),filter_data BY ($0,$1); 
final_data = FOREACH o_data GENERATE $0..$4; 
DUMP final_data; 
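
An alternative sketch that avoids the join: filter the grouped relation directly, then flatten the surviving bags back into rows. The LOAD path `data.csv` and the field names `id`, `brand`, and `price` are assumptions for illustration (only `item` and `type` are named in the scripts above). The `> 1` threshold is also an assumption, chosen to match the desired output in the question, where the kept rows belong to a group that occurs twice.

```pig
-- Hypothetical input path and schema; only item and type are named in the thread.
data = LOAD 'data.csv' USING PigStorage(',') AS (id, item, type, brand, price);

-- Each group carries a bag named data that holds its original rows.
data_grp = GROUP data BY (item, type);

-- Assumed threshold: keep (item, type) groups that occur more than once.
dup_grp = FILTER data_grp BY COUNT(data) > 1;

-- Unnest the bag to recover the original five columns.
final_data = FOREACH dup_grp GENERATE FLATTEN(data);
DUMP final_data;
```

Because the original rows are still inside each group's bag, no second pass over `data` is needed; whether this is faster than the join depends on group sizes.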

I tried this before and it gives an error: failed to generate the logical plan – Venkat


Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] – Venkat


Read that as data_cnt rather than count – Venkat
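
ERROR 1070 names `count` as the function Pig could not resolve. Pig Latin keywords are case-insensitive, but function names are case-sensitive, and the built-in is registered as `COUNT`, so a lowercase spelling is a likely cause. A minimal sketch of the corrected statement, assuming `data` is already loaded with `item` and `type` fields as in the scripts in this thread:

```pig
data_grp = GROUP data BY (item, type);

-- COUNT in upper case resolves to the built-in; lowercase count does not.
data_cnt = FOREACH data_grp GENERATE FLATTEN(group) AS (item, type), COUNT(data) AS total;
```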
