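Dropping null values after a full outer join in Pig
I need help discarding the null values produced by a full outer join in Pig Latin. Below are two data sets: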
A:
(BOS,2)
(BUR,81)
(LAS,8)
B:
(BUR,56)
(EWR,2)
(LAS,88)
After the full outer join, C:
(BOS,2,,)
(BUR,81,BUR,56)
(,,EWR,2)
(LAS,8,LAS,88)
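To see where the empty fields in C come from, here is a minimal Python model of a full outer join. It is not Pig itself: the relations are assumed to be lists of (key, count) tuples with unique keys (as GROUP output has), and Python's None stands in for Pig's null. The function name full_outer_join is hypothetical.

```python
def full_outer_join(left, right):
    """Join two lists of (key, count) tuples on key, keeping unmatched rows.

    An unmatched side is filled with (None, None), mirroring the empty
    fields Pig prints for nulls. Assumes keys are unique within each input.
    """
    right_by_key = {key: (key, count) for key, count in right}
    left_keys = {key for key, _ in left}
    joined = []
    # Every left row appears once, matched or padded with nulls.
    for key, count in left:
        match = right_by_key.get(key, (None, None))
        joined.append((key, count) + match)
    # Right rows with no partner on the left get nulls on the left side.
    for key, count in right:
        if key not in left_keys:
            joined.append((None, None, key, count))
    return joined


A = [("BOS", 2), ("BUR", 81), ("LAS", 8)]
B = [("BUR", 56), ("EWR", 2), ("LAS", 88)]
C = full_outer_join(A, B)
print(C)
```

Row order here differs from the listing above; Pig does not guarantee output order unless you ORDER the relation.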
I need the output in the following format:
(BOS,2)
(BUR,137)
(EWR,2)
(LAS,96)
I have tried different combinations of FLATTEN, BagToTuple, ... but could not find a solution. Thanks a lot for your help.
-- Load each flight as an (Origin, Dest) pair
airline = load '/demo/data/airline/airline.csv' using PigStorage(',') as (Origin: chararray, Dest: chararray);
-- Count flights per origin airport
traffic_in = GROUP airline by Origin;
traffic_in_count = FOREACH traffic_in generate group as Origin, COUNT(airline) as count;
-- Count flights per destination airport
traffic_out = GROUP airline by Dest;
traffic_out_count = FOREACH traffic_out generate group as Dest, COUNT(airline) as count;
-- Full outer join on the airport code; fields from the unmatched side are null
traffic_top = JOIN traffic_in_count by Origin FULL OUTER, traffic_out_count by Dest;
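One way to get from C to the desired output is to keep whichever key is non-null and treat a null count as 0 before adding the two counts. In Pig this kind of null handling is commonly written with the bincond operator, e.g. (x is null ? 0 : x); the Python below is only a sketch of that logic, assuming each joined row is a (key1, count1, key2, count2) tuple with None for null. The name merge_joined is hypothetical.

```python
def merge_joined(rows):
    """Collapse (key1, count1, key2, count2) rows into (key, total).

    Whichever key is present is kept, and a missing (None) count is
    treated as 0 before the two counts are added.
    """
    merged = []
    for key1, count1, key2, count2 in rows:
        key = key1 if key1 is not None else key2
        in_count = count1 if count1 is not None else 0
        out_count = count2 if count2 is not None else 0
        merged.append((key, in_count + out_count))
    return merged


C = [
    ("BOS", 2, None, None),
    ("BUR", 81, "BUR", 56),
    (None, None, "EWR", 2),
    ("LAS", 8, "LAS", 88),
]
print(merge_joined(C))  # [('BOS', 2), ('BUR', 137), ('EWR', 2), ('LAS', 96)]
```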
Please share your Pig script. It looks like you could use COGROUP and then SUM; have you tried that? – Mzf
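The COGROUP route sidesteps the null keys: each airport code becomes a single group with one bag per input, and a missing side is just an empty bag rather than a row of nulls. The Python below models that idea (a dict of two lists per key, then summing both lists); it is an illustration, not Pig, and cogroup_sum is a hypothetical name. In Pig itself, check how SUM treats an empty bag in your version before relying on it.

```python
from collections import defaultdict


def cogroup_sum(left, right):
    """Model COGROUP left BY key, right BY key, then add the counts in both bags."""
    # key -> (bag of counts from left, bag of counts from right)
    bags = defaultdict(lambda: ([], []))
    for key, count in left:
        bags[key][0].append(count)
    for key, count in right:
        bags[key][1].append(count)
    # An empty bag sums to 0 here, so airports seen on only one side keep their count.
    return sorted((key, sum(lbag) + sum(rbag)) for key, (lbag, rbag) in bags.items())


A = [("BOS", 2), ("BUR", 81), ("LAS", 8)]
B = [("BUR", 56), ("EWR", 2), ("LAS", 88)]
print(cogroup_sum(A, B))  # [('BOS', 2), ('BUR', 137), ('EWR', 2), ('LAS', 96)]
```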
airline = load '/demo/data/airline/airline.csv' using PigStorage(',') as (Origin:chararray, Dest:chararray); traffic_in = GROUP airline by Origin; traffic_in_count = FOREACH traffic_in generate group as Origin, COUNT(airline) as count; traffic_out = GROUP airline by Dest; traffic_out_count = FOREACH traffic_out generate group as Dest, COUNT(airline) as count; traffic_top = JOIN traffic_in_count by Origin FULL OUTER, traffic_out_count by Dest; --- Sorry, I couldn't format the code. – Fasahat
That is my actual code; I replaced its aliases with A, B and C in the question. – Fasahat