2013-08-30

Filtered Pig data loaded twice?

I have the following Pig script:

A = LOAD 'text_a.txt' USING PigStorage();
B = LOAD 'text_b.txt' USING PigStorage();
SOMETHING = FILTER A BY $0 matches 'SOMETHING';
FOOBAR = FILTER A BY $0 matches 'FOOBAR';

SOMETHING_B = JOIN SOMETHING BY key, B BY $1;
FOOBAR_B = JOIN FOOBAR BY key, B BY $1;
TEMP = JOIN SOMETHING_B BY key, FOOBAR_B BY key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::$1 - FOOBAR_B::$1;
DUMP OUT;

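When this script runs, it looks like the data for A and B is read from the source twice. Is there a way to prevent it from being read a second time?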


Have you tried using the EXPLAIN command to display the execution plan and check whether the data is really read twice? [link](http://pig.apache.org/docs/r0.10.0/test.html#explain)


Running EXPLAIN now. Currently trying to figure out how to read the EXPLAIN output. – e90jimmy

Answer


First, put an EXPLAIN on the result at the end of the script to determine whether the data is read twice.

Looking at your script, it doesn't look like A or B is loaded twice.
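
As a rough illustration of the EXPLAIN suggestion, here is a minimal sketch that filters one loaded relation twice and prints the plans instead of running the job. The schema (`tag`, `key`, `val`) and the `PAIRS` alias are assumptions made for this example only, since the question does not say what columns `text_a.txt` contains.

```pig
-- Hypothetical schema: the column names and types are assumed for this sketch.
A = LOAD 'text_a.txt' USING PigStorage() AS (tag:chararray, key:chararray, val:long);

-- Two filters over the same relation A, as in the question.
SOMETHING = FILTER A BY tag matches 'SOMETHING';
FOOBAR = FILTER A BY tag matches 'FOOBAR';

-- Join the two filtered relations on the assumed key column.
PAIRS = JOIN SOMETHING BY key, FOOBAR BY key;
OUT = FOREACH PAIRS GENERATE SOMETHING::val - FOOBAR::val;

-- Print the logical, physical and MapReduce plans for OUT without executing it.
EXPLAIN OUT;
```

In the printed plans, check how many load operators refer to `text_a.txt`. A single load feeding both filters would suggest the file is read only once.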