2013-01-18

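I admit that the title of this question is unclear. If someone could rephrase it after reading my question, that would be great. How can I avoid joining on the same field twice?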

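Anyway, I have a pair of fields that hold word IDs, and I want to replace them with the corresponding word texts. At the moment I do this with two joins and two FOREACH statements, as follows: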

WordIDs = LOAD 'wordID.txt' AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);

-- Replace wordID1 with its text
Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID;
Replaced1 = FOREACH Join1 GENERATE WordTexts::wordText AS wordText1, WordIDs::wordID2 AS wordID2;

-- Replace wordID2 with its text
Join2 = JOIN Replaced1 BY wordID2, WordTexts BY wordID;
Replaced2 = FOREACH Join2 GENERATE Replaced1::wordText1 AS wordText1, WordTexts::wordText AS wordText2;

Is there any way to do this with fewer statements (for example, one join instead of two)?

Answer

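I think the current code will produce 2 separate MapReduce jobs. To avoid that, use a replicated join. It will not change the number of JOIN statements, but both joins become map-side joins, so there should be only one MapReduce job. The code should look something like this (I have not run it yet):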

WordIDs = LOAD 'wordID.txt' AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);

-- WordTexts is the right-hand (replicated) relation in both joins
Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID USING 'replicated';
Join2 = JOIN Join1 BY wordID2, WordTexts BY wordID USING 'replicated';

Replaced = FOREACH Join2 GENERATE Join1::WordTexts::wordText AS wordText1, WordTexts::wordText AS wordText2;
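
Since the script above was not run, here is a hedged sketch of how it could be checked with Pig's diagnostic operators. It assumes that both input files are tab-separated (the PigStorage default) and that wordText.txt is small enough to fit in memory, which a replicated join requires of its right-hand relation. DESCRIBE prints the field names Pig assigns after the second join, and EXPLAIN prints the execution plans, from which the number of MapReduce jobs can be read. The field prefixes used in the FOREACH are what I would expect, not something I have verified; if DESCRIBE shows different names, adjust them accordingly.

```pig
-- Assumption: tab-separated inputs in the working directory.
WordIDs   = LOAD 'wordID.txt'   AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);

-- In a replicated join the right-hand relation is held in memory by each mapper.
Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID USING 'replicated';
Join2 = JOIN Join1 BY wordID2, WordTexts BY wordID USING 'replicated';

-- Print the schema to see how the two copies of wordText are prefixed.
DESCRIBE Join2;

Replaced = FOREACH Join2 GENERATE
    Join1::WordTexts::wordText AS wordText1,
    WordTexts::wordText        AS wordText2;

-- Print the logical, physical and MapReduce plans for the final relation.
EXPLAIN Replaced;
```

Running the same EXPLAIN on Replaced2 from the question gives a direct comparison of the job count between the default joins and the replicated ones.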