Join and group by in Pig on Hadoop

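I often see people use group by and join for the same problem. Suppose I have a students table and a scores table, and I want to find each student name together with that student's course scores. It seems this can be solved either with a join or with a group by? I would like to know the pros and cons of the two solutions. The data structures and code are posted below. Thanks.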

students table:

student ID, student name, student email address 

scores table:

student ID, course ID, score 

student_scores = group students by (studentId) inner, scores by (studentId); 

student_scores = join students by studentId, scores by studentId;

Source

2016-03-13 Lin Ma

Possible duplicate of "Join vs COGROUP in Pig" (http://stackoverflow.com/questions/7496029/join-vs-cogroup-in-pig) – rahulbmv

@rahulbmv, nice reference, upvoted. :) But I am asking about group vs. join, and you are referring to cogroup? Thanks. –

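@rahulbmv, I am also confused about what "foreign key" means in the comment there, "both need to send all records with the key as the foreign key". If you could show an example, that would be great. –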

The Pig Latin manual says the following about JOIN:

Note the following about the GROUP/COGROUP and JOIN operators: 

The GROUP and JOIN operators perform similar functions. GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples. 
The GROUP/COGROUP and JOIN operators handle null values differently (see Nulls and JOIN Operator).

I don't know about the pros and cons, but the manual does point out the differences.
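To make the "nested vs. flat" point concrete, here is a small plain-Python sketch of what the two statements in the question would roughly produce. It is not Pig and not how Pig executes anything; the sample rows and the helper names `cogroup_inner` and `inner_join` are made up for illustration, and null keys are not modeled.

```python
from collections import defaultdict

# Hypothetical sample rows following the question's schemas.
students = [
    (1, "Ann", "ann@example.com"),   # (studentId, name, email)
    (2, "Bob", "bob@example.com"),
]
scores = [
    (1, "math", 90),                 # (studentId, courseId, score)
    (1, "physics", 85),
    (2, "math", 70),
]

def cogroup_inner(left, right):
    """Mimic `group students by studentId inner, scores by studentId`:
    one output tuple per key, holding one bag (list) per input relation."""
    left_bags, right_bags = defaultdict(list), defaultdict(list)
    for row in left:
        left_bags[row[0]].append(row)
    for row in right:
        right_bags[row[0]].append(row)
    # INNER on the left relation: keys with an empty left bag are dropped,
    # so we only iterate over keys that appear in `left`.
    return [(key, left_bags[key], right_bags.get(key, []))
            for key in sorted(left_bags)]

def inner_join(left, right):
    """Mimic `join students by studentId, scores by studentId`:
    one flat tuple per matching (student, score) pair."""
    right_bags = defaultdict(list)
    for row in right:
        right_bags[row[0]].append(row)
    return [l + r for l in left for r in right_bags.get(l[0], [])]

grouped = cogroup_inner(students, scores)   # nested: 2 tuples, one per student
joined = inner_join(students, scores)       # flat: 3 tuples, one per score row

# Flattening both bags of the grouped result yields the joined rows.
flattened = [s + sc for _, s_bag, sc_bag in grouped
             for s in s_bag for sc in sc_bag]
```

So with the group version you get one record per student with all of that student's scores nested inside it, which is handy if the next step aggregates per student. With the join version you get one record per student and course pair, with the student columns repeated on every row.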

Source

2016-03-15 11:21:58 Mzf

Thanks Mzf, my question is specifically about how they differ in my example. I would like to understand the difference. :) –

Hi Mzf, if you have any good ideas, that would be great. :) –
