2017-07-14 28 views

Filter a list by an integer list using Pig Latin

I have a list that looks like this, lista.csv:

client-id priority client-start assignment 
12345  1   1250125125  13 
1246   3   1250122156  27 
12616  1   1250122351  3 
... 

and I have another list that looks like a vector, listb.csv:

125125 
124214 
1246 
125 
... 

What I want to do is filter lista down to the clients whose ID can also be found in listb.

I tried something like this, but it does not work:

raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client-id: int, priority: 
int, client-start: int, assignment: int); 
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int); 
s4id = FOREACH s4q { 
dd = FILTER raw by (client-id == s4q); 
GENERATE dd; 
} 
DUMP dd; 

Any ideas on how to solve this?

Answer


JOIN the two relations to get only the matching records. The join then acts as a filter.

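-- Pig field names cannot contain hyphens, so the columns are named with underscores.
raw = LOAD 'lista.csv' USING PigStorage('\t') AS (client_id: int, priority: int, client_start: int, assignment: int); 
s4q = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int); 
-- The inner join keeps only the rows of raw whose client_id appears in s4q.
s4id = JOIN raw BY client_id, s4q BY survs4id; 
-- Project the four columns that came from raw ($4 is the joined survs4id).
dd = FOREACH s4id GENERATE $0, $1, $2, $3; 
DUMP dd; 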
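
If listb.csv is small and might contain the same ID more than once, a variant along these lines may be worth trying. It is only a sketch under those assumptions and has not been run against the asker's data. The aliases ids, uniq_ids and joined are made up for illustration, and the underscore column names are an assumption (Pig identifiers cannot contain hyphens).

```pig
-- Sketch only: assumes listb.csv is small enough to fit in memory on each mapper.
raw = LOAD 'lista.csv' USING PigStorage('\t')
      AS (client_id: int, priority: int, client_start: int, assignment: int);
ids = LOAD 'listb.csv' USING PigStorage('\t') AS (survs4id: int);

-- Remove repeated IDs so that each matching client row is emitted only once.
uniq_ids = DISTINCT ids;

-- Replicated (map-side) join: the small relation has to be listed last.
joined = JOIN raw BY client_id, uniq_ids BY survs4id USING 'replicated';

-- Keep only the columns that came from lista.csv, referenced by name.
dd = FOREACH joined GENERATE raw::client_id, raw::priority,
                             raw::client_start, raw::assignment;
DUMP dd;
```

The USING 'replicated' hint asks Pig to load the last relation into memory and join on the map side. If listb.csv turns out to be large, dropping the hint falls back to the default join.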
Thx! It does indeed work. – nomilk