Counting task in Pig Latin

Suppose I have a list of couples (id, value) and a list potentialIDs.

For each id in potentialIDs, I want to count how many times that id appears in the first list.

E.g.

couples: 
1 a 
1 x 
2 y 

potentialIDs 
1 
2 
3 

Result: 
1 2 
2 1 
3 0

I am trying to do this in Pig Latin, but it does not seem trivial.

Can you give me a hint?


2013-07-23 Aslan986

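The general plan is: group couples by id and apply COUNT, then do a LEFT join of potentialIDs with that output on the id. From there you can format the result however you need. The code below shows how to do this in more detail.

Note: if you need me to go into more detail, just let me know, but I think the comments explain what is going on pretty well.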
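To make the semantics of each step concrete, here is a minimal plain-Python model of the same plan (this is not Pig, just a sketch of what the relations contain). The function name `count_ids` and the tuple-list inputs are hypothetical. `Counter` plays the role of relation B, a missing key is treated like Pig's null after the LEFT join, and nulls are then mapped to 0. Unlike Pig, this preserves the order of potentialIDs; Pig does not guarantee output order after a join.

```python
from collections import Counter


def count_ids(couples, potential_ids):
    """Count how often each potential id appears in couples (hypothetical helper).

    Mirrors the Pig plan: GROUP couples BY id + COUNT, then a LEFT join
    from potential_ids, with missing counts (nulls) replaced by 0.
    """
    # Step 1: GROUP couples BY id, then COUNT each group's bag (like relation B).
    counts = Counter(id_ for id_, _value in couples)

    # Step 2: LEFT join potential_ids with B; an id absent from B gets None,
    # which stands in for Pig's null.
    joined = [(pid, counts.get(pid)) for pid in potential_ids]

    # Step 3: replace null counts with 0, like (B::count is NULL ? 0 : B::count).
    return [(pid, 0 if c is None else c) for pid, c in joined]


if __name__ == "__main__":
    couples = [("1", "a"), ("1", "x"), ("2", "y")]
    potential_ids = ["1", "2", "3"]
    for pid, c in count_ids(couples, potential_ids):
        print(pid, c)
```

Run on the example data from the question, this prints `1 2`, `2 1` and `3 0`, matching the expected result.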

-- B generates the count of the number of occurrences of an id in couples 
B = FOREACH (GROUP couples BY id) 
    -- Output and schema of the group is: 
    -- {group: chararray,couples: {(id: chararray,value: chararray)}} 
    -- (1,{(1,a),(1,x)}) 
    -- (2,{(2,y)}) 

    -- COUNT(couples) counts the number of tuples in the bag 
    GENERATE group AS id, COUNT(couples) AS count ; 

-- Now we want to do a LEFT join on potentialIDs and B since it will 
-- create nulls for IDs that appear in potentialIDs, but not in B 
C = FOREACH (JOIN potentialIDs BY id LEFT, B BY id) 
    -- The output and schema for the join is: 
    -- {potentialIDs::id: chararray,B::id: chararray,B::count: long} 
    -- (1,1,2) 
    -- (2,2,1) 
    -- (3,,) 

    -- Now we pull out only one ID, and convert any NULLs in count to 0s 
    GENERATE potentialIDs::id, (B::count is NULL?0:B::count) AS count ;

The schema and output for C are:

C: {potentialIDs::id: chararray,count: long} 
(1,2) 
(2,1) 
(3,0)

If you do not want the disambiguate operator (::) in C's schema, you can change the GENERATE line to:

GENERATE potentialIDs::id AS id, (B::count is NULL?0:B::count) AS count ;


2013-07-23 17:23:19 mr2ert
