2015-10-20 26 views

How do I sum two log files in Pig? I'm having trouble combining two log files and summing their values.

For example, given these files:

  1. File 1

    id user view
    1 AAA 2
    2 BBB 5
    3 CCC 9

  2. File 2

    id user view address
    1 AAA 5 XXX
    2 BBB 2 YYY
    6 FFF 4 ZZZ

I want to combine the two files by id and sum the view values. This is the output I want:

Output:

id user view address 
1 AAA 7 XXX 
2 BBB 7 YYY 

I tried the code below, which joins the two files, but I can't get it to sum the views:

My code:

inputdata = LOAD '/user/hdfs/tes/part-1' AS (
    id:chararray, 
    user:chararray, 
    view:int 
); 


inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
    id:chararray, 
    user:chararray, 
    view:int, 
    address:chararray 
); 


joined = JOIN inputdata BY id LEFT OUTER, inputdata2 by id; 

outputlist = FOREACH joined { 

     GENERATE 
     inputdata::id, 
     inputdata::user, 
     --sum(inputdata2::view), 
     inputdata2::address; 


} 

dump outputlist; 

My question: how do I sum the views across the two log files?

Thanks.

Answer


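Join the two relations on id, then add the two view values in a FOREACH over the join result. This works. Note that it uses an inner join, which keeps only the ids present in both files (1 and 2), matching the desired output; the LEFT OUTER join in the question would also keep id 3.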

A = LOAD 'file1.dat' using PigStorage(' ') AS (a:chararray,b:chararray,c:int);     
B = LOAD 'file2.dat' using PigStorage(' ') AS (a:chararray,b:chararray,c:int,d:chararray);  
C = JOIN A by a,B by a;                               
D = FOREACH C GENERATE A::a as id,A::b as user,A::c + B::c as view,B::d as address; 

Output:

(1,AAA,7,XXX) 
(2,BBB,7,YYY) 
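If you instead want to keep every row from file 1, as the LEFT OUTER join in the question suggests, a minimal sketch is below. It assumes the same space-delimited files file1.dat and file2.dat as the answer, and uses Pig's bincond operator to treat a missing file-2 view as 0, because adding an int to null yields null in Pig. With the sample data, id 3 should then appear with view 9 and an empty address.

```pig
-- Sketch: assumes the same space-delimited sample files as the answer.
A = LOAD 'file1.dat' USING PigStorage(' ') AS (id:chararray, user:chararray, view:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (id:chararray, user:chararray, view:int, address:chararray);

-- LEFT OUTER keeps ids that exist only in file 1 (e.g. id 3); B's fields are null for them.
C = JOIN A BY id LEFT OUTER, B BY id;

-- Null-safe sum: when there is no match in file 2, B::view is null, so add 0 instead.
D = FOREACH C GENERATE A::id AS id, A::user AS user,
        A::view + (B::view IS NULL ? 0 : B::view) AS view,
        B::address AS address;

DUMP D;
```

Use the inner join from the answer when only ids present in both files should be reported; use this variant when file 1 is the authoritative list of users.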

Thanks, Vignesh, your code works. Thank you very much!


I'm glad it worked. If it solved your problem, please accept the answer.
