2013-01-11

Conditional SUM in Pig

I am using the ternary operator to conditionally include values in a SUM() operation. Here is how I am doing it:

GROUPED = GROUP ALL_MERGED BY (fld1, fld2, fld3); 

REPORT_DATA = FOREACH GROUPED 
       {  GENERATE group, 
        SUM(GROUPED.fld4 == 'S' ? GROUPED.fld5 : 0) AS sum1, 
        SUM(GROUPED.fld4 == 'S' ? GROUPED.fld5 : (GROUPED.fld5 * -1)) AS sum2; 
       } 

The schema of ALL_MERGED is:

{ALL_MERGED: {fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray: fld5:int}} 

When I run this, I get the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: SUM in {group: (fld1:chararray, fld2:chararray, fld3:chararray), ALL_MERGED: {fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray: fld5:int}} 

What am I doing wrong here?

Answer


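SUM is a UDF that takes a bag as input. There are several problems with what you are doing, and I suspect it would help you to review a good reference on Pig. I recommend Programming Pig, which is available free online. To begin with, GROUPED has two fields: a tuple called group and a bag called ALL_MERGED, which is what the error message is trying to tell you. (I say "trying" because Pig's error messages are often quite cryptic.) References such as GROUPED.fld4 therefore do not resolve, because fld4 and fld5 live inside the ALL_MERGED bag, not directly on GROUPED.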

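Also, you cannot pass an expression to a UDF the way you are trying to. Instead, you have to GENERATE those fields first, before grouping, and then pass them to SUM. Try this: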

-- Compute the per-row values to be summed before grouping.
ALL_MERGED_2 = 
    FOREACH ALL_MERGED 
    GENERATE 
     fld1 .. fld5, 
     ((fld4 == 'S') ? fld5 : 0) AS sum_me1, 
     ((fld4 == 'S') ? fld5 : fld5 * -1) AS sum_me2; 

-- Group, then hand SUM a plain projection of the bag.
GROUPED = GROUP ALL_MERGED_2 BY (fld1, fld2, fld3); 
DATA = 
    FOREACH GROUPED 
    GENERATE 
     group, 
     SUM(ALL_MERGED_2.sum_me1) AS sum1, 
     SUM(ALL_MERGED_2.sum_me2) AS sum2;
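
If you would rather not add helper columns, a nested FOREACH with a FILTER inside it may be another way to get the first sum. This is only a sketch and I have not run it against your data. The LOAD path 'all_merged.tsv' and the alias s_rows are hypothetical names made up for illustration; the field names come from the schema in the question.

```pig
-- Hypothetical input file; replace with however ALL_MERGED is really built.
ALL_MERGED = LOAD 'all_merged.tsv'
    AS (fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray, fld5:int);

GROUPED = GROUP ALL_MERGED BY (fld1, fld2, fld3);

REPORT_DATA = FOREACH GROUPED {
    -- Within each group, keep only the rows whose fld4 is 'S'.
    s_rows = FILTER ALL_MERGED BY fld4 == 'S';
    GENERATE
        group,
        -- SUM receives a bag projection, which is what it expects.
        SUM(s_rows.fld5) AS sum1;
};
```

One difference to watch for: a group with no 'S' rows gives SUM an empty bag here, so I would expect sum1 to come out as null for that group, not 0 as in the ternary version. Check that before relying on it.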