2013-04-03

Simple AVG in Pig?

I have very simple two-column data, a double and a chararray:

user1 234.43 
user1 432.23 
user2 4321.213 
etc. 

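I want to group by user and then compute the average of each user's doubles. How do I do that? Do I need "GROUP ... ALL"? I tried to follow Example 2 at http://wiki.apache.org/pig/PigOverview, but it doesn't work for me.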

selfReportsAndDiscrepancies = FOREACH discrepancies1 GENERATE discrepancy,selfReportedText; 
perDiscrepancy = GROUP selfReportsAndDiscrepancies BY selfReportedText; 

allDiscrep = group perDiscrepancy all; 

means = FOREACH allDiscrep GENERATE AVG(perDiscrepancy.discrepancy); 

DUMP means; 
DESCRIBE means; 

gives me:

2013-04-02 17:54:06,611 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1128: Cannot find field discrepancy in group:chararray,selfReportsAndDiscrepancies:bag{:tuple(discrepancy:double,selfReportedText:chararray)} 
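The error message shows the cause. After the first GROUP, each record has the shape (group, selfReportsAndDiscrepancies:bag), so there is no top-level discrepancy field. The doubles sit inside the bag named after the grouped relation. The sketch below computes one average per selfReportedText by projecting through that bag. It assumes discrepancies1 is already loaded with discrepancy:double and selfReportedText:chararray. The question does not show that LOAD. The names perGroupMeans and mean are hypothetical.

```pig
-- Assumption: discrepancies1 is loaded elsewhere with fields
-- discrepancy:double and selfReportedText:chararray (LOAD not shown in the question).
selfReportsAndDiscrepancies = FOREACH discrepancies1 GENERATE discrepancy, selfReportedText;
perDiscrepancy = GROUP selfReportsAndDiscrepancies BY selfReportedText;

-- After GROUP, each record is (group, selfReportsAndDiscrepancies:bag{...}),
-- so the doubles are reached through the bag, not as perDiscrepancy.discrepancy.
perGroupMeans = FOREACH perDiscrepancy GENERATE
    group AS selfReportedText,
    AVG(selfReportsAndDiscrepancies.discrepancy) AS mean;

DUMP perGroupMeans;
```

This computes a per-group average with no GROUP ALL at all. GROUP ALL is only needed to collapse those per-group results into a single number.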

Answer


I hope I understood you correctly and that you want the average of the per-group averages:

VISITS = LOAD 'data' USING PigStorage(' ') AS (user:chararray, number:double); 
USER_VISITS = GROUP VISITS BY user; 
USER_AVG = FOREACH USER_VISITS GENERATE group AS user, AVG(VISITS.number) AS average; 
ALL_AVG = GROUP USER_AVG ALL; 
OVERALL_AVG = FOREACH ALL_AVG GENERATE AVG(USER_AVG.average); 
DUMP OVERALL_AVG; 

Result:

(2327.2715) 
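The value above is a mean of means. For the sample rows, user1 averages (234.43 + 432.23) / 2 = 333.33 and user2 averages 4321.213, so the result is (333.33 + 4321.213) / 2 = 2327.2715. You may instead want the plain average of every value, with each row weighted equally. In that case you can apply GROUP ALL directly to the loaded rows and skip the per-user step. The sketch below reuses the answer's hypothetical input file 'data' and the VISITS schema. ALL_VISITS and OVERALL_MEAN are illustrative names.

```pig
-- Same hypothetical input as the answer: space-separated user and number.
VISITS = LOAD 'data' USING PigStorage(' ') AS (user:chararray, number:double);

-- GROUP ... ALL puts every row into one bag, which is named VISITS after the input relation.
ALL_VISITS = GROUP VISITS ALL;

-- Average over all rows, not over the per-user averages.
OVERALL_MEAN = FOREACH ALL_VISITS GENERATE AVG(VISITS.number);

DUMP OVERALL_MEAN;
```

For the three sample rows this is (234.43 + 432.23 + 4321.213) / 3, or about 1662.62. That differs from 2327.2715 because user1 contributes two rows here but only one average in the answer's version.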

Thanks. I had a bug: a UDF was appending strings where it should have produced doubles. This does work. – dranxo