2012-03-22 87 views
8

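I'm trying to write a Pig Latin script that returns the number of records in a dataset I've filtered, but Pig cannot infer the matching COUNT function.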

Here's the script so far:

/* scans by title */ 

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
scancount  = FOREACH productscans GENERATE COUNT($0); 
DUMP scancount; 

For some reason, I get this error:

Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.

What am I doing wrong here? I assume it has something to do with the type of the field I'm passing in, but I can't seem to resolve it.

TIA, Jason

Answers

14

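Is this what you're looking for? (GROUP ... ALL collects every record into a single bag, and COUNT then counts the items in that bag.)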

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
grouped   = GROUP productscans ALL; 
count   = FOREACH grouped GENERATE COUNT(productscans); 
dump count; 

That's it (except that "FOREACH g" should be "FOREACH grouped"). Thanks, Chris! – JasonA 2012-03-23 14:02:56


Edited, thanks for the review. – 2012-03-23 14:32:18

0

Maybe:

/* scans by title */ 

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
scancount  = FOREACH productscans GENERATE COUNT(productscans); 
DUMP scancount; 

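Thanks, Jack. Unfortunately, no luck. That gave me: 'Invalid scalar projection: productscans : A column needs to be projected from a relation for it to be used as a scalar' – JasonA 2012-03-22 20:25:25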

4

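COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for per-group counts. It operates on a bag, which is why passing a scalar field such as $0 directly after FILTER fails.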

You can use either of the following:

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
grouped   = GROUP productscans ALL; 
count   = FOREACH grouped GENERATE COUNT(productscans); 
DUMP count;

Or, referring to the bag by position ($0 is the group key, $1 is the bag):

scans   = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray); 
productscans = FILTER scans BY (title MATCHES 'proactiv'); 
grouped   = GROUP productscans ALL; 
count   = FOREACH grouped GENERATE COUNT($1); 
DUMP count;
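
For the per-group case (GROUP BY), here is a minimal sketch, assuming you want the number of scans per category. The aliases by_category and category_counts and the output field name n are made up for illustration; the LOAD statement is the one from the question.

```
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
-- one output row per distinct category; each row holds the key and a bag named "scans"
by_category = GROUP scans BY category;
-- COUNT receives the bag, so Pig can resolve the function without a cast
category_counts = FOREACH by_category GENERATE group AS category, COUNT(scans) AS n;
DUMP category_counts;
```

The pattern is the same as with GROUP ... ALL: group first, then pass the resulting bag to COUNT. Note that COUNT skips tuples whose first field is null; COUNT_STAR counts those as well.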