2017-05-04 56 views

Computing a sum over joined tables in Apache Pig

I load the following three tables from Hive:

books = LOAD 'books' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (isbn_b: chararray, booktitle: chararray, author: chararray, pubyear: chararray, publisher: chararray, urls: chararray, urlm: chararray, urll: chararray); 
users = LOAD 'users' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (id_u: chararray, location: chararray, age: chararray); 
ratings = LOAD 'ratings' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (id_r: chararray, isbn_r: chararray, rating: chararray); 

I then join and group them as follows:

OnISBN = JOIN ratings BY isbn_r, books BY isbn_b; 
total  = JOIN OnISBN BY id_r, users BY id_u; 
loc_group = GROUP total BY location; 

When I run this statement:

final = FOREACH loc_group GENERATE 
      group as location, 
      COUNT(total) as rec_num, 
      SUM(total.rating) as book_rating_sum; 

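I get the error "Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast." I think this is because I am not referring to rating correctly in the SUM statement, but I am new to Pig, so I am not sure. The output I want has the following format: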

(location, counts, sum of ratings score over that location)

I know this is a small thing, but I have been fighting with it for a while and I am stuck. I would appreciate some help.

Answer

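Your ratings are chararrays, but SUM requires numeric input. You can either read the field as a numeric type in your LOAD statement, e.g. rating: float, or cast it in your aggregation, e.g. SUM((float)total.rating)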
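
As a minimal sketch of the casting route, assuming the relations and field names from the question: casting the bag projection total.rating directly may be rejected by Pig, since it is a bag rather than a scalar, so this version casts each tuple's rating before grouping. The alias typed is a hypothetical name introduced here, not something from the original script.

```pig
-- Cast rating per tuple, before grouping, so SUM later sees a bag of floats.
-- 'typed' is a hypothetical alias introduced for this sketch.
-- id_r is kept as the first field so COUNT still counts on the same column as before.
typed = FOREACH total GENERATE id_r, location, (float) rating AS rating;

-- Group the typed tuples by location, as in the question.
loc_group = GROUP typed BY location;

-- COUNT and SUM now operate on the typed bag; rating is a float here.
final = FOREACH loc_group GENERATE
      group AS location,
      COUNT(typed) AS rec_num,
      SUM(typed.rating) AS book_rating_sum;

DUMP final;
```

Rating strings that cannot be parsed as numbers should become null after the cast, and SUM skips nulls, so such rows would still be counted in rec_num without adding to book_rating_sum.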

Thanks so much! I ended up figuring it out myself but forgot to update the question. Thanks!