Computing a sum over joined tables in Apache Pig
I am loading the following three tables from Hive:
books = LOAD 'books' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (isbn_b: chararray, booktitle: chararray, author: chararray, pubyear: chararray, publisher: chararray, urls: chararray, urlm: chararray, urll: chararray);
users = LOAD 'users' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (id_u: chararray, location: chararray, age: chararray);
ratings = LOAD 'ratings' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (id_r: chararray, isbn_r: chararray, rating: chararray);
I then join and group them as follows:
OnISBN = JOIN ratings BY isbn_r, books BY isbn_b;
total = JOIN OnISBN BY id_r, users BY id_u;
loc_group = GROUP total BY location;
When I run this statement:
final = FOREACH loc_group GENERATE
group as location,
COUNT(total) as rec_num,
SUM(total.rating) as book_rating_sum;
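I get the following error: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
I think this happens because I am not referring to rating correctly in the SUM statement, but I am new to Pig and cannot tell for sure. The output I want has the following format: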
(location, counts, sum of ratings score over that location)
I know this is a small thing, but I have been fighting with it for a while and I am stuck. I would appreciate some help.
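A likely cause is that rating is loaded as chararray, and SUM only has implementations for numeric bags, so Pig asks for an explicit cast. The following is a minimal sketch of one possible fix, not necessarily the one the author ended up using. It assumes that every rating value is an integer stored as text, and the relation names typed, typed_group and result are introduced here for illustration only.

```pig
-- Keep only the fields needed for the aggregate and cast rating to int.
-- Assumption: rating values parse as integers; values that do not parse become null.
typed = FOREACH total GENERATE location AS location, (int) rating AS rating;

-- Group the typed relation by location.
typed_group = GROUP typed BY location;

-- Count the records and sum the now-numeric ratings per location.
result = FOREACH typed_group GENERATE
    group AS location,
    COUNT(typed) AS rec_num,
    SUM(typed.rating) AS book_rating_sum;
```

Casting before the GROUP keeps the cast on a scalar field, which avoids having to cast the whole bag inside SUM. If the ratings can be fractional, (double) would be the more appropriate cast.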
Thank you very much! I ended up figuring it out myself but forgot to update this question. Thanks!