2014-03-18 30 views
1

Pig: select records from a relation only if they exist in another relation

I have the following datasets from a movie database:

Ratings: UserID, MovieID, Rating 
Movies: MovieID, Genre 

I filter for the movies whose genre is "Action" or "War" using:

movie_filter = filter Movies by (genre matches '.*Action.*') OR (genre matches '.*War.*'); 

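Now I need to compute the average rating of the War or Action movies, but the ratings are stored in the Ratings file. For this I use the query: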

movie_groups = GROUP movie_filter BY MovieID; 

result = FOREACH movie_groups GENERATE Ratings.MovieID, AVG(Ratings.rating); 

I then store the result in a directory location. But when I run the program, I get the following error:

Could not infer the matching function for org.apache.pig.builtin.AVG as multiple or none of them fit. Please use an explicit cast. 

Can anyone tell me what I am doing wrong? Thanks in advance.

Answer

2

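It looks like you are missing a JOIN statement that joins the two datasets (Ratings and Movies) on the MovieID column. After grouping movie_filter by MovieID, each group holds only Movies fields, so Ratings.rating is not a bag inside the group and AVG cannot find a matching signature for it. I mocked up some test data and have provided some sample code below.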



movie_avg.pig

-- Load ratings and movies; both files are comma-delimited.
ratings = LOAD 'movie_ratings.txt' USING PigStorage(',') AS (user_id:chararray, movie_id:chararray, rating:int); 
movies = LOAD 'movie_data.txt' USING PigStorage(',') AS (movie_id:chararray, genre:chararray); 

-- Keep only Action or War movies.
movies_filter = FILTER movies BY (genre MATCHES '.*Action.*' OR genre MATCHES '.*War.*'); 

-- Attach each rating to its movie.
movies_join = JOIN movies_filter BY movie_id, ratings BY movie_id; 

-- Keep one movie_id column and the rating from the joined rows.
movies_cleanup = FOREACH movies_join GENERATE movies_filter::movie_id AS movie_id, ratings::rating AS rating; 

-- Collect all ratings of each movie into one bag.
movies_group = GROUP movies_cleanup BY movie_id; 

-- Average the ratings bag of each group.
data = FOREACH movies_group GENERATE group, AVG(movies_cleanup.rating); 

DUMP data; 



Output of movie_avg.pig

(Jarhead,3.0) 
(Platoon,4.333333333333333) 
(Die Hard,3.0) 
(Apocolypse Now,4.5) 
(Last Action Hero,2.0) 
(Lethal Weapon,4.0) 



movie_data.txt

Scrooged,Comedy 
Apocolypse Now,War 
Platoon,War 
Guess Whos Coming To Dinner,Drama 
Jarhead,War 
Last Action Hero,Action 
Die Hard,Action 
Lethal Weapon,Action 
My Fair Lady,Musical 
Frozen,Animation 



movie_ratings.txt

12345,Scrooged,4 
12345,Frozen,4 
12345,My Fair Lady,5 
12345,Guess Whos Coming To Dinner,5 
12345,Platoon,3 
12345,Jarhead,2 
23456,Platoon,5 
23456,Apocolypse Now,4 
23456,Die Hard,3 
23456,Last Action Hero,2 
34567,Lethal Weapon,4 
34567,Jarhead,4 
34567,Apocolypse Now,5 
34567,Platoon,5 
34567,Frozen,5 
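
Since the question mentions storing the result in a directory, here is a rough variant of movie_avg.pig that writes the averages out instead of dumping them. It is only a sketch under a few assumptions: the output path 'movie_avg_output' and the field names avg_rating and movies_by_id are made up for illustration, and it groups the joined rows directly (skipping the cleanup step) by using the :: operator to pick the right columns.

```pig
-- Same mock input files as movie_avg.pig above.
ratings = LOAD 'movie_ratings.txt' USING PigStorage(',') AS (user_id:chararray, movie_id:chararray, rating:int);
movies = LOAD 'movie_data.txt' USING PigStorage(',') AS (movie_id:chararray, genre:chararray);

movies_filter = FILTER movies BY (genre MATCHES '.*Action.*' OR genre MATCHES '.*War.*');
movies_join = JOIN movies_filter BY movie_id, ratings BY movie_id;

-- Group the joined rows directly; '::' says which relation's movie_id is meant.
movies_by_id = GROUP movies_join BY movies_filter::movie_id;

-- Print the grouped schema to confirm which fields sit inside the bag.
DESCRIBE movies_by_id;

-- AVG receives a bag of ratings projected from the grouped rows.
data = FOREACH movies_by_id GENERATE group AS movie_id, AVG(movies_join.ratings::rating) AS avg_rating;

-- 'movie_avg_output' is an assumed path; the job fails if the directory already exists.
STORE data INTO 'movie_avg_output' USING PigStorage(',');
```

Running DESCRIBE on the grouped relation from the question in the same way should show that its bag contains only the Movies fields, which is why the rating could not be averaged there.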
+0

Thanks a lot, that seems to have solved it :) – Maddy

+0

You're welcome! Happy to help :) – JamCon
