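Movie dataset analysis using Pig
I have the following datasets from a movie database:
Ratings: UserID, MovieID, Rating
Movies: MovieID, Title
Users: UserID, Gender, Age
Now I have to join these three datasets and determine which movie is rated highest among women and lowest among men, and vice versa. I have already written the JOINs: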
myusers = LOAD '/user/cloudera/movies/input/users.dat'
USING PigStorage(':')
AS (user:int, n1, gender:chararray, n2, age:int);
ratings = LOAD '/user/cloudera/movies/input/ratings.dat'
USING PigStorage(':')
AS (user:int, n1, movie:int, n2, rating:int);
movies = LOAD '/user/cloudera/movies/input/movies.dat'
USING PigStorage(':')
AS (movie:int,n1,title:chararray);
data = JOIN ratings BY user, myusers BY user;
data2= JOIN data BY ratings::movie, movies BY movie;
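But after all this I ran into many problems, such as "ERROR 0: Scalar has more than one row in the output", when I try to print columns from data2. Any ideas to help me complete this task?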
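For reference, a minimal sketch of one way the analysis could continue from data2, not a definitive solution. Pig's "Scalar has more than one row in the output" error commonly appears when a column is written as relation.column (for example data2.title), which Pig treats as a scalar lookup; projecting with FOREACH ... GENERATE avoids that. The sketch assumes the gender codes are 'F' and 'M', and it reads "highest among women and lowest among men" as the largest gap between the two average ratings. The aliases slim, by_title_gender, avg_by_gender, female, male, both, gaps and ranked are made up for illustration.

```pig
-- title, gender and rating each occur only once in data2's schema,
-- so they can be referenced without a relation:: prefix.
slim = FOREACH data2 GENERATE title, gender, rating;

-- Average rating per (title, gender) pair.
by_title_gender = GROUP slim BY (title, gender);
avg_by_gender = FOREACH by_title_gender GENERATE
    FLATTEN(group) AS (title:chararray, gender:chararray),
    AVG(slim.rating) AS avg_rating;

-- Assumption: gender is encoded as 'F' / 'M' in users.dat.
female = FILTER avg_by_gender BY gender == 'F';
male   = FILTER avg_by_gender BY gender == 'M';

-- Put both averages on one row per movie and compute the gap.
both = JOIN female BY title, male BY title;
gaps = FOREACH both GENERATE
    female::title AS title,
    female::avg_rating AS f_avg,
    male::avg_rating AS m_avg,
    (female::avg_rating - male::avg_rating) AS gap;

-- Largest positive gap first: liked most by women relative to men.
ranked = ORDER gaps BY gap DESC;
DUMP ranked;
```

Sorting with ORDER gaps BY gap ASC instead would give the reverse case (rated highest by men relative to women). Columns whose names are ambiguous in data2, such as movie or n1, would still need the relation:: prefix.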