I'm learning some MapReduce, but I've run into a problem. Here's the situation: I have two files. "users" contains a list of users with some of their data (gender, age, country, etc.); the file looks like this:
user_000003 m 22 United States Oct 30, 2005
"songs" contains data about the songs all users listened to (user ID, date and time of listening, artist ID, artist name, song ID, song title):
user_000999 2008-12-11T22:52:33Z b7ffd2af-418f-4be2-bdd1-22f8b48613da Nine Inch Nails 1d1bb32a-5bc6-4b6f-88cc-c043f6c52509 7 Ghosts I
The goal is to find the k most-listened-to songs in certain countries, with k and the list of countries provided as input.
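For reference, the last step of that goal (picking the k most-played songs for one country) can be sketched in plain Java as below. This is a minimal sketch, assuming the (country, song) pairs have already been produced by the join; `TopKSketch` and `topK` are hypothetical names, and ties are broken alphabetically only to keep the result deterministic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the original job): returns the k most
// frequent songs from the list of songs played in one country.
class TopKSketch {
    static List<String> topK(List<String> songsPlayedInCountry, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String song : songsPlayedInCountry) {
            counts.merge(song, 1, Integer::sum); // count plays per song
        }
        // Highest count first; equal counts are ordered by song name.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> plays = Arrays.asList(
                "Garbage ||| Vow", "Mae ||| Brink Of Disaster", "Garbage ||| Vow");
        System.out.println(topK(plays, 1)); // prints [Garbage ||| Vow]
    }
}
```

In a MapReduce setting this would correspond to a reducer keyed by country, which is one reason the join output is keyed by country.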
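I decided to use the MultipleInputs class for the mappers, so one mapper outputs key-value pairs of the form (user ID, country) and the other outputs pairs of the form (user ID, song). As far as I know, I should be able to read all the values paired with a given key, so in the reducer I should find the country and some number of songs in the list of values associated with a user ID, and then output a file of (country, song) pairs to be read by another MapReduce job.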
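The join described above, reduced to plain Java so it can run outside Hadoop, might look like the following. This is a sketch under assumptions: `UserJoinSketch` and `join` are hypothetical names, a value without the " ||| " separator is taken to be the country (as in the combiner further down), and the `country<TAB>song` output format is illustrative. In a real `Reducer.reduce()`, each `Text` value should be copied with `toString()`, as the code below already does, because Hadoop reuses the same `Text` instance while iterating over the values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of the reduce-side join for ONE user ID. In Hadoop this
// loop would sit in the Reducer's reduce(), which receives every value for the key.
class UserJoinSketch {
    static final String SEP = " ||| ";

    // Returns "country<TAB>song" lines, or an empty list when the user's
    // country is missing or not in the allowed set.
    static List<String> join(Iterable<String> valuesForUser, Set<String> allowedCountries) {
        String country = null; // null = no record from the users file seen
        List<String> songs = new ArrayList<>();
        for (String v : valuesForUser) {
            if (v.contains(SEP)) {
                songs.add(v);   // value produced by the songs mapper
            } else {
                country = v;    // value produced by the users mapper
            }
        }
        List<String> out = new ArrayList<>();
        if (country != null && allowedCountries.contains(country)) {
            for (String song : songs) {
                out.add(country + "\t" + song);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "Nine Inch Nails ||| Heresy", "United States", "Garbage ||| Vow");
        Set<String> allowed = new HashSet<>(Arrays.asList("United States"));
        for (String line : join(values, allowed)) {
            System.out.println(line);
        }
    }
}
```

The songs are buffered before anything is emitted because the order of values within one key is not guaranteed, so the country may arrive last.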
I'm pretty sure the mappers are doing their job, because I was able to write their output out using the reducer.
UPDATE: the files are passed to the mappers with the following code:
Job job = Job.getInstance(conf);
MultipleInputs.addInputPath(job, new Path(songsFile), TextInputFormat.class, SongMapper.class);
MultipleInputs.addInputPath(job, new Path(usersFile), TextInputFormat.class, UserMapper.class);
FileOutputFormat.setOutputPath(job, new Path(outFile));
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setJarByClass(Songs.class);
job.setCombinerClass(Combiner.class);
job.setReducerClass(UserSongReducer.class);
Mapper code:
public static class UserMapper extends Mapper<LongWritable, Text, Text, Text> {
    // empty cleanup()

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] userData = record.split("\t");
        if (userData.length > 3 && !userData[3].equals("")) {
            // key: user ID, value: country
            context.write(new Text(userData[0]), new Text(userData[3]));
        }
    }

    // empty run() and setup()
}
public static class SongMapper extends Mapper<LongWritable, Text, Text, Text> {
    // empty cleanup()

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] songData = record.split("\t");
        // index 5 is read below, so at least 6 fields are required
        if (songData.length > 5 && !songData[3].equals("")) {
            // key: user ID, value: the two song fields joined with " ||| "
            context.write(new Text(songData[0]), new Text(songData[5] + " ||| " + songData[3]));
        }
    }

    // empty run() and setup()
}
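One optional way to make the two kinds of values unambiguous is to tag each one with its source when parsing, instead of inferring the source from " ||| ". The sketch below is plain Java with hypothetical names (`TaggedParseSketch`, the `C`/`S` prefixes); it assumes both files are tab-separated, as the `split("\t")` calls imply, and puts the artist before the title to match the "artist + title" string described later. A mapper would pass the returned user ID and tagged value to `context.write`.

```java
import java.util.Arrays;

// Hypothetical tagging scheme (not from the original code): each parsed value is
// prefixed with "C\t" (country, users file) or "S\t" (song, songs file).
class TaggedParseSketch {
    // users line: userId, gender, age, country, signup date (tab-separated).
    // Returns {userId, taggedValue}, or null for a malformed line.
    static String[] parseUser(String line) {
        String[] f = line.split("\t");
        if (f.length > 3 && !f[3].isEmpty()) {
            return new String[] {f[0], "C\t" + f[3]};
        }
        return null;
    }

    // songs line: userId, timestamp, artistId, artistName, songId, songName.
    // Index 5 is read, so at least 6 fields are required.
    static String[] parseSong(String line) {
        String[] f = line.split("\t");
        if (f.length > 5 && !f[3].isEmpty()) {
            return new String[] {f[0], "S\t" + f[3] + " ||| " + f[5]};
        }
        return null;
    }

    public static void main(String[] args) {
        String user = "user_000003\tm\t22\tUnited States\tOct 30, 2005";
        String song = "user_000999\t2008-12-11T22:52:33Z\tb7ffd2af-418f-4be2-bdd1-22f8b48613da"
                + "\tNine Inch Nails\t1d1bb32a-5bc6-4b6f-88cc-c043f6c52509\t7 Ghosts I";
        System.out.println(Arrays.toString(parseUser(user)));
        System.out.println(Arrays.toString(parseSong(song)));
        System.out.println(Arrays.toString(parseSong("user_000999\tbroken\tline\tx"))); // prints null
    }
}
```

With tags, the reducer can branch on `startsWith("C\t")` rather than on whether " ||| " happens to appear in a value.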
Combiner code:
public static class Combiner extends Reducer<Text, Text, Text, Text> {
    private boolean isCountryAllowed(String toCheck, String[] countries) {
        for (int i = 0; i < countries.length; i++) {
            if (toCheck.equals(countries[i]))
                return true;
        }
        return false;
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        ArrayList<String> list = new ArrayList<String>();
        String country = "foo"; // placeholder until a country value is seen
        for (Text value : values) {
            if (!value.toString().contains(" ||| ")) {
                // no separator: this value came from UserMapper
                country = value.toString();
            } else {
                // separator present: this value came from SongMapper
                list.add(value.toString());
            }
        }
        if (isCountryAllowed(country, context.getConfiguration().getStrings("countries"))) {
            for (String listVal : list) {
                context.write(new Text(country), new Text(listVal));
            }
        }
    }
}
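`Configuration.getStrings()` returns `null` when the property was never set, in which case `isCountryAllowed` above would throw a `NullPointerException` on `countries.length`. A null-safe alternative is sketched below; `CountryFilterSketch` is a hypothetical name, and trimming the entries is an assumption for lists written as "A, B".

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Null-safe stand-in for isCountryAllowed(): builds a set once and looks
// countries up in it. CountryFilterSketch is a hypothetical name.
class CountryFilterSketch {
    private final Set<String> allowed;

    // countries may be null (what Configuration.getStrings() returns for an
    // unset property); entries are trimmed so "A, B" also works.
    CountryFilterSketch(String[] countries) {
        if (countries == null) {
            allowed = Collections.emptySet();
        } else {
            Set<String> s = new HashSet<>();
            for (String c : countries) {
                s.add(c.trim());
            }
            allowed = s;
        }
    }

    boolean isAllowed(String country) {
        return allowed.contains(country);
    }

    public static void main(String[] args) {
        CountryFilterSketch f = new CountryFilterSketch(new String[] {"Poland", " United States"});
        System.out.println(f.isAllowed("United States")); // prints true
        System.out.println(f.isAllowed("foo"));           // prints false
        System.out.println(new CountryFilterSketch(null).isAllowed("foo")); // prints false
    }
}
```

In a Hadoop task, the set would typically be built once in `setup()` from `context.getConfiguration()` rather than on every `reduce()` call.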
The problem shows up when I try to output the pairs with the reducer:
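public static class UserSongReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // pass every (key, value) pair through unchanged
        for (Text value : values) {
            context.write(key, value);
        }
    }
}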
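I use " ||| " to build the artist + title string. The problem is that the country is still "foo". I thought I should see at least one output line with the correct country as the key, but in the output the key is always "foo" (with a 2.5 KB songs file):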
foo Deep Dish ||| Fuck Me Im Famous (Pacha Ibiza)-09-28-2007
foo Vnv Nation ||| Kingdom
foo Les Fleur De Lys ||| Circles
foo Home Video ||| Penguin
foo Of Montreal ||| Will You Come And Fetch Me
foo Godspeed You! Black Emperor ||| Bbf3
foo Alarum ||| Sustained Connection
foo Sneaker Pimps ||| Walking Zero
foo Cecilio And Kapono ||| I Love You
foo Garbage ||| Vow
foo The Brian Setzer Orchestra ||| Gettin' In The Mood
foo Nitin Sawhney ||| Sunset (J-Walk Remix)
foo Nine Inch Nails ||| Heresy
foo Collective Soul ||| Crowded Head
foo Vicarious Bliss ||| Limousine
foo Noisettes ||| Malice In Wonderland
foo Black Rebel Motorcycle Club ||| Lien On Your Dreams
foo Mae ||| Brink Of Disaster
foo Michael Andrews ||| Rosie Darko
foo A Perfect Circle ||| Blue
What am I doing wrong?
PS: I should be able to avoid the second job if I use a custom combiner. Doesn't a combiner behave exactly like a reducer?
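A combiner is not a second reducer: Hadoop may run it zero, one, or several times, usually on the output of a single map task, so it is not guaranteed to see all the values for a key. Since each input split, and therefore each map task, covers only one input file, the users record and the songs records for the same user ID normally end up in different map tasks. The plain-Java model below (not Hadoop code; `CombinerScopeSketch` and the sample values are hypothetical) reuses the combiner's country-detection logic to show what each combiner call would then see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (not Hadoop code) of what a map-side combiner sees when
// each map task reads only one of the two input files.
class CombinerScopeSketch {
    // Mirrors the question's Combiner: the last value without " ||| " is the country.
    static String countrySeenByCombiner(List<String> oneMapTaskValues) {
        String country = "foo";
        for (String v : oneMapTaskValues) {
            if (!v.contains(" ||| ")) {
                country = v;
            }
        }
        return country;
    }

    public static void main(String[] args) {
        // Values for one user ID as produced by two different map tasks.
        List<String> fromSongsTask = Arrays.asList("Nine Inch Nails ||| Heresy", "Garbage ||| Vow");
        List<String> fromUsersTask = Arrays.asList("United States");
        System.out.println(countrySeenByCombiner(fromSongsTask)); // prints foo
        System.out.println(countrySeenByCombiner(fromUsersTask)); // prints United States

        // Only the reducer is guaranteed to receive the merged list for the key.
        List<String> merged = new ArrayList<>(fromSongsTask);
        merged.addAll(fromUsersTask);
        System.out.println(countrySeenByCombiner(merged)); // prints United States
    }
}
```

The songs-only call reproduces the "foo" key from the output above, which is why the join itself belongs in the reducer.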
Details about the mappers are missing. Please provide them, since something may already be going wrong there. – DDW
Note that since a combiner is never guaranteed to be called, it cannot be relied on to solve anything. – DDW
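The kind of work a combiner can safely do is a pre-aggregation that keeps the key unchanged and whose result does not depend on how many times it runs, such as summing play counts. The plain-Java sketch below (hypothetical names, not Hadoop code) applies the same summing function as an optional per-task combiner and as the reducer, and gets identical totals either way.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a combiner-safe step (not Hadoop code): summing counts per
// key. Addition is associative and commutative and the key is left unchanged,
// so the totals do not depend on whether the combiner ran.
class SafeCombinerSketch {
    static Map.Entry<String, Long> pair(String key, long count) {
        return new AbstractMap.SimpleEntry<>(key, count);
    }

    // Used both as the (optional) combiner and as the reducer.
    static Map<String, Long> sumCounts(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Output of two hypothetical map tasks: (song, 1) per play.
        List<Map.Entry<String, Long>> task1 = Arrays.asList(pair("Vow", 1), pair("Vow", 1), pair("Heresy", 1));
        List<Map.Entry<String, Long>> task2 = Arrays.asList(pair("Vow", 1));

        // No combiner: the reducer sees every raw pair.
        List<Map.Entry<String, Long>> raw = new ArrayList<>(task1);
        raw.addAll(task2);

        // Combiner on each task: the reducer sees partial sums instead.
        List<Map.Entry<String, Long>> combined = new ArrayList<>(sumCounts(task1).entrySet());
        combined.addAll(sumCounts(task2).entrySet());

        System.out.println(sumCounts(raw));      // prints {Heresy=1, Vow=3}
        System.out.println(sumCounts(combined)); // prints {Heresy=1, Vow=3}
    }
}
```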
Please check my updated question. –