hadoop-mapreduce reducer-combiner input

I'm learning MapReduce and have run into a problem. Here is the situation: I have two files. The "users" file contains a list of users with some of their data (gender, age, country, etc.) and looks like this:

user_000003 m 22 United States Oct 30, 2005

The "songs" file contains data on the songs listened to by all users (user ID, listening date and time, artist ID, artist name, song ID, song title):

user_000999 2008-12-11T22:52:33Z b7ffd2af-418f-4be2-bdd1-22f8b48613da Nine Inch Nails 1d1bb32a-5bc6-4b6f-88cc-c043f6c52509 7 Ghosts I

The goal is to find the k most-listened-to songs in certain countries, where k and the list of countries are provided as input.
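As a rough illustration of the "k most-listened-to" part of the goal, here is a minimal plain-Java sketch that counts plays per song and keeps the k highest. The class `TopSongs` and the method `topK` are hypothetical names, not part of the question's code, and it runs in memory without any Hadoop classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the question's job.
final class TopSongs {
    // Returns the k most frequent entries of 'plays', most played first.
    static List<String> topK(List<String> plays, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String song : plays) {
            counts.merge(song, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

In a real job the counting would presumably be spread across reducers, one group per country; the sketch only shows the selection step.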

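I decided to use the MultipleInputs class for the mappers, so one mapper outputs key-value pairs of a user ID and a country, and the other mapper outputs pairs of a user ID and a string built from the artist name and song title. As far as I understand, I should be able to read all the values paired with a given key (so in the reducer I should find a country and some number of songs in the list of values associated with a user ID), and write out a file of pairs that another MapReduce job will then reduce.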
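The expectation described above (all values for one user ID arriving together) can be sketched in plain Java. This is only a stand-in for the shuffle phase, with no Hadoop classes involved; `ShuffleSketch` and `groupByKey` are hypothetical names, and the sample pairs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the shuffle phase; no Hadoop classes are involved.
final class ShuffleSketch {
    // Groups (key, value) pairs by key, which is how values reach reduce().
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }
}
```

Feeding it the pairs `("user_000003", "United States")` and `("user_000003", "Nine Inch Nails ||| Heresy")` puts both values in the same list for `user_000003`. In an actual job the order of values within a key is not guaranteed, so the country may arrive before or after the songs.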

I'm fairly sure the mappers do their job, because I was able to write out their output using the reducer.

UPDATE: The files are passed to the mappers with the following code:

Job job = Job.getInstance(conf);

// One mapper class per input file
MultipleInputs.addInputPath(job, new Path(songsFile), TextInputFormat.class, SongMapper.class);
MultipleInputs.addInputPath(job, new Path(usersFile), TextInputFormat.class, UserMapper.class);

FileOutputFormat.setOutputPath(job, new Path(outFile));
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

job.setJarByClass(Songs.class);

job.setCombinerClass(Combiner.class);
job.setReducerClass(UserSongReducer.class);

Mapper code:

public static class UserMapper extends Mapper<LongWritable, Text, Text, Text> {

    // empty cleanup()

    // Emits (field 0, field 3) for every user record that has a non-empty field 3.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] userData = record.split("\t");
        if (userData.length > 3 && !userData[3].equals("")) {
            context.write(new Text(userData[0]), new Text(userData[3]));
        }
    }

    // empty run() and setup()
}

public static class SongMapper extends Mapper<LongWritable, Text, Text, Text> {

    // empty cleanup()

    // Emits (field 0, field 5 + " ||| " + field 3) for every song record.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] songData = record.split("\t");
        // Index 5 is read below, so at least six fields are required
        // (the original check was length > 3).
        if (songData.length > 5 && !songData[3].equals("")) {
            context.write(new Text(songData[0]), new Text(songData[5] + " ||| " + songData[3]));
        }
    }

    // empty run() and setup()
}

Combiner code:

public static class Combiner extends Reducer<Text, Text, Text, Text> {

    // Returns true if toCheck is one of the configured countries.
    private boolean isCountryAllowed(String toCheck, String[] countries) {
        for (int i = 0; i < countries.length; i++) {
            if (toCheck.equals(countries[i]))
                return true;
        }
        return false;
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        ArrayList<String> list = new ArrayList<String>();

        String country = "foo";
        for (Text value : values) {
            // Values without the " ||| " separator are treated as the country.
            if (!value.toString().contains(" ||| ")) {
                country = value.toString();
            } else {
                list.add(value.toString());
            }
        }

        if (isCountryAllowed(country, context.getConfiguration().getStrings("countries"))) {
            for (String listVal : list) {
                context.write(new Text(country), new Text(listVal));
            }
        }
    }
}

The problem shows up when I try to output the pairs with the reducer:

public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        context.write(key, value);
    }
}

} // end of the reducer class (declaration not shown)

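I use " ||| " to build the artist + title string. The problem is that the country is always "foo". I would expect to see at least one output line with the correct country as the key, but the key in the output is always "foo" (2.5 KB songs file):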

foo Deep Dish ||| Fuck Me Im Famous (Pacha Ibiza)-09-28-2007 
foo Vnv Nation ||| Kingdom 
foo Les Fleur De Lys ||| Circles 
foo Home Video ||| Penguin 
foo Of Montreal ||| Will You Come And Fetch Me 
foo Godspeed You! Black Emperor ||| Bbf3 
foo Alarum ||| Sustained Connection 
foo Sneaker Pimps ||| Walking Zero 
foo Cecilio And Kapono ||| I Love You 
foo Garbage ||| Vow 
foo The Brian Setzer Orchestra ||| Gettin' In The Mood 
foo Nitin Sawhney ||| Sunset (J-Walk Remix) 
foo Nine Inch Nails ||| Heresy 
foo Collective Soul ||| Crowded Head 
foo Vicarious Bliss ||| Limousine 
foo Noisettes ||| Malice In Wonderland 
foo Black Rebel Motorcycle Club ||| Lien On Your Dreams 
foo Mae ||| Brink Of Disaster 
foo Michael Andrews ||| Rosie Darko 
foo A Perfect Circle ||| Blue

What am I doing wrong?

PS: I should be able to avoid a second job if I use a custom combiner. Does a combiner really behave exactly like a reducer?

Source

2014-05-23 Jacopo Grassi

Details about the mappers are missing. Please provide them, because the error may already be there. – DDW

Note that a combiner is never guaranteed to be called, so it cannot be relied on to solve anything. – DDW

Please check my updated question. –


From what I can see in your code, the country should always be "foo". What exactly are you trying to achieve?

This way you are outputting the country variable:

context.write(new Text(country),
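One possible reading of why the key stays "foo", offered as a sketch rather than a confirmed diagnosis: a combiner only sees the output of a single map task, and with MultipleInputs the users file and the songs file are handled by different map tasks. The plain-Java simulation below mirrors the join logic from the question without any Hadoop classes; `CombinerScopeSketch` and `join` are hypothetical names and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulation only: mimics the question's reduce() logic on plain strings.
final class CombinerScopeSketch {
    // Finds the country among the values and pairs it with every song value
    // (a song value is one that contains " ||| ").
    static List<String> join(List<String> values) {
        String country = "foo"; // placeholder default, as in the question
        List<String> songs = new ArrayList<>();
        for (String value : values) {
            if (value.contains(" ||| ")) {
                songs.add(value);
            } else {
                country = value;
            }
        }
        List<String> out = new ArrayList<>();
        for (String song : songs) {
            out.add(country + "\t" + song);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> songTaskValues = Arrays.asList("Nine Inch Nails ||| Heresy", "Garbage ||| Vow");
        List<String> userTaskValues = Arrays.asList("United States");

        // Combiner-like scope: only the values produced by the song map task.
        System.out.println(join(songTaskValues));

        // Reducer-like scope: values from both map tasks for the same user ID.
        List<String> all = new ArrayList<>(userTaskValues);
        all.addAll(songTaskValues);
        System.out.println(join(all));
    }
}
```

Under this assumption the first call never sees a country value, so every song keeps the "foo" default, while the second call pairs each song with "United States". That would suggest doing the join in the reducer, where the values from both mappers for a user ID meet, and reserving the combiner for operations that are safe to run zero or more times.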

Source

2014-06-01 12:30:53
