2014-10-02 55 views
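I have to modify the Hadoop WordCount example so that it counts only the words that start with the prefix "cons", and then sorts the results in descending order of frequency. Can anyone tell me how to write the mapper and reducer code for this? bigdata hadoop java codefor wordcount modified

Code: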

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
     // Replace all digits and punctuation with an empty string
     String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
     // Extract the words
     StringTokenizer record = new StringTokenizer(line);
     // Emit each word as a key and one as its value
     while (record.hasMoreTokens())
      context.write(new Text(record.nextToken()), new IntWritable(1));
    }
}
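public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> { public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { //Replacing all digits and punctuation with an empty string String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase(); //Extracting the words StringTokenizer record = new StringTokenizer(line); while (record.hasMoreTokens()) context.write(new Text(record.nextToken()), new IntWritable(1)); } } – blackbookstar 2014-10-02 23:17:55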

This code needs to be modified so that it counts the words that start with "cons". – blackbookstar 2014-10-02 23:18:54

Here is the link to the Hadoop WordCount code I was given: http://wiki.apache.org/hadoop/WordCount – blackbookstar 2014-10-02 23:23:55

Answer

To count the number of words that start with "cons", you can discard every other word when emitting from the mapper.

public void map(Object key, Text value, Context context) throws IOException, 
     InterruptedException { 
    IntWritable one = new IntWritable(1); 
    String[] words = value.toString().split(" "); 
    for (String word : words) { 
     if (word.startsWith("cons")) 
       context.write(new Text("cons_count"), one); 
    } 
} 

The reducer now receives only a single key, cons_count, and you can sum its values to get the count.
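The summing step described above can be tried out without a cluster. The following is only a minimal plain-Java sketch of the same idea, not Hadoop code: the class name ConsCountSketch and the methods mapLine and reduce are hypothetical stand-ins for the mapper and reducer, and the input lines are made up. In a real job the summing loop would sit inside the reduce method of a Reducer.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the mapper/reducer pair; no Hadoop classes are used.
// All names here are illustrative assumptions, not part of the Hadoop API.
class ConsCountSketch {

    // "Map" step: emit a 1 for every whitespace-separated token that starts with the prefix.
    static List<Integer> mapLine(String line, String prefix) {
        List<Integer> emitted = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.startsWith(prefix)) {
                emitted.add(1);
            }
        }
        return emitted;
    }

    // "Reduce" step: sum every value that arrived under the single key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Collect the values emitted for two sample input lines, then sum them.
        List<Integer> values = new ArrayList<>();
        values.addAll(mapLine("consider the console and the cost", "cons"));
        values.addAll(mapLine("a constant consumer", "cons"));
        System.out.println("cons_count\t" + reduce(values));
    }
}
```

Because every matching word is emitted under the same key, a single reduce call sees all the ones, which is why a simple running sum is enough.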

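To sort the words that start with "cons" by frequency, all of those words have to go to the same reducer, and that reducer should sum the counts and sort them. To do that, use this mapper: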

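import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException,
            InterruptedException {
        String[] words = value.toString().split(" ");
        for (String word : words) {
            // Emit every matching word under the same key so that
            // all of them reach the same reducer.
            if (word.startsWith("cons")) {
                context.write(new Text("cons"), new Text(word));
            }
        }
    }
}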

Reducer:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Count how often each distinct word occurs.
        Map<String, Integer> wordCountMap = new HashMap<String, Integer>();
        for (Text value : values) {
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                Integer count = wordCountMap.get(word);
                count++;
                wordCountMap.put(word, count);
            } else {
                wordCountMap.put(word, Integer.valueOf(1));
            }
        }

        //use some sorting mechanism to sort the map based on values.
        // ...

        // Emit each word together with its count.
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }

}
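The reducer leaves the sorting step as a placeholder comment. One possible way to fill it in, shown here only as a hedged plain-Java sketch, is to copy the map entries into a list and sort that list by count, highest first. The class name SortByCountSketch, the method sortByCountDescending, the alphabetical tie-break, and the sample counts are all assumptions made for illustration; inside the reducer you would iterate over the sorted list instead of wordCountMap.entrySet() when writing the output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper, not part of Hadoop: orders word counts by descending frequency.
class SortByCountSketch {

    // Returns the entries ordered by count, highest first.
    // Ties are broken alphabetically by word (an assumption; any stable rule would do).
    static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Made-up counts standing in for the contents of wordCountMap.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("constant", 3);
        counts.put("console", 5);
        counts.put("consider", 3);

        for (Map.Entry<String, Integer> entry : sortByCountDescending(counts)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

This keeps all counts in memory in one reducer, which matches the approach in the answer but will not scale to a very large number of distinct words.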

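With a few modifications, the second mapper is the code we need: it drops every word that does not start with "cons". Hadoop sorts the intermediate key-value pairs by key, so the output comes out in ascending key order. Here we have to write our own custom sort comparator to get the words starting with "cons" in descending order. – blackbookstar 2014-10-03 21:42:08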

Could you please post the whole code? – blackbookstar 2014-10-03 21:45:03

@blackbookstar By "the whole code", do you mean the sorting? Check this link for how to do that: http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java – 2014-10-05 09:45:15