Big data Hadoop Java code for a modified WordCount

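I have to modify the Hadoop WordCount example to count the number of words that start with the prefix "cons", and then sort the results in descending order of frequency. Can anyone tell me how to write the mapper and reducer code for this?

Code: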

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        //Replacing all digits and punctuation with an empty string
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        //Extracting the words
        StringTokenizer record = new StringTokenizer(line);
        //Emitting each word as a key and one as its value
        while (record.hasMoreTokens())
            context.write(new Text(record.nextToken()), new IntWritable(1));
    }
}

Source

2014-10-02 blackbookstar

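public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> { public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { /* Replacing all digits and punctuation with an empty string */ String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase(); /* Extracting the words */ StringTokenizer record = new StringTokenizer(line); while (record.hasMoreTokens()) context.write(new Text(record.nextToken()), new IntWritable(1)); } } – blackbookstar 2014-10-02 23:17:55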

This code needs to be modified so that it counts the words starting with "cons". – blackbookstar 2014-10-02 23:18:54

Here is the link to the Hadoop WordCount code I am working from: http://wiki.apache.org/hadoop/WordCount – blackbookstar 2014-10-02 23:23:55

To count the number of words starting with "cons", you can discard all other words when emitting from the mapper.

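public void map(Object key, Text value, Context context) throws IOException,
        InterruptedException {
    IntWritable one = new IntWritable(1);
    // Split the line on spaces and keep only the words starting with "cons"
    String[] words = value.toString().split(" ");
    for (String word : words) {
        if (word.startsWith("cons")) {
            // Every match is written under the same key, so a single reduce call sees them all
            context.write(new Text("cons_count"), one);
        }
    }
}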

The reducer now receives only one key, cons_count, and you can sum up its values to get the count.
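The filter-and-sum logic can be tried locally before it goes into a job. The following is a minimal plain-Java sketch, not Hadoop code: the class name ConsCountSketch and its methods mapLine and reduce are made up for illustration, and lists of Integer stand in for the IntWritable values. In a real job the same summing loop would sit inside the reduce method of a Reducer<Text, IntWritable, Text, IntWritable> and read each value with value.get().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the map and reduce steps; no Hadoop classes are involved.
// ConsCountSketch, mapLine and reduce are hypothetical names used only for this sketch.
final class ConsCountSketch {

    // "Map" step: one 1 per word that starts with "cons", all meant for a single key.
    static List<Integer> mapLine(String line) {
        List<Integer> ones = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (word.startsWith("cons")) {
                ones.add(1);
            }
        }
        return ones;
    }

    // "Reduce" step: sum the values received for the single key cons_count.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> emitted = new ArrayList<>();
        for (String line : Arrays.asList("consider the console", "no match here", "constant")) {
            emitted.addAll(mapLine(line));
        }
        // Three of the sample words start with "cons", so the sum is 3.
        System.out.println("cons_count\t" + reduce(emitted));
    }
}
```

Like the mapper above, this sketch matches case-sensitively and does not strip punctuation; lowercasing the line first, as the mapper in the question does, would also catch words such as "Console".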

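To sort the words starting with "cons" by frequency, all of those words should go to the same reducer, and that reducer should sum up the counts and sort them. To do this, the mapper can emit every matching word under one shared key: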

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException,
            InterruptedException {
        String[] words = value.toString().split(" ");
        for (String word : words) {
            // Emit each matching word under the shared key "cons"
            if (word.startsWith("cons")) {
                context.write(new Text("cons"), new Text(word));
            }
        }
    }
}

Reducer:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Count how often each word occurs among the values of the shared key
        Map<String, Integer> wordCountMap = new HashMap<String, Integer>();
        for (Text value : values) {
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                Integer count = wordCountMap.get(word);
                count++;
                wordCountMap.put(word, count);
            } else {
                wordCountMap.put(word, Integer.valueOf(1));
            }
        }

        //use some sorting mechanism to sort the map based on values.
        // ...

        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}
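The sorting step is left as a comment in the reducer above. One possible way to fill it in, shown here as a plain-Java sketch rather than the answer's own method, is to copy the map entries into a list and sort that list by count in descending order. The class name SortByCountSketch and the method sortByCountDescending are hypothetical, and the alphabetical tie-break is an added assumption that only makes the output order deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; it uses only java.util, no Hadoop classes.
final class SortByCountSketch {

    // Returns the map's entries ordered by count, highest first.
    // Ties are broken alphabetically by word (an assumption, not part of the original answer).
    static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // Comparing b to a (instead of a to b) reverses the order: descending counts
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCountMap = new HashMap<>();
        wordCountMap.put("console", 2);
        wordCountMap.put("consider", 5);
        wordCountMap.put("constant", 1);
        // Prints consider, then console, then constant
        for (Map.Entry<String, Integer> entry : sortByCountDescending(wordCountMap)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Inside the reducer, the final loop would then iterate over the sorted list instead of wordCountMap.entrySet(). This works because every word arrives under the single key "cons", so one reduce call sees all counts; it also means all distinct matching words have to fit in that reducer's memory.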

Source

2014-10-03 05:49:56

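With a few modifications, the second mapper code is the correct code we need. It drops every word except those starting with "cons". Hadoop sorts the intermediate key-value pairs by key, and the output is sorted in ascending order. Here we have to write our own custom sort comparator to get the words starting with cons in descending order. – blackbookstar 2014-10-03 21:42:08

Could you please post the entire code? – blackbookstar 2014-10-03 21:45:03

@blackbookstar By the entire code, do you mean the sorting? Check this link for how to do it: http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java – 2014-10-05 09:45:15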
