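I need to modify the Hadoop WordCount example so that it counts only the words that start with the prefix "cons", and then sorts the results in descending order of frequency. Can anyone tell me how to write the mapper and reducer code for this?
Code: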
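import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // Replace all digits and punctuation with an empty string, then lowercase the line
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        // Split the cleaned line into words on whitespace
        StringTokenizer record = new StringTokenizer(line);
        // Emit each word as the key and 1 as its value
        while (record.hasMoreTokens())
            context.write(new Text(record.nextToken()), new IntWritable(1));
    }
}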
This code needs to be modified so that it counts the words that start with "cons" – blackbookstar 2014-10-02 23:18:54
Here is the link to the Hadoop WordCount code I am working from: http://wiki.apache.org/hadoop/WordCount – blackbookstar 2014-10-02 23:23:55
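One way to think about the change, sketched in plain Java so it runs without a cluster. The class `ConsWordCount` and its methods `mapLine`, `reduce` and `sortByCountDesc` are hypothetical names made up for this illustration and are not part of Hadoop. `mapLine` applies the same cleaning as the mapper in the question but keeps a token only when `startsWith("cons")` is true; in the real mapper that would presumably mean calling `context.write` only for such tokens. `reduce` sums the ones per word, as the stock WordCount reducer does. The sort step is done in memory here; in Hadoop, output is ordered by key rather than by count, so ordering by frequency is commonly handled in a second job that uses the count as the key, which this sketch does not show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical, Hadoop-free illustration of the filter / count / sort logic.
class ConsWordCount {
    static final String PREFIX = "cons";

    // "Map" step: same cleaning as the mapper in the question,
    // but a token is emitted only if it starts with the prefix.
    static List<String> mapLine(String value) {
        String line = value.replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        StringTokenizer record = new StringTokenizer(line);
        List<String> emitted = new ArrayList<>();
        while (record.hasMoreTokens()) {
            String token = record.nextToken();
            if (token.startsWith(PREFIX)) {
                emitted.add(token);
            }
        }
        return emitted;
    }

    // "Reduce" step: add up one occurrence per emitted word.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Sort step: highest count first, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Consider the constant cost.",
            "A constant is constant; consider 2 consoles!"
        };
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(mapLine(line));
        }
        for (Map.Entry<String, Integer> e : sortByCountDesc(reduce(emitted))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines in `main`, "cost" is dropped because it does not start with "cons", and the remaining words come out as constant (3), consider (2), consoles (1).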