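MapReduce: filter out key-value pairs whose value is not above a threshold

Using MapReduce, how do I modify the following word count code so that it only outputs words whose count is above a certain threshold? (For example, I want to add some kind of filtering of key-value pairs.)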

Input:

ant bee cat 
bee cat dog 
cat dog

Output (say the count threshold is 2 or more):

bee 2
cat 3
dog 2

The word count code (from http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Source+Code) is:

public static class Map1 extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

public static class Reduce1 extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

Edit (re: the input/test case):

The input file ("example.dat") and a simple test case ("testcase") can be found here: https://github.com/csiu/tokens/tree/master/other/SO-26695749

Edit:

The problem was not the code. It was caused by some odd behavior of the org.apache.hadoop.mapred package. (See: Is it better to use the mapred or the mapreduce package to create a Hadoop Job?)

Takeaway: use the if statement, with `org.apache.hadoop.mapreduce` instead of `org.apache.hadoop.mapred`.


2014-11-02 csiu

Try adding an if statement before collecting the output in the reducer.

if(sum >= 2) 
    output.collect(key, new IntWritable(sum));
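
The same filter can be sanity-checked outside Hadoop before running a job. Below is a minimal sketch in plain Java, assuming the same whitespace tokenization as the mapper above. The class name `ThresholdWordCount` and the method `countAtLeast` are hypothetical helpers for illustration only and are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper (not a Hadoop class): simulates map, reduce and the
// threshold filter in memory so the expected counts can be checked locally.
public class ThresholdWordCount {

    // Counts whitespace-separated tokens and keeps only those whose
    // count is greater than or equal to the given threshold.
    public static Map<String, Integer> countAtLeast(Iterable<String> lines, int threshold) {
        // "Map" + "reduce" phase: accumulate one count per token.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        // Filter phase: the same test as if (sum >= threshold) in the reducer.
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= threshold) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("ant bee cat", "bee cat dog", "cat dog");
        System.out.println(countAtLeast(lines, 2)); // prints {bee=2, cat=3, dog=2}
    }
}
```

On the sample input from the question with a threshold of 2, "bee" also passes the filter, since it occurs twice. If the Hadoop job returns fewer keys than a local check like this, the filter itself is unlikely to be the cause.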


2014-11-02 03:34:23 irrelephant

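When I do something like this, I am missing about half of my expected output. Is it reasonable for a Reducer not to collect/emit a key-value pair? – csiu 2014-11-02 03:43:04

No, that should not happen. Could you add more details about this to the question? – irrelephant 2014-11-02 03:45:03

When I tried your suggestion (on the actual input 'example.dat', see the link above), I expected the count for the word "0" to be 594. However, when I set the threshold to 590, no count was returned for it. – csiu 2014-11-02 04:23:30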

You can do the filtering in the Reduce1 class:

if (sum >= 2) {
    output.collect(key, new IntWritable(sum));
}


2014-11-02 03:34:55

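When I do something like this, I am missing about half of my expected output. Is it reasonable for a Reducer not to collect/emit a key-value pair? – csiu 2014-11-02 03:43:45

Can you show some of the input lines that cause this problem? – 2014-11-02 03:55:34

I found the problem while spot-checking, for example, the word "0": I expected a count of 594, but no count was returned when I set the threshold to 590. – csiu 2014-11-02 04:21:52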
