Separate output files in Hadoop MapReduce

My question may already have been asked, but I cannot find a clear answer that solves my problem.

My MapReduce job is a basic WordCount. My current output file is:

// filename : 'part-r-00000' 
789 a 
755 #c 
456 d 
123 #b

How can I change the output file name?

Also, is it possible to get two output files, like this:

// First output file 
789 a 
456 d 

// Second output file 
123 #b 
755 #c

Here is my reducer class:

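// Needs: java.io.IOException, org.apache.hadoop.io.IntWritable,
// org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Reducer
public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    // reduce() must take an Iterable of values; with a single Text parameter the
    // method does not override Reducer.reduce and Hadoop never calls it.
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value);
        }
    }
}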

Here is my Partitioner class:

public class TweetPartitionner extends Partitioner<Text, IntWritable> {

    // Keys starting with '#' go to partition 1, everything else to partition 0.
    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        if (a_key.toString().startsWith("#"))
            return 1;
        return 0;
    }
}

Thank you very much!

2013-06-25 Apaachee

In your job driver, set:

job.setNumReduceTasks(2);

Emit from the mapper:

a 789 
#c 755  
d 456 
#b 123

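Write a partitioner and add it to the job configuration. In the partitioner, check whether the key starts with #: return 1 if it does, otherwise 0.

In the reducer, swap the key and the value.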
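
To make the steps above concrete, here is a minimal, Hadoop-free sketch of what they should produce for the sample data in the question. It is only a model: `TwoReducerModel`, `partitionFor` and `route` are hypothetical names, and the `TreeMap` merely stands in for the key sort Hadoop performs before each reducer runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the recipe above; it does not use the Hadoop API.
final class TwoReducerModel {

    // Same rule as the partitioner: keys starting with '#' go to partition 1.
    static int partitionFor(String word) {
        return word.startsWith("#") ? 1 : 0;
    }

    // Routes each (word, count) pair to its partition and swaps it to
    // "count word", the line each reducer would write to its own file.
    static List<List<String>> route(Map<String, Integer> mapperOutput) {
        List<List<String>> partitions = new ArrayList<>();
        partitions.add(new ArrayList<>()); // stands in for part-r-00000
        partitions.add(new ArrayList<>()); // stands in for part-r-00001
        // TreeMap models the sort by key that happens before reduce().
        for (Map.Entry<String, Integer> pair : new TreeMap<>(mapperOutput).entrySet()) {
            partitions.get(partitionFor(pair.getKey()))
                      .add(pair.getValue() + " " + pair.getKey());
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapperOutput = new HashMap<>();
        mapperOutput.put("a", 789);
        mapperOutput.put("#c", 755);
        mapperOutput.put("d", 456);
        mapperOutput.put("#b", 123);
        List<List<String>> files = route(mapperOutput);
        System.out.println(files.get(0)); // [789 a, 456 d]
        System.out.println(files.get(1)); // [123 #b, 755 #c]
    }
}
```

In a real job the two lists correspond to the two reducer outputs, so plain words and #-prefixed words end up in different files, as asked in the question.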

2013-06-25 10:55:34 banjara

Thanks a lot zuxqoj, it looks like a good solution, so I updated my post with my Partitioner. But when I run the program I get an error: 'java.io.IOException: Illegal partition for #rescinfo (1)'. Why? – Apaachee

I found the beginning of a solution: http://stackoverflow.com/questions/12928101/hadoop-number-of-reducer-is-not-equal-to-what-i-have-set-in-program – Apaachee

Eclipse can only launch one reducer. My Hadoop installation is on my machine, under Cygwin. How can I use my installation to run the other reducers? – Apaachee
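
A note on the "Illegal partition" error: Hadoop raises it when a partitioner returns an index outside 0 to numPartitions - 1. Returning 1 while only one reducer is running is such a case, which is consistent with the single-reducer run described in the comments. A minimal sketch of a defensive rule follows; `SafeHashtagPartition` and `partitionFor` are hypothetical names rather than Hadoop API, and the method body is what could go inside `getPartition`.

```java
// Hypothetical helper, not part of Hadoop: it only shows the index arithmetic.
final class SafeHashtagPartition {

    // Returns 1 for keys starting with '#', else 0, folded back into the
    // legal range so that a single-reducer run still gets index 0.
    static int partitionFor(String key, int numPartitions) {
        int wanted = key.startsWith("#") ? 1 : 0;
        return wanted % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("#rescinfo", 1)); // 0
        System.out.println(partitionFor("#rescinfo", 2)); // 1
    }
}
```

With this guard the job no longer fails under a single reducer, but everything then lands in one file; the two-file split still needs two reducers to actually run.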

As for your other question, how to change the output file name, you can look at http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(java.lang.String, K, V).

2013-06-25 11:13:24
