When should I use OutputCollector and Context in Hadoop?

In this article, I found this word count mapper code:
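import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    // Emit (token, 1) for every whitespace-separated token in the line.
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}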
In contrast, this is the mapper provided in the official tutorial:
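import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    // Emit (token, 1) for every whitespace-separated token in the line.
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}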
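Until now, I have only seen Context used to write output from the mapper to the reducer; I have never seen (or used) OutputCollector. I have read the documentation, but I don't understand the point of it or why I should use it.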
I don't see at all how this is related to the question. – justHelloWorld
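For comparison, here is a minimal plain-Java sketch of the two emit styles side by side. It does not use Hadoop at all: `Collector` and `WriteContext` are hypothetical stand-ins for `org.apache.hadoop.mapred.OutputCollector` (the older API, used by the first snippet) and `Mapper.Context` from `org.apache.hadoop.mapreduce` (the newer API, used by the tutorial snippet), and `EmitStyles`, `runOld`, and `runNew` are names made up for illustration. Under those assumptions, the tokenizing loop is identical and only the object receiving the pairs differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class EmitStyles {
    // Hypothetical stand-in for the old-API sink (OutputCollector.collect).
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Hypothetical stand-in for the new-API sink (Context.write).
    interface WriteContext<K, V> {
        void write(K key, V value);
    }

    // Old style: the sink is passed in as a dedicated output parameter.
    static void mapOldStyle(String line, Collector<String, Integer> output) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            output.collect(itr.nextToken(), 1);
        }
    }

    // New style: the sink is a single context object.
    static void mapNewStyle(String line, WriteContext<String, Integer> context) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            context.write(itr.nextToken(), 1);
        }
    }

    // Run the old-style mapper and record each emitted pair as "key=value".
    static List<String> runOld(String line) {
        List<String> out = new ArrayList<>();
        mapOldStyle(line, (k, v) -> out.add(k + "=" + v));
        return out;
    }

    // Run the new-style mapper and record each emitted pair as "key=value".
    static List<String> runNew(String line) {
        List<String> out = new ArrayList<>();
        mapNewStyle(line, (k, v) -> out.add(k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runOld("a b a")); // prints [a=1, b=1, a=1]
        System.out.println(runNew("a b a")); // prints [a=1, b=1, a=1]
    }
}
```

Both calls emit the same pairs, which suggests the choice between the two comes down to which API package the rest of the job (driver, reducer, input and output formats) is written against, rather than to any difference in what the mapper produces.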