0
我不知道如何使用MultipleOutputs類。我用它來創建多個輸出文件。以下是我的Driver類的代碼片段使用MultipleOutputs沒有context.write結果空文件
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJarByClass(CustomKeyValueTest.class);//class with mapper and reducer
job.setOutputKeyClass(CustomKey.class);
job.setOutputValueClass(Text.class);
job.setMapOutputKeyClass(CustomKey.class);
job.setMapOutputValueClass(CustomValue.class);
job.setMapperClass(CustomKeyValueTestMapper.class);
job.setReducerClass(CustomKeyValueTestReducer.class);
job.setInputFormatClass(TextInputFormat.class);
Path in = new Path(args[1]);
Path out = new Path(args[2]);
out.getFileSystem(conf).delete(out, true);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
MultipleOutputs.addNamedOutput(job, "islnd" , TextOutputFormat.class, CustomKey.class, Text.class);
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
MultipleOutputs.setCountersEnabled(job, true);
boolean status = job.waitForCompletion(true);
和減速,我用這樣的MultipleOutputs,
private MultipleOutputs<CustomKey, Text> multipleOutputs;
@Override
public void setup(Context context) throws IOException, InterruptedException {
multipleOutputs = new MultipleOutputs<>(context);
}
@Override
public void reduce(CustomKey key, Iterable<CustomValue> values, Context context) throws IOException, InterruptedException {
...
multipleOutputs.write("islnd", key, pop, key.toString());
//context.write(key, pop);
}
public void cleanup() throws IOException, InterruptedException {
multipleOutputs.close();
}
}
當我使用context.write我得到的輸出文件與它的數據。但是,當我刪除context.write輸出文件是空的。但我不想調用context.write,因爲它會創建額外的文件part-r-00000。正如here(類的描述中的最後一個段落),我使用LazyOutputFormat來避免part-r-00000文件。但仍然沒有工作。