2017-04-07

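Using MultipleOutputs without context.write results in empty files

I am not sure how to use the MultipleOutputs class correctly. I am using it to create multiple output files. Below is a code snippet from my Driver class: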

    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf); 
    job.setJarByClass(CustomKeyValueTest.class);//class with mapper and reducer 
    job.setOutputKeyClass(CustomKey.class); 
    job.setOutputValueClass(Text.class); 
    job.setMapOutputKeyClass(CustomKey.class); 
    job.setMapOutputValueClass(CustomValue.class); 
    job.setMapperClass(CustomKeyValueTestMapper.class); 
    job.setReducerClass(CustomKeyValueTestReducer.class); 
    job.setInputFormatClass(TextInputFormat.class); 

    Path in = new Path(args[1]); 
    Path out = new Path(args[2]); 
    out.getFileSystem(conf).delete(out, true); 

    FileInputFormat.setInputPaths(job, in); 
    FileOutputFormat.setOutputPath(job, out); 

    MultipleOutputs.addNamedOutput(job, "islnd" , TextOutputFormat.class, CustomKey.class, Text.class); 
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class); 
    MultipleOutputs.setCountersEnabled(job, true); 

    boolean status = job.waitForCompletion(true); 

In the reducer, I use MultipleOutputs like this:

private MultipleOutputs<CustomKey, Text> multipleOutputs; 

@Override 
public void setup(Context context) throws IOException, InterruptedException { 
    multipleOutputs = new MultipleOutputs<>(context); 
} 

@Override 
public void reduce(CustomKey key, Iterable<CustomValue> values, Context context) throws IOException, InterruptedException { 
    ... 
    multipleOutputs.write("islnd", key, pop, key.toString()); 
    //context.write(key, pop); 

} 

public void cleanup() throws IOException, InterruptedException { 
    multipleOutputs.close(); 
} 

}

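When I use context.write, I get output files with data in them. When I remove context.write, the output files are empty. I do not want to call context.write, because it creates the extra file part-r-00000. As the class description says in its last paragraph, I use LazyOutputFormat to avoid the part-r-00000 file, but it still does not work.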

Answer

LazyOutputFormat.setOutputFormatClass(job,TextOutputFormat.class);

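This setting means that an output file is created only when the first record is written to it, so if a task produces no output, no empty file is created.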
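
To make the lazy behaviour concrete, here is a minimal plain-Java sketch. It does not use Hadoop and it is not how LazyOutputFormat is implemented; LazyFileWriter, LazyOutputDemo and the file names are hypothetical and exist only for illustration. The point is that the file is created on the first write rather than when the writer is constructed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of lazy output creation; not Hadoop's implementation.
class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter writer; // stays null until the first record arrives

    LazyFileWriter(Path target) {
        this.target = target;
    }

    void write(String record) throws IOException {
        if (writer == null) {
            // The file is created here, on the first write, not in the constructor.
            writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        writer.write(record);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close(); // flushes buffered records to disk
        }
    }
}

class LazyOutputDemo {
    // Returns { unusedFileExists, usedFileExists }.
    static boolean[] runDemo() {
        try {
            Path dir = Files.createTempDirectory("lazy-demo");
            Path unused = dir.resolve("part-r-00000");  // placeholder name, never written to
            Path used = dir.resolve("islnd-r-00000");   // placeholder name, receives one record
            try (LazyFileWriter a = new LazyFileWriter(unused);
                 LazyFileWriter b = new LazyFileWriter(used)) {
                b.write("key\tvalue");
            }
            return new boolean[] { Files.exists(unused), Files.exists(used) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("unused exists: " + result[0]);
        System.out.println("used exists:   " + result[1]);
    }
}
```

Only the writer that received a record leaves a file behind. This is the same idea that lets the default part-r-00000 file disappear when nothing is written through context.write.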

Could you check the Hadoop counters and look at the following values, to verify whether your mappers are sending any data to the reducers?
1. map.output.records
2. reduce.input.groups
3. reduce.input.records

Example code that uses MultipleOutputs: http://bytepadding.com/big-data/map-reduce/multipleoutputs-in-map-reduce/
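
One more thing that may be worth checking in the reducer from the question: its cleanup() is declared without a Context parameter and without @Override. In Hadoop's Reducer the lifecycle hook is cleanup(Context), so a no-argument cleanup() is a separate overload that the framework never calls. If so, multipleOutputs.close() is never reached and buffered records may never be flushed. The following is a minimal, Hadoop-free sketch of that pitfall; TaskContext, BaseTask, BrokenTask, FixedTask and CleanupOverloadDemo are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the framework's context object (not a Hadoop class).
class TaskContext {
    final List<String> log = new ArrayList<>();
}

// Hypothetical stand-in for a framework base class with a lifecycle hook.
abstract class BaseTask {
    // Default hook does nothing; subclasses override it to release resources.
    protected void cleanup(TaskContext context) { }

    // The framework only ever calls the one-argument hook.
    final void run(TaskContext context) {
        context.log.add("run");
        cleanup(context);
    }
}

// Mirrors the question: cleanup() takes no parameter, so it is a new overload.
class BrokenTask extends BaseTask {
    public void cleanup() {
        throw new IllegalStateException("never reached through run()");
    }
}

// Correct signature; @Override makes the compiler verify that it overrides the hook.
class FixedTask extends BaseTask {
    @Override
    protected void cleanup(TaskContext context) {
        context.log.add("closed");
    }
}

class CleanupOverloadDemo {
    // Runs a task the way the stand-in framework would and returns what happened.
    static List<String> runTask(BaseTask task) {
        TaskContext context = new TaskContext();
        task.run(context);
        return context.log;
    }

    public static void main(String[] args) {
        System.out.println("broken: " + runTask(new BrokenTask())); // prints "broken: [run]"
        System.out.println("fixed:  " + runTask(new FixedTask()));  // prints "fixed:  [run, closed]"
    }
}
```

Applied to the reducer in the question, this would mean declaring the method as cleanup(Context context) with @Override, so that the compiler rejects a signature the framework will not call.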