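Sure, it works basically the same way I described in the other thread you linked, but you have to implement your own Mapper.
Here is a quick sketch from scratch: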
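import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LongLongMapper extends
    Mapper<LongWritable, Text, LongWritable, LongWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // assuming that your line contains key and value separated by \t
    String[] split = value.toString().split("\t");
    context.write(new LongWritable(Long.parseLong(split[0].trim())),
        new LongWritable(Long.parseLong(split[1].trim())));
  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated new Job(conf) constructor
    Job job = Job.getInstance(conf, "Convert Text");
    job.setJarByClass(LongLongMapper.class);
    // register this mapper; Mapper.class would just be the identity mapper
    job.setMapperClass(LongLongMapper.class);
    // map-only job: mapper output goes straight to the output format.
    // Increase if you need sorting or a special number of files
    // (the default identity Reducer is then used unless you set your own).
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(LongWritable.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));
    // submit, wait for completion, and exit non-zero on failure
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}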
In your map function, each value is one line of your input, so we just split it on your delimiter (a tab) and parse each part into a long.
That's it.
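The parsing step inside `map()` can be tried out without a cluster. Below is a minimal plain-Java sketch (no Hadoop dependency) of that same split-and-parse logic. The class name `TabLineParser` is hypothetical, and skipping malformed lines instead of letting `NumberFormatException` fail the task is an assumed design choice, not something the answer above does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the per-line parsing done in LongLongMapper.map(), without Hadoop. */
final class TabLineParser {

  private TabLineParser() {}

  /**
   * Splits one input line on the tab delimiter and parses both fields as longs.
   * Returns null when the line does not have exactly two numeric fields,
   * so a caller can skip it instead of failing (an assumed policy).
   */
  static long[] parse(String line) {
    String[] split = line.split("\t");
    if (split.length != 2) {
      return null;
    }
    try {
      return new long[] {Long.parseLong(split[0].trim()), Long.parseLong(split[1].trim())};
    } catch (NumberFormatException e) {
      return null;
    }
  }

  /** Parses every line, dropping malformed ones, like a map-only job that skips bad input. */
  static List<long[]> parseAll(List<String> lines) {
    List<long[]> out = new ArrayList<>();
    for (String line : lines) {
      long[] pair = parse(line);
      if (pair != null) {
        out.add(pair);
      }
    }
    return out;
  }
}
```

Inside the real mapper you would call `TabLineParser.parse(value.toString())` and only `context.write(...)` when the result is non-null.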
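Thank you, I got a lot of ideas from the skeleton and was able to write a sequence file writer. – user1566629
If you have another example, please send it to me so I can understand this better. Email ID: [email protected]. Thanks in advance –
And please tell me what the reducer class's input and output formats would be, I mean the key and value types for input and output –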
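On the reducer question: in Hadoop the reducer's input key/value types must match the mapper's output types, so for this job it would be declared as `Reducer<LongWritable, LongWritable, LongWritable, LongWritable>` with `reduce(LongWritable key, Iterable<LongWritable> values, Context context)`; the output types are whatever you write and declare via `setOutputKeyClass`/`setOutputValueClass`. A reducer only runs if `setNumReduceTasks` is greater than 0. The sketch below is a hypothetical, Hadoop-free simulation of that type flow: it groups `(long, long)` map output by key in sorted order (as the shuffle does) and calls a `reduce` once per key. Summing the values is just an assumed example operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical simulation of shuffle + reduce for this job's types.
 * Map output:   (long key, long value)        ~ LongWritable, LongWritable
 * Reduce input: (long key, Iterable<Long>)    ~ LongWritable, Iterable<LongWritable>
 * Reduce output:(long key, long result)       ~ LongWritable, LongWritable
 */
final class ReduceSimulation {

  private ReduceSimulation() {}

  /** Groups map output by key; TreeMap keeps keys sorted like Hadoop's shuffle. */
  static TreeMap<Long, List<Long>> shuffle(List<long[]> mapOutput) {
    TreeMap<Long, List<Long>> grouped = new TreeMap<>();
    for (long[] pair : mapOutput) {
      grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
    }
    return grouped;
  }

  /** Example reduce(key, values): sums all values seen for one key (assumed use case). */
  static long reduce(long key, Iterable<Long> values) {
    long sum = 0;
    for (long v : values) {
      sum += v;
    }
    return sum;
  }

  /** Calls reduce once per distinct key and collects (key, result) in key order. */
  static TreeMap<Long, Long> run(List<long[]> mapOutput) {
    TreeMap<Long, Long> result = new TreeMap<>();
    for (Map.Entry<Long, List<Long>> e : shuffle(mapOutput).entrySet()) {
      result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
    }
    return result;
  }
}
```

The point is the shape, not the sum: each key reaches `reduce` exactly once together with all of its values, and the reducer's output types are independent of its input types as long as the job declares them.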