Creating a sequence file for Hadoop MapReduce

I am working with Hadoop MapReduce and have a question. Currently, my mapper's input KV types are LongWritable, LongWritable, and its output KV types are also LongWritable, LongWritable. The InputFormat is SequenceFileInputFormat. Basically, what I want to do is convert a txt file into SequenceFile format so that I can use it as input to my mapper.

What I want to do is this.

The input file looks like this:

1\t2 (key = 1, value = 2)

2\t3 (key = 2, value = 3)

and so on...

I looked at this thread, "How to convert .txt file to Hadoop's sequence file format", but realized that TextInputFormat only supports Key = LongWritable and Value = Text.
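For context, TextInputFormat gives each map call the byte offset of a line as the key and the line's text as the value, which is why its types are fixed to LongWritable and Text. A rough plain-Java sketch of those (offset, line) records follows. It uses no Hadoop classes, the names OffsetRecords and records are hypothetical, and it assumes UTF-8 text with "\n" line endings; the real input format also deals with input splits and other line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: mimics the (byte offset, line) pairs a mapper
// sees when the job uses TextInputFormat. Not part of Hadoop.
final class OffsetRecords {

    /** Maps the starting byte offset of each line to the line's text. */
    static Map<Long, String> records(String fileContent) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            out.put(offset, line);
            // advance past the line's bytes plus the '\n' terminator (assumed)
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two lines in the "key<TAB>value" layout from the question.
        for (Map.Entry<Long, String> e : records("1\t2\n2\t3\n").entrySet()) {
            System.out.println("offset " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

The offset key is usually ignored; the numbers of interest are inside the line text and still have to be parsed out.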

Is there any way to take a txt file and create a sequence file with KV = LongWritable, LongWritable?

2012-09-03 user1566629

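Sure, basically the same way I described in the other thread you linked. But you have to implement your own Mapper.

Here is a quick sketch from scratch to get you started: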

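import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LongLongMapper extends
    Mapper<LongWritable, Text, LongWritable, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

        // assuming that your line contains key and value separated by \t;
        // the incoming key (byte offset of the line) is ignored
        String[] split = value.toString().split("\t");

        context.write(new LongWritable(Long.parseLong(split[0])),
            new LongWritable(Long.parseLong(split[1])));
    }

    public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = new Job(conf); // on Hadoop 2+, prefer Job.getInstance(conf)
        job.setJobName("Convert Text");
        job.setJarByClass(LongLongMapper.class);

        // register this mapper; the base Mapper class would only pass
        // the (offset, line) pairs through unchanged
        job.setMapperClass(LongLongMapper.class);

        // map-only job; increase if you need sorting or a special number
        // of files (the default identity reducer is used in that case)
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);

        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setInputFormatClass(TextInputFormat.class);

        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        // submit, wait for completion, and report success via the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}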

In your map function, each value is one line of your input, so we just split it on your delimiter (tab) and parse each part into a long.
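The split-and-parse step can also be tried on its own, without a cluster. The following is a minimal plain-Java sketch of it; the class and method names (LongPairParser, parseLine) are made up for illustration, and rejecting lines that do not have exactly two tab-separated fields is an assumption about the input rather than something the job above does.

```java
// Plain-Java sketch of the parsing step; no Hadoop classes involved.
// LongPairParser and parseLine are hypothetical names for illustration.
final class LongPairParser {

    /** Splits one "key<TAB>value" line and parses both fields as longs. */
    static long[] parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            // assumption: anything other than two fields is treated as malformed
            throw new IllegalArgumentException(
                "expected 2 tab-separated fields: " + line);
        }
        // Long.parseLong throws NumberFormatException on non-numeric text
        return new long[] {
            Long.parseLong(fields[0].trim()),
            Long.parseLong(fields[1].trim())
        };
    }

    public static void main(String[] args) {
        long[] pair = parseLine("1\t2");
        System.out.println("key=" + pair[0] + ", value=" + pair[1]);
    }
}
```

Inside a mapper, one option would be to catch the exception and skip or count the bad line instead of letting the task fail.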

That's it.

2012-09-03 17:45:42

Thank you, I got a lot of ideas from the skeleton and was able to create a seq. file writer. – user1566629

If you have another example, please share it with me so I can understand this better. Thanks in advance.

Also, please tell me what the reducer class's input and output formats would be; I mean the key and value types used for its input and output.
