How does the Mapper class recognize a SequenceFile as its input file in Hadoop?

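In one of my MapReduce jobs, I subclass BytesWritable as KeyBytesWritable and ByteWritable as ValueBytesWritable. I then write the results out with SequenceFileOutputFormat.

My question: when I start the next MapReduce job, I want to use this SequenceFile as the input. How should I configure the job class, and how does the Mapper class recognize the key and value types in the SequenceFile that I subclassed earlier?

I understand that I can read the keys and values with SequenceFile.Reader.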

Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
// Instantiate the key and value using the class names stored in the file header
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
    // process key and value here
}
reader.close();

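But I don't know how to use this reader to pass the keys and values to the Mapper class as parameters. How can I set Conf.setInputFormat to SequenceFileInputFormat and then have the Mapper receive the keys and values?

Thanks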


2013-03-02 JoJo

You don't need to read the sequence file manually. Just set the input format class to SequenceFileInputFormat:

job.setInputFormatClass(SequenceFileInputFormat.class);

and set the input path to the directory containing your sequence files:

FileInputFormat.setInputPaths(job, new Path(<path to the dir containing your sequence files>));
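
For context, the two calls above might sit in a driver like the following minimal sketch. It assumes the newer `org.apache.hadoop.mapreduce` API (Hadoop 2.x or later). The class name `SecondJobDriver`, the job name, and the argument layout are illustrative and not from the original post. To keep it short, the job is assumed to be map-only and relies on Hadoop's default identity Mapper and default text output.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver name; not part of the original post.
public class SecondJobDriver {

    // Builds a job that reads every sequence file under inputDir.
    public static Job buildJob(Configuration conf, Path inputDir, Path outputDir)
            throws IOException {
        Job job = Job.getInstance(conf, "second-job");
        job.setJarByClass(SecondJobDriver.class);

        // Let the framework open the sequence files and deserialize each record.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.setInputPaths(job, inputDir);

        // Assumption: map-only job with the default identity Mapper and text output.
        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, outputDir);
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = directory of sequence files, args[1] = output directory
        Job job = buildJob(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The jar submitted with the job has to contain the custom key and value classes, since the framework instantiates them by the class names recorded in the sequence file header.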

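You need to make sure that the input (key, value) type parameters of your Mapper class match the types of the (key, value) tuples stored in the sequence file.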
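
To make the type-matching point concrete, here is a sketch of a Mapper whose input type parameters are the same classes that were written to the sequence file. `KeyBytesWritable` and `ValueBytesWritable` below are empty stand-ins for the asker's own classes (assumed to subclass `BytesWritable` and `ByteWritable`, as described in the question). The class name `SequenceRecordMapper` and the output types are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical Mapper; the first two type parameters must match the file's key/value classes.
public class SequenceRecordMapper
        extends Mapper<SequenceRecordMapper.KeyBytesWritable,
                       SequenceRecordMapper.ValueBytesWritable,
                       Text, IntWritable> {

    // Stand-ins for the asker's classes; the real ones would replace these.
    public static class KeyBytesWritable extends BytesWritable { }
    public static class ValueBytesWritable extends ByteWritable { }

    private final Text outKey = new Text();
    private final IntWritable outValue = new IntWritable();

    @Override
    protected void map(KeyBytesWritable key, ValueBytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Illustrative only: emit the key's byte length and the single-byte value.
        outKey.set("keyLength=" + key.getLength());
        outValue.set(value.get());
        context.write(outKey, outValue);
    }
}
```

It would be registered with `job.setMapperClass(SequenceRecordMapper.class)`. Because Java generics are erased at runtime, a mismatch between the declared input types and the classes stored in the file typically shows up as a `ClassCastException` when the job runs, not as a compile error.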


2013-03-02 23:14:55 javadba

I'm trying to set it up as you suggested. – JoJo 2013-03-05 18:43:16
