Using CombineFileInputFormat in Hadoop

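I am trying to write a MapReduce program that takes about 1000 small files (a few MB each) as input. As far as I understand, this will create about 1000 map tasks, because every file is smaller than an HDFS block (64 MB by default) and therefore becomes its own input split. So in this case, using CombineFileInputFormat should be more efficient than TextInputFormat. Am I right?

If so, how do I use CombineFileInputFormat in my program?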

2014-02-13 user1142353

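The Hadoop API does not yet provide a fully concrete implementation of CombineFileInputFormat: it is an abstract class, so you have to subclass it and supply your own record reader. I implemented a few myself. Take a look: https://github.com/thomachan/Custom-MR/tree/master/src/mapreduce/hi/api/input/defaultcustom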
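To make the mapper-count argument from the question concrete, here is a rough, Hadoop-free sketch of the arithmetic. It is only a model, not Hadoop's actual split logic: the class `SplitCountSketch`, both method names, the 4 MB file size and the 128 MB maximum split size are all assumptions chosen for illustration. The real CombineFileInputFormat also groups blocks by node and rack locality, which is ignored here. In a real job the size cap roughly corresponds to the maximum input split size setting, and later Hadoop 2.x releases also ship `CombineTextInputFormat`, a ready-made subclass for plain text, which may be enough if your version has it.

```java
import java.util.Arrays;

/** Rough model of how many input splits (and thus map tasks) a job would get. */
final class SplitCountSketch {

    /** Per-file splitting: each file yields ceil(size / blockSize) splits, and at least one. */
    static long perFileSplits(long[] fileSizes, long blockSize) {
        long splits = 0;
        for (long size : fileSizes) {
            splits += Math.max(1, (size + blockSize - 1) / blockSize);
        }
        return splits;
    }

    /**
     * Combined splitting (simplified): pack whole files into one split until
     * adding the next file would exceed maxSplitSize. Locality is ignored, and
     * a file larger than maxSplitSize simply occupies a split of its own.
     */
    static long combinedSplits(long[] fileSizes, long maxSplitSize) {
        long splits = 0;
        long current = 0; // bytes already packed into the split being built
        for (long size : fileSizes) {
            if (current > 0 && current + size > maxSplitSize) {
                splits++;     // close the current split
                current = 0;
            }
            current += size;
        }
        return current > 0 ? splits + 1 : splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = new long[1000];
        Arrays.fill(files, 4 * mb); // assumed: 1000 files of 4 MB each

        System.out.println(perFileSplits(files, 64 * mb));   // prints 1000
        System.out.println(combinedSplits(files, 128 * mb)); // prints 32
    }
}
```

Under these assumptions, per-file splitting gives one map task per file (1000), while packing up to 128 MB per split gives 32, which is why a combining input format tends to help with many small files.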

2014-02-14 05:28:06
