Different context types for mapper and combiner in Hadoop

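Hello, I am trying to implement a Java Hadoop application. I want a mapper of type <Object, Text, NaicsAreaPair, LongWritable> (so the mapper's output has NaicsAreaPair as the key and LongWritable as the value). Then I need a combiner like <NaicsAreaPair, LongWritable, Text, AreaWagePair>, so its input matches the mapper output, but the combiner's output types differ from the mapper's output types.

I have this configuration in the main class: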

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "NY statistics");
    job.setJarByClass(NYStatisticsOwnWritableComparable.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(Combiner.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(NaicsAreaPair.class);
    job.setOutputValueClass(LongWritable.class);
    //job.setPartitionerClass(Rozdelovac.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    //job.setNumReduceTasks(3);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

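Here I specify which output key and value classes will be used. Is it possible to set this output key and value for the mapper, but use different ones for the combiner?

Thank you very much for your answers.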

Source

2015-05-09 DaMi

No, it is not possible. The combiner's output key/value types must be the same as the mapper's output key/value types.
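To make the constraint concrete, here is a minimal plain-Java model of the stage types. It is not the Hadoop API: the `Combiner` and `Reducer` interfaces, the `run` helper, and the sample data are all hypothetical and exist only for illustration. The point it tries to show is that the combiner returns the same value type it receives, while only the reducer is free to change types. In Hadoop itself, the intermediate types can be declared with `job.setMapOutputKeyClass(...)` and `job.setMapOutputValueClass(...)`, while `setOutputKeyClass(...)` and `setOutputValueClass(...)` describe the job's final output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (NOT the Hadoop API) of the type contract between stages.
class CombinerTypeContract {

    // A combiner folds the values of one key and must return the SAME value type V,
    // because its output is handed to the reducer as if it came from the mapper.
    interface Combiner<K, V> {
        V combine(K key, List<V> values);
    }

    // A reducer may change types: it reads (K, V) and emits a result of type R.
    interface Reducer<K, V, R> {
        R reduce(K key, List<V> values);
    }

    // Groups map output by key, optionally applies the combiner, then reduces.
    static <K extends Comparable<K>, V, R> Map<K, R> run(
            List<Map.Entry<K, V>> mapOutput,
            Combiner<K, V> combiner,   // may be null: the combiner is optional
            Reducer<K, V, R> reducer) {
        Map<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> e : mapOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            List<V> values = e.getValue();
            if (combiner != null) {
                // The combined value replaces the raw values but keeps type V.
                List<V> combined = new ArrayList<>();
                combined.add(combiner.combine(e.getKey(), values));
                values = combined;
            }
            result.put(e.getKey(), reducer.reduce(e.getKey(), values));
        }
        return result;
    }

    static Map<String, String> demo(boolean useCombiner) {
        List<Map.Entry<String, Long>> mapOutput = List.of(
                Map.entry("A", 10L), Map.entry("B", 5L), Map.entry("A", 7L));
        Combiner<String, Long> sum =
                (k, vs) -> vs.stream().mapToLong(Long::longValue).sum();
        // Only the reducer converts Long into a different output type (String here).
        Reducer<String, Long, String> format =
                (k, vs) -> k + "=" + vs.stream().mapToLong(Long::longValue).sum();
        return run(mapOutput, useCombiner ? sum : null, format);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

In this model the result is the same with or without the combiner, which is the property a real job has to preserve, since the type change to something like AreaWagePair can only happen in the reducer.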

Source

2015-05-09 19:49:27

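Why do you want to use a combiner at all? The purpose of a combiner is to improve performance by reducing the amount of data sent over the network. It comes with several restrictions: its input/output types must match the mapper output (key/value) types and the reducer input (key/value) types, and the function it performs should be associative and commutative. See the example here: http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/

Whatever you wanted to do in your combiner, do it in your reducer instead.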
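The associative and commutative requirement follows from the fact that the framework may apply the combiner to partial groups of values, or not at all. The following plain-Java sketch (no Hadoop classes; the class name, method names, and numbers are made up for illustration) compares a sum, which tolerates being pre-aggregated, with a mean, which does not.

```java
import java.util.List;

// Plain-Java illustration of why a combiner function should be
// associative and commutative. Not Hadoop code; all names are hypothetical.
class CombinerAlgebraSketch {

    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    // Values split across two map tasks, each pre-aggregated, then reduced.
    static long sumWithCombiner() {
        return sum(List.of(sum(List.of(10L, 20L)), sum(List.of(30L))));
    }

    // The same values sent straight to the reducer.
    static long sumWithoutCombiner() {
        return sum(List.of(10L, 20L, 30L));
    }

    // A mean of partial means weights the groups incorrectly.
    static double meanWithCombiner() {
        return mean(List.of(mean(List.of(10.0, 20.0)), mean(List.of(30.0))));
    }

    static double meanWithoutCombiner() {
        return mean(List.of(10.0, 20.0, 30.0));
    }

    public static void main(String[] args) {
        System.out.println(sumWithCombiner() + " vs " + sumWithoutCombiner());   // 60 vs 60
        System.out.println(meanWithCombiner() + " vs " + meanWithoutCombiner()); // 22.5 vs 20.0
    }
}
```

A sum (or a max) can therefore be reused as a combiner, while anything whose result depends on how the values were grouped belongs in the reducer.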

Source

2015-05-09 20:13:49 bl3e

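I use it because in the mapper I want naics as the key and the area with the wage as the value. Then in the combiner I want to sum all wages for each area and emit the area as the key and the wage total as the value, and then in the reducer I just want to find the maximum for each area. If I don't use a combiner for this, I will need to use a reducer twice (once for the sum, once for the maximum), won't I? And in that case, wouldn't the combiner make the data a bit smaller (not the number of areas per naics)? – DaMi

You can use the combiner to perform aggregation (and nothing more), so its output types are the same as its input types. In the reducer you can then find the maximum for each input key. – bl3e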
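As a rough sketch of that split, the plain-Java code below sums wages per (naics, area) key without changing types, which is the part a combiner could do, and only afterwards re-keys by area to take the maximum. It assumes one reading of the goal described in the comment above; the `Rec` class, the tab-joined string key, and the sample figures are hypothetical, and no Hadoop classes are used.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of "sum with unchanged types, then take the max".
// Not Hadoop code; Rec, the key format, and the data are illustrative assumptions.
class SumThenMaxSketch {

    // Stands in for one mapper output record: key (naics, area), value wage.
    static final class Rec {
        final String naics;
        final String area;
        final long wage;

        Rec(String naics, String area, long wage) {
            this.naics = naics;
            this.area = area;
            this.wage = wage;
        }
    }

    // Aggregation step: keys and value type stay the same, so this logic
    // is the kind of work a combiner is allowed to do.
    static Map<String, Long> sumByNaicsAndArea(List<Rec> records) {
        Map<String, Long> totals = new TreeMap<>();
        for (Rec r : records) {
            totals.merge(r.naics + "\t" + r.area, r.wage, Long::sum);
        }
        return totals;
    }

    // Final step: re-key by area only and keep the largest total per area.
    static Map<String, Long> maxByArea(Map<String, Long> totals) {
        Map<String, Long> best = new TreeMap<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            String area = e.getKey().split("\t")[1];
            best.merge(area, e.getValue(), Long::max);
        }
        return best;
    }

    static Map<String, Long> demo() {
        List<Rec> records = List.of(
                new Rec("11", "NY", 100), new Rec("11", "NY", 50),
                new Rec("22", "NY", 120), new Rec("22", "NJ", 70));
        return maxByArea(sumByNaicsAndArea(records));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {NJ=70, NY=150}
    }
}
```

Note that the second step groups by a different key than the first, so the sketch does not settle whether a single reducer pass is enough in a real job; it only shows that the type-changing work sits outside the combiner.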
