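Same key arriving in different reducers in Hadoop
I am seeing something very strange: the same key shows up in different reducers. I just print the key and collect the key and values. My reducer code is shown below.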
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    System.out.println("The key is " + key.toString());
    while (values.hasNext()) {
        Text value = values.next();
        key.set("");
        output.collect(key, value);
    }
}
The output on the console is:
The key is 111-00-1234195967001
The key is 1234529857009
The key is 1234529857009
14/01/06 20:11:16 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/01/06 20:11:16 INFO mapred.LocalJobRunner:
14/01/06 20:11:16 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/01/06 20:11:16 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/joboutput11
14/01/06 20:11:18 INFO mapred.LocalJobRunner: reduce > reduce
14/01/06 20:11:18 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
14/01/06 20:11:19 INFO mapred.JobClient: map 100% reduce 100%
14/01/06 20:11:19 INFO mapred.JobClient: Job complete: job_local_0001
14/01/06 20:11:19 INFO mapred.JobClient: Counters: 23
14/01/06 20:11:19 INFO mapred.JobClient: File Input Format Counters
14/01/06 20:11:19 INFO mapred.JobClient: Bytes Read=289074
14/01/06 20:11:19 INFO mapred.JobClient: File Output Format Counters
14/01/06 20:11:19 INFO mapred.JobClient: Bytes Written=5707
14/01/06 20:11:19 INFO mapred.JobClient: FileSystemCounters
14/01/06 20:11:19 INFO mapred.JobClient: FILE_BYTES_READ=19185
14/01/06 20:11:19 INFO mapred.JobClient: HDFS_BYTES_READ=1254215
14/01/06 20:11:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=270933
14/01/06 20:11:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=5707
14/01/06 20:11:19 INFO mapred.JobClient: Map-Reduce Framework
14/01/06 20:11:19 INFO mapred.JobClient: Map output materialized bytes=5633
14/01/06 20:11:19 INFO mapred.JobClient: Map input records=5
14/01/06 20:11:19 INFO mapred.JobClient: Reduce shuffle bytes=0
14/01/06 20:11:19 INFO mapred.JobClient: Spilled Records=10
14/01/06 20:11:19 INFO mapred.JobClient: Map output bytes=5583
14/01/06 20:11:19 INFO mapred.JobClient: Total committed heap usage (bytes)=991539200
14/01/06 20:11:19 INFO mapred.JobClient: CPU time spent (ms)=0
14/01/06 20:11:19 INFO mapred.JobClient: Map input bytes=289074
14/01/06 20:11:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=627
14/01/06 20:11:19 INFO mapred.JobClient: Combine input records=0
14/01/06 20:11:19 INFO mapred.JobClient: Reduce input records=5
14/01/06 20:11:19 INFO mapred.JobClient: Reduce input groups=3
14/01/06 20:11:19 INFO mapred.JobClient: Combine output records=0
14/01/06 20:11:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
14/01/06 20:11:19 INFO mapred.JobClient: Reduce output records=7
14/01/06 20:11:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
14/01/06 20:11:19 INFO mapred.JobClient: Map output records=5
The key 1234529857009 is repeated twice, which is not normal. Any idea why this is happening?
Thanks
Can you check the values and tell us how many values each key delivers and how many of them are distinct? – Mehraban
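Thanks. There are two distinct keys, 111-00-1234195967001 and 1234529857009. The first delivers 2 values and the second delivers 3 values. However, those three are split up: two values arrive in one reducer and the third in another. Now simplefish says this is normal behavior, which is a problem in itself. I explained the problem it causes for me in a comment on simplefish's reply. I am using a single node. – shujaat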
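One possible explanation, offered as a guess rather than a confirmed diagnosis: the log shows a single reduce task (attempt_local_0001_r_000000_0), so the key is probably repeated across reduce() calls, not across reducers. The reducer calls key.set("") on the incoming key inside the loop. If the framework reuses that key object to decide whether the next record still belongs to the current group, then clearing it would end the group early and the remaining values would arrive in a second reduce() call with the same key. The sketch below is a simplified model of that idea in plain Java with no Hadoop dependency. MutableKey, GroupedValues and the values "a" to "e" are hypothetical placeholders, and the grouping logic is an assumption about how the iterator behaves, not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyMutationDemo {
    // Hypothetical stand-in for a mutable, reused key object such as Text.
    static final class MutableKey {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Simplified model (an assumption) of an iterator that groups sorted records by key.
    static final class GroupedValues {
        private final List<String[]> records; // each record is {key, value}
        private final MutableKey key = new MutableKey();
        private int pos = 0;
        private boolean hasNext = false;

        GroupedValues(List<String[]> records) { this.records = records; }

        boolean more() { return pos < records.size(); }

        // Starts a new group by loading the next record's key into the shared key object.
        MutableKey nextKey() {
            key.set(records.get(pos)[0]);
            hasNext = true;
            return key;
        }

        boolean hasNext() { return hasNext; }

        String next() {
            String value = records.get(pos++)[1];
            // The group continues only if the shared key (possibly mutated by the
            // caller) still equals the key of the next record.
            hasNext = more() && key.toString().equals(records.get(pos)[0]);
            return value;
        }
    }

    // Returns the key seen at the start of each simulated reduce() call.
    static List<String> run(boolean mutateKey) {
        List<String[]> records = Arrays.asList(
            new String[] {"111-00-1234195967001", "a"},
            new String[] {"111-00-1234195967001", "b"},
            new String[] {"1234529857009", "c"},
            new String[] {"1234529857009", "d"},
            new String[] {"1234529857009", "e"});
        GroupedValues values = new GroupedValues(records);
        List<String> keysSeen = new ArrayList<>();
        while (values.more()) {
            MutableKey key = values.nextKey();
            keysSeen.add(key.toString()); // what the reducer would print
            while (values.hasNext()) {
                values.next();
                if (mutateKey) {
                    key.set(""); // mirrors key.set("") in the posted reducer
                }
            }
        }
        return keysSeen;
    }

    public static void main(String[] args) {
        // prints [111-00-1234195967001, 1234529857009, 1234529857009]
        System.out.println(run(true));
        // prints [111-00-1234195967001, 1234529857009]
        System.out.println(run(false));
    }
}
```

With mutation enabled, the model reproduces the pattern in the console output: the first key once and the second key twice, with two values in one call and the third in the next. If this is indeed the cause, emitting through a separate Text instance (for example a field initialised to an empty string) and leaving the incoming key untouched should keep each group in a single reduce() call.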