Not getting the correct output when running the standard "WordCount" program with Hadoop 0.20.2

I am new to Hadoop. I have been trying to run the famous "WordCount" program, which counts the words in a list of files, using Hadoop-0.20.2. I am using a single-node cluster.

Following is my program:

import java.io.File;
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
    private final static IntWritable one = new IntWritable(1); 
    private Text word = new Text(); 

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
     String line = value.toString(); 
     StringTokenizer tokenizer = new StringTokenizer(line); 
     while (tokenizer.hasMoreTokens()) { 
      word.set(tokenizer.nextToken()); 
      context.write(word, one); 
     } 
    } 
} 

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { 

    public void reduce(Text key, Iterator<IntWritable> values, Context context) 
    throws IOException, InterruptedException { 
     int sum = 0; 
     while (values.hasNext()) { 
      ++sum ; 
     } 
     context.write(key, new IntWritable(sum)); 
    } 
} 

public static void main(String[] args) throws Exception { 
    Configuration conf = new Configuration(); 
    Job job = new Job(conf, "wordcount");   
    FileInputFormat.addInputPath(job, new Path(args[0])); 
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   
    job.setJarByClass(WordCount.class); 
    job.setInputFormatClass(TextInputFormat.class); 
    job.setOutputFormatClass(TextOutputFormat.class); 
    job.setMapperClass(Map.class); 
    job.setMapOutputKeyClass(Text.class); 
    job.setMapOutputValueClass(IntWritable.class);  

    job.setReducerClass(Reduce.class);   
    job.setOutputKeyClass(Text.class); 
    job.setOutputValueClass(IntWritable.class);    
    job.setNumReduceTasks(5);   
    job.waitForCompletion(true);  

}

}

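Suppose the input file is a.txt, with the following contents:

A B C D A B C D

When I run this program using Hadoop-0.20.2 (commands not shown for clarity), the output is: A 1 A 1 B 1 B 1 C 1 C 1 D 1 D 1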

This is wrong. The actual output should be: A 2 B 2 C 2 D 2
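For reference, the expected counts can be checked without Hadoop at all. The following is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Java; the class `LocalWordCount` and its method names are hypothetical and not part of Hadoop. It tokenizes on whitespace with `StringTokenizer`, as the Mapper above does, groups the values by key in sorted order, and sums each group:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the WordCount data flow.
public class LocalWordCount {

    // Map phase: emit (token, 1) for every whitespace-separated token,
    // the same way the Mapper in the question does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all values belonging to the same key, keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: add up the values for each key.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = reduce(shuffle(map("A B C D A B C D")));
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running `main` prints one tab-separated line per key (A 2, B 2, C 2, D 2), which is the output the Hadoop job is expected to produce.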

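This "WordCount" program is a very standard program. I really don't know what is wrong with this code. I have correctly written the contents of all the configuration files, such as mapred-site.xml, core-site.xml, etc.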

I would be grateful if someone could help me.

Thanks.


2011-03-28 Anand Pandey

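This code actually runs a local MapReduce job. If you want to submit it to a real cluster, you have to provide the fs.default.name and mapred.job.tracker configuration parameters. These keys map to your machines as host:port pairs, just like in your mapred-site.xml / core-site.xml.
Make sure your data is available in HDFS and not on the local disk. Also, your number of reducers is too high: with this input that is only about 2 records per reducer. You should set it to 1.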
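To illustrate the point about the reducer count: Hadoop's default `HashPartitioner` sends each key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and each reducer writes its own `part-r-NNNNN` output file. The plain-Java sketch below applies that formula to the four keys of this input. It is only an approximation: it uses `String.hashCode()` as a stand-in, whereas the real key type `Text` has its own hash function, so the exact reducer number each key lands on will differ in a real job. `PartitionDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class PartitionDemo {

    // Same formula as Hadoop's default HashPartitioner.
    // ASSUMPTION: String.hashCode() stands in for Text.hashCode(), so the
    // concrete reducer numbers differ from those of a real job.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Group keys by the reducer (and therefore the output file) they go to.
    static TreeMap<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        TreeMap<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partition(key, numReduceTasks), r -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "B", "C", "D");
        System.out.println("5 reducers: " + assign(keys, 5)); // {0=[A], 1=[B], 2=[C], 3=[D]}
        System.out.println("1 reducer:  " + assign(keys, 1)); // {0=[A, B, C, D]}
    }
}
```

With 5 reducers the four keys are scattered over separate partitions and one reducer receives nothing, so the results end up spread across several output files; with 1 reducer every key goes to partition 0. In the driver above, that corresponds to changing the call to `job.setNumReduceTasks(1);`.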


2011-03-28 13:28:58

The reduce signature is incorrect. The second parameter must be of type Iterable, not Iterator.

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Reducer.html

See also: Using Hadoop for the First Time, MapReduce Job does not run Reduce Phase
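The output looks like `A 1 A 1 B 1 ...` because a method taking `Iterator<IntWritable>` does not override `Reducer.reduce(KEYIN, Iterable<VALUEIN>, Context)`. It is merely an overload that the framework never calls, so the inherited default `reduce` runs and writes every (key, value) pair through unchanged. The plain-Java sketch below reproduces that dispatch behaviour with hypothetical stand-in classes (`BaseReducer`, `WrongReducer`, and `SumReducer` are not Hadoop types). Two further points: annotating the method with `@Override` turns this mistake into a compile error, and the loop in the question never calls `values.next()`, so it would spin forever if it were ever invoked. Iterating with a for-each loop and adding each value (`sum += val.get()` in the Hadoop version) avoids both problems.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverloadDemo {

    // Hypothetical stand-in for Hadoop's Reducer: the "framework" calls
    // reduce(key, Iterable), whose default passes every value through unchanged.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                output.add(key + " " + v); // identity reduce
            }
        }

        // What the framework invokes once per grouped key.
        void run(String key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Mirrors the question's mistake: Iterator instead of Iterable declares an
    // overload, not an override, so run() still dispatches to the identity reduce().
    static class WrongReducer extends BaseReducer {
        void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            output.add(key + " " + sum);
        }
    }

    // Correct signature; @Override makes the compiler reject any mismatch.
    static class SumReducer extends BaseReducer {
        @Override
        void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (Integer v : values) {
                sum += v;
            }
            output.add(key + " " + sum);
        }
    }

    public static void main(String[] args) {
        BaseReducer wrong = new WrongReducer();
        BaseReducer right = new SumReducer();
        for (String key : Arrays.asList("A", "B")) {
            wrong.run(key, Arrays.asList(1, 1));
            right.run(key, Arrays.asList(1, 1));
        }
        System.out.println("Iterator overload: " + wrong.output); // [A 1, A 1, B 1, B 1]
        System.out.println("Iterable override: " + right.output); // [A 2, B 2]
    }
}
```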


2012-02-22 22:50:11


