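I am working on Hadoop, and my output is larger than expected.
I can't understand why this is happening.
Please help me with this unexpected output from my MapReduce program in Hadoop.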
Below is the mapper class:
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class StringMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
//hadoop supported data types, created once and reused for every record
private static final IntWritable send = new IntWritable(1);
private final Text word = new Text();
//map method: emits the pair ("0", 1) once for every input line; the line's content is never used
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
{
int count=0;
word.set(Integer.toString(count));
output.collect(word, send);
}
}
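To make it explicit what this mapper emits, here is a minimal stand-alone simulation of its map() body in plain Java, with no Hadoop dependencies. The class name MapperSimulation and its method mapAll are hypothetical; the sketch only mirrors the logic above, where the key is always "0" and the value is always 1, whatever the line contains.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java mirror of StringMapper.map(); no Hadoop classes are used.
public class MapperSimulation {

    // For every line: key = Integer.toString(0) = "0", value = 1.
    // The line's content is never inspected, just as in the mapper above.
    static List<Map.Entry<String, Integer>> mapAll(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            int count = 0;
            emitted.add(new SimpleEntry<>(Integer.toString(count), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // The five sample input lines from the question.
        List<String> input = Collections.nCopies(5, "dashjdasdhashjfsda");
        for (Map.Entry<String, Integer> pair : mapAll(input)) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

For the five sample lines this yields five identical ("0", 1) pairs, so the reducer should receive exactly five values for the single key "0".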
Below is the reducer class:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class StringReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
//reduce method accepts the key-value pairs from the mappers, sums the values for each key and produces the final output
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
{
int sum=0;
while(values.hasNext()){
sum=sum+values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
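A hypothetical sketch of the shuffle-and-reduce step in plain Java: values are grouped by key (in sorted key order, as Hadoop sorts map output keys) and then summed with the same loop StringReducer uses. ReduceSimulation and shuffleAndReduce are illustrative names, not Hadoop APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for Hadoop's shuffle plus StringReducer.
public class ReduceSimulation {

    // Same loop as StringReducer.reduce(): add up every value seen for one key.
    static int reduce(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum = sum + values.next();
        }
        return sum;
    }

    // Group values by key in sorted key order, then reduce each group.
    static Map<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        // One ("0", 1) pair per sample input line.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            pairs.add(new SimpleEntry<>("0", 1));
        }
        for (Map.Entry<String, Integer> e : shuffleAndReduce(pairs).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Five ("0", 1) pairs sum to 0 and 5, written as key, tab, value (the default separator of TextOutputFormat). Because the reducer's result is simply the number of pairs it received, a total of 10 would mean ten pairs reached it.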
Sample input:
dashjdasdhashjfsda
dashjdasdhashjfsda
dashjdasdhashjfsda
dashjdasdhashjfsda
dashjdasdhashjfsda
Sample output:
The output here should be 0 5 rather than 0 10, since my input has only 5 lines.
The code looks fine to me. How are you launching this job? Can you post your YARN command line and the file structure of your input on HDFS?
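Following up on the comment: since every input line contributes exactly 1 to key "0", the reducer total equals the number of lines in all files the job reads. One possible cause of 0 10 (an assumption, not confirmed in the post) is that the input path contains two copies of the data. The sketch below inspects a local directory using the default hidden-file rule of Hadoop's FileInputFormat, which skips names starting with "_" or "."; on HDFS the equivalent manual check is hdfs dfs -ls on the input path. InputDirCheck and its methods are hypothetical names, and the code only reads the local filesystem.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local-filesystem check; it does not talk to HDFS.
public class InputDirCheck {

    // Mirrors FileInputFormat's default hidden-file rule:
    // names starting with "_" (e.g. _SUCCESS) or "." (e.g. .crc checksum files) are not read.
    static boolean isVisibleToJob(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Regular files directly inside inputDir that the job would pick up, sorted by name.
    static List<String> filesReadByJob(Path inputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && isVisibleToJob(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Total line count across those files: the value StringReducer should emit for key "0".
    static long countLines(Path inputDir) throws IOException {
        long total = 0;
        for (String name : filesReadByJob(inputDir)) {
            total += Files.readAllLines(inputDir.resolve(name)).size();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the local copy of the input directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("files read: " + filesReadByJob(dir));
        System.out.println("expected output: 0\t" + countLines(dir));
    }
}
```

If the listing shows two files with the same five lines, the expected total is 10, which would match the observed output; a single file would give 5.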