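Duplicates in MapReduce program output?
I am getting a lot of duplicate values in my output, so I implemented a reduce function as shown below. However, this reduce function still behaves like an identity function: the output is the same with or without the reducer. What is wrong with my reduce function?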
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
public class search
{
public static String str="And";
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
{
String mname="";
public void configure(JobConf job)
{
mname=job.get(str);
job.set(mname,str);
}
private Text word = new Text();
public Text Uinput =new Text("");
public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
String mapstr=mname;
Uinput.set(mapstr);
String line = value.toString();
Text fdata = new Text();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
fdata.set(line);
if(word.equals(Uinput))
output.collect(fdata,new Text(""));
}
}
}
public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
{
boolean start = true;
//System.out.println("inside reduce :"+input);
StringBuilder sb = new StringBuilder();
while (values.hasNext())
{
if(!start)
start=false;
sb.append(values.next().toString());
}
//output.collect(key, new IntWritable(sum));
output.collect(key, new Text(sb.toString()));
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(search.class);
conf.setJobName("QueryIndex");
//JobConf conf = new JobConf(getConf(), WordCount.class);
conf.set(str,args[0]);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(Map.class);
//conf.setCombinerClass(SReducer.class);
conf.setReducerClass(SReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("IIndexOut"));
FileOutputFormat.setOutputPath(conf, new Path("searchOut"));
JobClient.runJob(conf);
}
}
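A minimal sketch of the behavior the reducer is expected to give, written in plain Java without Hadoop so it can be run on its own. The class name `DedupSketch`, its method names, and the sample lines are hypothetical and not part of the job above. It only models two things: the mapper emits the whole line once per token that equals the query, and grouping by key should then collapse those repeated emissions into one output record per distinct line. It does not reproduce the actual job, and it is not a diagnosis of the duplicates described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class DedupSketch {
    // Models the map step: emit the whole line once for every token equal to the query.
    static List<String> mapLine(String line, String query) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            if (tokenizer.nextToken().equals(query)) {
                emitted.add(line);
            }
        }
        return emitted;
    }

    // Models shuffle + reduce: group emitted keys (sorted), then output each key once.
    static List<String> run(List<String> lines, String query) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String key : mapLine(line, query)) {
                grouped.merge(key, 1, Integer::sum); // count how often each key was emitted
            }
        }
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        // Hypothetical input: one line repeats, one line contains the query twice.
        List<String> input = Arrays.asList("a And b", "And And c", "x y", "a And b");
        for (String line : run(input, "And")) {
            System.out.println(line);
        }
    }
}
```

In this model, byte-identical keys always end up in a single group, so any duplicates that survive must differ somewhere in the key text, for example in whitespace or in a trailing field.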
Possible duplicate: http://stackoverflow.com/questions/10305435/hadoop-inverted-index-without-recurrence-of-file-names – 2012-04-26 20:15:11
Hi Matt, I have gone through that post, but it did not solve my problem. That is why I posted my own question. – 2012-04-26 20:18:07