2014-05-01 28 views

Hadoop inverted index count

I have two files as input:

fileA.txt:

learn hadoop 
learn java 

fileB.txt:

hadoop java 
eclipse eclipse 

Desired output:

learn fileA.txt:2 

hadoop fileA.txt:1 , fileB.txt:1 

java fileA.txt:1 , fileB.txt:1 

eclipse fileB.txt:2 

My reduce method:

public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {

    Set<Text> outputValues = new HashSet<Text>();
    while (values.hasNext()) {
        Text value = new Text(values.next());
        // delete duplicates
        outputValues.add(value);
    }
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    Iterator<Text> outputIter = outputValues.iterator();
    while (outputIter.hasNext()) {
        if (!isfirst) {
            toReturn.append("/");
        }
        isfirst = false;
        toReturn.append(outputIter.next().toString());
    }
    output.collect(key, new Text(toReturn.toString()));
}

I need help with the counter (counting each word per file).

I managed to print:

learn fileA.txt 

hadoop fileA.txt/fileB.txt 

java fileA.txt/fileB.txt 

eclipse fileB.txt 

but I can't print the per-file counts.

Any help would be appreciated.


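Maybe try stating what specifically you are having trouble with; that would help encourage people to post concise solutions. Welcome to Stack Overflow! – WillBD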


Have you considered making your key the word plus the file name, and then just using the standard IntSumReducer class? – aplassard
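
The composite-key idea from the comment above can be tried out without a cluster. The following is only a plain-Java simulation, not Hadoop code: the class name `CompositeKeySketch`, the method `countByWordAndFile`, and the tab used to join word and file name are all assumptions made for illustration. It shows that summing ones per (word, file) key yields the counts, but as one record per pair, so a further grouping step would still be needed to get one line per word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (hypothetical names, no Hadoop dependency).
final class CompositeKeySketch {

    // "Map" step: every token becomes the key word + TAB + fileName with value 1.
    // "Reduce" step: values are summed per key, as a summing reducer would do.
    static Map<String, Integer> countByWordAndFile(Map<String, List<String>> linesByFile) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : linesByFile.entrySet()) {
            for (String line : file.getValue()) {
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // skip blank lines
                    }
                    counts.merge(word + "\t" + file.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("fileA.txt", Arrays.asList("learn hadoop", "learn java"));
        input.put("fileB.txt", Arrays.asList("hadoop java", "eclipse eclipse"));
        // One output record per (word, file) pair, followed by its count.
        for (Map.Entry<String, Integer> e : countByWordAndFile(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the sample files from the question this produces six records, for example `learn`, `fileA.txt`, `2` on one line, rather than the single `learn fileA.txt:2` line the question asks for.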

Answer


As far as I understand, this should print what you want:

@Override
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Count how often each file name arrives for this word
    // (the mapper is expected to emit one file name per occurrence).
    // HashMap iteration order is unspecified; use a TreeMap for sorted file names.
    Map<String, Integer> fileToCnt = new HashMap<String, Integer>();
    while (values.hasNext()) {
        String file = values.next().toString();
        Integer current = fileToCnt.get(file);
        if (current == null) {
            current = 0;
        }
        fileToCnt.put(file, current + 1);
    }
    // Join the entries as "file:count, file:count".
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    for (Map.Entry<String, Integer> entry : fileToCnt.entrySet()) {
        if (!isfirst) {
            toReturn.append(", ");
        }
        isfirst = false;
        toReturn.append(entry.getKey()).append(":").append(entry.getValue());
    }
    output.collect(key, new Text(toReturn.toString()));
}
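
The counting step above can be checked outside Hadoop. The sketch below is a plain-Java stand-in, not the reducer itself: `PerFileCountSketch` and `formatCounts` are hypothetical names, the input list plays the role of the `values` iterator, and it assumes the mapper emits one file name per word occurrence. It swaps in a `TreeMap` so the file names come out in sorted order, and uses the " , " separator from the desired output in the question. It also illustrates why the original `HashSet` version lost the counts: a set keeps each file name once, while a map can keep a tally per file name.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the reducer's counting logic (hypothetical names).
final class PerFileCountSketch {

    // Tallies each file name and renders the result as "file:count , file:count".
    static String formatCounts(Iterable<String> fileNames) {
        Map<String, Integer> fileToCount = new TreeMap<>(); // sorted by file name
        for (String file : fileNames) {
            fileToCount.merge(file, 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : fileToCount.entrySet()) {
            if (out.length() > 0) {
                out.append(" , ");
            }
            out.append(entry.getKey()).append(':').append(entry.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Values as they might arrive for the keys "learn" and "hadoop".
        System.out.println("learn " + formatCounts(Arrays.asList("fileA.txt", "fileA.txt")));
        System.out.println("hadoop " + formatCounts(Arrays.asList("fileB.txt", "fileA.txt")));
    }
}
```

Running `main` prints `learn fileA.txt:2` and `hadoop fileA.txt:1 , fileB.txt:1`.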

Thanks, this helped me solve my problem! Can you recommend any resources for learning more about MapReduce and Hadoop's new API? Cheers – user3591709


I can only recommend http://hadoopbook.com/. –