Inverted index with counts in Hadoop

fileA.txt:

learn hadoop 
learn java

fileB.txt:

hadoop java 
eclipse eclipse

Desired output:

learn fileA.txt:2 

hadoop fileA.txt:1 , fileB.txt:1 

java fileA.txt:1 , fileB.txt:1 

eclipse fileB.txt:2

My reduce method:

public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {

    Set<Text> outputValues = new HashSet<Text>();
    while (values.hasNext()) {
        Text value = new Text(values.next());
        // delete duplicates
        outputValues.add(value);
    }
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    Iterator<Text> outputIter = outputValues.iterator();
    while (outputIter.hasNext()) {
        if (!isfirst) {
            toReturn.append("/");
        }
        isfirst = false;
        toReturn.append(outputIter.next().toString());
    }
    output.collect(key, new Text(toReturn.toString()));
}

I need help with the counter (counting the words per file).

I managed to print:

learn fileA.txt 

hadoop fileA.txt/fileB.txt 

java fileA.txt/fileB.txt 

eclipse fileB.txt

but I cannot print the count for each file.

Any help would be appreciated.

2014-05-01 user3591709

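Maybe try listing what specifically you are having trouble with; that will help encourage people to post concise solutions. Welcome to Stack Overflow! – WillBD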

Have you considered making your key the word plus the file name, and just using the standard IntSumReducer class? – aplassard
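To illustrate the composite-key idea from the comment above, here is a minimal in-memory sketch in plain Java, not actual Hadoop code. It assumes the mapper would emit a key made of the word and the file name with a value of 1, and that the reducer only sums those values, which is the job IntSumReducer does in the new API. The class name CompositeKeySketch, the method countByWordAndFile, and the tab separator are hypothetical choices made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeySketch {
    // Simulates map + sum-reduce in memory: for every token, "emit" the
    // composite key word<TAB>fileName with value 1 and add it to a running sum.
    static Map<String, Integer> countByWordAndFile(Map<String, String> files) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by an empty file
                }
                counts.merge(word + "\t" + file.getKey(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("fileA.txt", "learn hadoop\nlearn java");
        files.put("fileB.txt", "hadoop java\neclipse eclipse");
        countByWordAndFile(files)
                .forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

With this layout every output line holds a single (word, file) pair, so putting all files of a word on one line, as in the desired output, would still need an extra grouping step.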

As far as I understand, this should print what you want:

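// Imports needed by the enclosing reducer class (old org.apache.hadoop.mapred API):
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

@Override
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Assumes the mapper emits the file name once per occurrence of the word,
    // so counting equal values gives the per-file count.
    Map<String, Integer> fileToCnt = new HashMap<String, Integer>();
    while (values.hasNext()) {
        String file = values.next().toString();
        Integer current = fileToCnt.get(file);
        if (current == null) {
            current = 0;
        }
        fileToCnt.put(file, current + 1);
    }
    // Join the entries as "file:count, file:count"; HashMap order is not guaranteed.
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    for (Map.Entry<String, Integer> entry : fileToCnt.entrySet()) {
        if (!isfirst) {
            toReturn.append(", ");
        }
        isfirst = false;
        toReturn.append(entry.getKey()).append(":").append(entry.getValue());
    }
    output.collect(key, new Text(toReturn.toString()));
}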
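To check the counting logic without a cluster, the whole flow can be sketched in plain Java. This is only an in-memory stand-in, not Hadoop code: it assumes the mapper emits (word, file name) once per occurrence, and it swaps the HashMap for a TreeMap so the files come out in a stable order. The names InvertedIndexSketch, mapAndGroup and reduce are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {
    // Map phase stand-in: emit (word, fileName) once per occurrence and
    // group the emitted file names by word, as the shuffle would.
    static Map<String, List<String>> mapAndGroup(Map<String, String> files) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by an empty file
                }
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(file.getKey());
            }
        }
        return grouped;
    }

    // Reduce phase stand-in: count how often each file name occurs and
    // join the result as "file:count, file:count" in sorted file order.
    static String reduce(List<String> fileNames) {
        Map<String, Integer> fileToCnt = new TreeMap<>();
        for (String file : fileNames) {
            fileToCnt.merge(file, 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : fileToCnt.entrySet()) {
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(entry.getKey()).append(":").append(entry.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("fileA.txt", "learn hadoop\nlearn java");
        files.put("fileB.txt", "hadoop java\neclipse eclipse");
        mapAndGroup(files)
                .forEach((word, names) -> System.out.println(word + " " + reduce(names)));
    }
}
```

If the mapper emitted each file name only once per word, or if duplicates were dropped before counting as the HashSet in the question does, every count would come out as 1.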

2014-05-01 17:40:17

Thanks, this helped me solve my problem! Can you recommend any resources for learning more about MapReduce and Hadoop's new API? Cheers – user3591709

I can only recommend http://hadoopbook.com/. –
