2016-12-07

I want to use MapReduce to compute the sum of tab-separated input values, grouped by class label. The data looks like this:

1  5.0 4.0 6.0 
2  2.0 1.0 3.0 
1  3.0 4.0 8.0 

The first column is the class label, so I expect the output to be grouped by class label. For this input, the output would be:

label 1: 30.0 
label 2: 6.0 
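As a quick sanity check of those expected totals, here is a minimal plain-Java sketch (no Hadoop involved, Java 9+ for `List.of`) that sums every value after the first column, per label. The class and method names (`ExpectedTotals`, `sumByLabel`) are hypothetical, and it assumes the fields are separated by tabs or spaces as in the sample.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper: computes the per-label totals locally, without Hadoop.
class ExpectedTotals {
    static Map<String, Double> sumByLabel(List<String> lines) {
        Map<String, Double> totals = new TreeMap<>(); // sorted by label
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line); // splits on spaces and tabs
            if (!tokenizer.hasMoreTokens()) {
                continue; // skip blank lines
            }
            String label = tokenizer.nextToken(); // first column is the class label
            double lineSum = 0.0;
            while (tokenizer.hasMoreTokens()) {
                lineSum += Double.parseDouble(tokenizer.nextToken());
            }
            totals.merge(label, lineSum, Double::sum); // add this line's sum to the label's total
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1\t5.0 4.0 6.0", "2\t2.0 1.0 3.0", "1\t3.0 4.0 8.0");
        for (Map.Entry<String, Double> e : sumByLabel(input).entrySet()) {
            System.out.println("label " + e.getKey() + ": " + e.getValue());
        }
        // prints "label 1: 30.0" then "label 2: 6.0"
    }
}
```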

Here is the code I tried, but I get the wrong output and unexpected class labels are shown.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Total {

    public static class Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final DoubleWritable one = new DoubleWritable();
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            word.set(tokenizer.nextToken()); // first token is the class label
            while (tokenizer.hasMoreTokens()) {
                // emit one (label, number) pair for each remaining value
                one.set(Double.parseDouble(tokenizer.nextToken()));
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        private final Text firstMsg = new Text();

        @Override
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            firstMsg.set("label " + key + ": Total");

            double sum = 0.0;
            for (DoubleWritable val : values) {
                sum += val.get();
            }

            context.write(firstMsg, new DoubleWritable(sum));
        }
    }
    // the void main() driver method implementation also exists (omitted here)
}

Answer

Your goal is to get all values with the same key into the same reducer call, so that you can sum the numbers.

So, take this:

1  5.0 4.0 6.0 
2  2.0 1.0 3.0 
1  3.0 4.0 8.0 

and basically build this:

1  [(5.0 4.0 6.0), (3.0 4.0 8.0)] 
2  [(2.0 1.0 3.0)] 

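So your mapper should output only the keys 1 and 2, each paired with the remaining values from its line, rather than emitting a separate value for every number.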

For this, you can use Mapper<LongWritable, Text, Text, Text> (that is, change the output value type to Text).

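// Assumes the mapper class declares a field: private final Text word = new Text();
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();

    StringTokenizer tokenizer = new StringTokenizer(line);
    if (!tokenizer.hasMoreTokens()) {
        return; // skip blank lines
    }
    word.set("label " + tokenizer.nextToken());

    // Join the remaining values with commas, e.g. "5.0,4.0,6.0"
    StringBuilder remainder = new StringBuilder();
    while (tokenizer.hasMoreTokens()) {
        remainder.append(tokenizer.nextToken()).append(",");
    }
    if (remainder.length() > 0) {
        remainder.setLength(remainder.length() - 1); // drop the trailing comma
    }
    context.write(word, new Text(remainder.toString()));
}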
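To see what the mapper above emits and how the framework groups it before `reduce()` is called, here is a plain-Java simulation of the map and shuffle steps (no Hadoop, Java 9+). `ShuffleSketch` and `groupByLabel` are hypothetical names. The `TreeMap` only mimics the sorted key order reducers see; real Hadoop does not guarantee the order of values within a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical simulation of map + shuffle: each line becomes ("label N", "v1,v2,v3"),
// and all values sharing a key are collected into one list.
class ShuffleSketch {
    static Map<String, List<String>> groupByLabel(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>(); // keys sorted, as at the reducers
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            if (!tokenizer.hasMoreTokens()) {
                continue; // ignore blank lines
            }
            String key = "label " + tokenizer.nextToken(); // same key format as the mapper
            List<String> rest = new ArrayList<>();
            while (tokenizer.hasMoreTokens()) {
                rest.add(tokenizer.nextToken());
            }
            // one (key, value) pair per input line; value = remaining numbers joined by commas
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(String.join(",", rest));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1\t5.0 4.0 6.0", "2\t2.0 1.0 3.0", "1\t3.0 4.0 8.0");
        groupByLabel(input).forEach((key, values) -> System.out.println(key + "\t" + values));
    }
}
```

Each printed line corresponds to the `Iterable<Text>` a single `reduce()` call would receive for that key.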

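Then, in the reducer, make it Reducer<Text, Text, Text, DoubleWritable> (reading in (Text, Text) pairs). You now have an Iterable<Text> values, an iterable of comma-separated strings, which you can parse into doubles and accumulate into a sum.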
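The parse-and-accumulate step could look like the following plain-Java sketch. It keeps the Hadoop types out so it runs standalone: `sumValues` is a hypothetical helper that takes the strings you would get from `val.toString()` for each `Text` in `values`. Inside the actual `reduce()`, you would compute the sum this way and then call `context.write(key, new DoubleWritable(sum))`.

```java
import java.util.List;

// Hypothetical helper holding the reducer's core logic: each element is one
// comma-separated value string, such as "5.0,4.0,6.0", emitted by the mapper.
class ReduceSketch {
    static double sumValues(Iterable<String> values) {
        double sum = 0.0;
        for (String csv : values) {
            for (String number : csv.split(",")) {
                String trimmed = number.trim();
                if (!trimmed.isEmpty()) { // tolerate empty value strings
                    sum += Double.parseDouble(trimmed);
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // the values one reduce() call for key "label 1" would see with the sample input
        System.out.println(sumValues(List.of("5.0,4.0,6.0", "3.0,4.0,8.0"))); // prints 30.0
    }
}
```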

You don't need the firstMsg.set part in the reducer; that can be done in the mapper.