Hadoop MapReduce: how do I reduce a custom object?

I am new to Hadoop and I am trying out the Reducer class. I found an online tutorial whose reducer class looks like this:

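import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class mapReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused output value; it is overwritten for every key.
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
        int sum = 0; // local accumulator, so each key starts from zero
        for (IntWritable value : values) {
            sum += value.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}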

So I want to replace total with a myCustomObj. Based on the example above, something like:

//..
myCustomObj total = new myCustomObj();
@Override
protected void reduce(Text key, Iterable<myCustomObj> values,
        Reducer<Text, myCustomObj, Text, IntWritable>.Context context) throws IOException, InterruptedException {
    for (myCustomObj value : values) {
        total.add(value);
    }
    context.write(key, total.getPrimaryAttribute());
}

Goal: once the reduce has finished, I want Hadoop to give me a list of key -> total objects. I think the code above would only output key -> primaryAttribute.

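Suggestion: if this is too cumbersome, I have another idea: store the details I need on disk in XML format. However, I am not sure about the theory behind MapReduce. Does the reducer run on the server or on the client machines (where the mapping happens)? If it runs on the client machines, I will end up with a small piece of my XML file on every client machine. I just want to gather all the information on one server.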

I hope I have made my question clear. Thanks.

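Edit: I tried looking for sources online, but there are so many customized Hadoop variants that I do not know which one I should be looking at.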

2017-04-01 user859385

It is not clear what your question is. What does the implementation of "myCustomObj" look like?

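To be able to reduce a custom object, the mapper first has to emit that object as its output value. Assuming your object is named CustomObject, the mapper definition should look like this: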

public class MyMapper extends Mapper<LongWritable, Text, Text, CustomObject> { 
    @Override 
    protected void map(LongWritable key, Text value, 
      Mapper<LongWritable, Text, Text, CustomObject>.Context context) throws IOException, InterruptedException { 
     // do your stuff here 
    } 
}

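Now CustomObject itself should implement the WritableComparable interface with all three required methods (mainly required for the shuffle phase):

write - defines how your object is stored to disk
readFields - defines how the stored object is read back from disk
compareTo - defines how objects are ordered (you can leave this one blank, because only the key is used for sorting during the shuffle phase)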
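
The three methods listed above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual class: the fields `count` and `sum` are hypothetical, and `WritableComparableSketch` is a local stand-in that mirrors the method signatures of `org.apache.hadoop.io.WritableComparable` so the snippet compiles without Hadoop on the classpath. In a real job you would implement Hadoop's interface instead and keep the no-argument constructor, since Hadoop creates instances reflectively before calling `readFields`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// ASSUMPTION: stand-in for org.apache.hadoop.io.WritableComparable,
// declared here only so the sketch is self-contained.
interface WritableComparableSketch<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a count plus a running sum.
class CustomObject implements WritableComparableSketch<CustomObject> {
    private int count;
    private long sum;

    CustomObject() {} // no-arg constructor, needed for reflective creation

    CustomObject(int count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    int getCount() { return count; }
    long getSum() { return sum; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Field order here must match the order in readFields().
        out.writeInt(count);
        out.writeLong(sum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readInt();
        sum = in.readLong();
    }

    @Override
    public int compareTo(CustomObject other) {
        int byCount = Integer.compare(count, other.count);
        return byCount != 0 ? byCount : Long.compare(sum, other.sum);
    }
}

class CustomObjectRoundTrip {
    // Serializes with write() and rebuilds a fresh instance with readFields().
    static CustomObject roundTrip(CustomObject original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            CustomObject copy = new CustomObject();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CustomObject copy = roundTrip(new CustomObject(3, 42L));
        System.out.println(copy.getCount() + " " + copy.getSum());
    }
}
```

The round trip is a cheap way to check that `write` and `readFields` agree on field order before running a full job.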

The reducer signature should look like this:

public class MyReducer extends Reducer<Text, CustomObject, Text, IntWritable>{ 
    @Override 
    protected void reduce(Text key, Iterable<CustomObject> values, 
      Reducer<Text, CustomObject, Text, IntWritable>.Context context) throws IOException, InterruptedException{ 
     // reducer code 
    } 
}
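
What might go in place of the `// reducer code` comment can be sketched in plain Java. This is only an illustration under assumptions: the `CustomObject` fields and the `add` method are hypothetical (the name `add` is borrowed from the question), and `reduceValues` stands in for the body of `reduce` so the snippet runs without Hadoop. Two points it tries to show: the total is created fresh for each key, and values are folded in by copying their fields, because Hadoop reuses the same value instance while iterating. To get key -> whole object in the output, as the question asks, the reducer's output value type and `job.setOutputValueClass` would be CustomObject rather than IntWritable; with the default text output format, the value's `toString()` is what gets written.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical value type; in a real job it would also implement Hadoop's Writable.
class CustomObject {
    private int count;
    private long sum;

    CustomObject() {}

    CustomObject(int count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    // Folds another object's fields into this one without keeping a reference to it.
    void add(CustomObject other) {
        count += other.count;
        sum += other.sum;
    }

    int getPrimaryAttribute() { return count; }

    @Override
    public String toString() {
        // Tab-separated rendering, used when the object is written as text.
        return count + "\t" + sum;
    }
}

class ReduceSketch {
    // Stand-in for the body of reduce(): one fresh total per key.
    static CustomObject reduceValues(Iterable<CustomObject> values) {
        CustomObject total = new CustomObject();
        for (CustomObject value : values) {
            total.add(value);
        }
        // In a real reducer: context.write(key, total);
        return total;
    }

    public static void main(String[] args) {
        List<CustomObject> values =
                Arrays.asList(new CustomObject(1, 10L), new CustomObject(2, 5L));
        System.out.println(reduceValues(values));
    }
}
```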

Finally, when configuring the job, you should specify the correct input/output combination:

job.setMapOutputKeyClass(Text.class); 
job.setMapOutputValueClass(CustomObject.class); 
job.setOutputKeyClass(Text.class); 
job.setOutputValueClass(IntWritable.class); 
job.setMapperClass(MyMapper.class); 
job.setReducerClass(MyReducer.class);

That should do it.

2017-04-03 14:11:36 Serhiy
