Hadoop - total lines of an input file

I have an input file containing:

id value 
1e 1 
2e 1 
... 
2e 1 
3e 1 
4e 1

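I want to find the total number of IDs in my input file. So in my main class I declared a static Set, so that when I read the input file I can insert each line's ID into it:

// MainDriver.java
public static Set list = new HashSet();

and in my map: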

// Apply regex to find the id 
... 

// Insert id to the list 
MainDriver.list.add(regex.group(1)); // add 1e, 2e, 3e ...

and in my reduce I try to use the set like this:

public void reduce(WritableComparable key, Iterator values, 
      OutputCollector output, Reporter reporter) throws IOException 
    { 
     ... 
     output.collect(key, new IntWritable(MainDriver.list.size())); 
    }

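I expect the value written to the output file to be 4 in this case, but it actually prints 0.

I have verified that regex.group(1) extracts valid IDs, so I don't know why the size of my set is 0 during the reduce phase.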


2015-02-24 Tom

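Mappers and reducers run in separate JVMs (and usually on separate machines), and both are separate from the driver, so there is no common instance of the list Set variable that all of these methods can read from and write to. Each JVM loads its own copy of MainDriver, so the set the reducer sees was never written to and its size is 0.

One way to count the number of keys in MapReduce is: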

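- Emit (id, 1) from the mapper
- (Optionally) use a combiner to sum the 1s for each mapper, to minimize network and reducer I/O
- In the reducer:
  - In setup(), initialize a class-scoped numeric variable (int or long, presumably) to 0
  - In reduce(), increment the counter and ignore the values
  - In cleanup(), emit the counter value now that all keys have been processed
- Run the job with a single reducer, so that all keys go to the same JVM, where a single count can be made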
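
The steps above can be traced without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class and method names (SingleReducerKeyCount, KeyCountingReducer, countDistinctIds) are hypothetical, and the setup/reduce/cleanup methods only mimic the lifecycle that a Hadoop Reducer goes through. The input is assumed to be just the rows visible in the question (the elided "..." rows are left out).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for "map, shuffle, one reduce task"; no Hadoop classes are used.
class SingleReducerKeyCount {

    // Hypothetical reducer that counts how many distinct keys it is handed.
    static class KeyCountingReducer {
        private long distinctKeys;

        // Runs once before any key is processed.
        void setup() {
            distinctKeys = 0;
        }

        // Runs once per distinct key; the values are deliberately ignored.
        void reduce(String key, List<Integer> values) {
            distinctKeys++;
        }

        // Runs once after the last key; this is where the total would be emitted.
        long cleanup() {
            return distinctKeys;
        }
    }

    static long countDistinctIds(List<String> ids) {
        // Map phase: emit (id, 1). Shuffle: group the 1s by id, sorted by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String id : ids) {
            grouped.computeIfAbsent(id, k -> new ArrayList<>()).add(1);
        }

        // Single reducer: every key reaches the same instance, so one counter suffices.
        KeyCountingReducer reducer = new KeyCountingReducer();
        reducer.setup();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            reducer.reduce(entry.getKey(), entry.getValue());
        }
        return reducer.cleanup();
    }

    public static void main(String[] args) {
        // Only the rows shown in the question; 2e appears twice.
        List<String> ids = Arrays.asList("1e", "2e", "2e", "3e", "4e");
        System.out.println(countDistinctIds(ids)); // prints 4
    }
}
```

The counter is only meaningful because a single reducer instance sees every key. With two or more reduce tasks, each would emit its own partial count and the partial counts would still have to be added together.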


2015-02-24 03:59:37

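This basically ignores the advantage of using MapReduce in the first place.

Correct me if I'm wrong, but it appears you could key your Mapper output by the ID, and then in your Reducer you would receive something like Text key, Iterator values as parameters.

You can then just sum up values and call output.collect(key, <total value>);

Example (apologies for using Context rather than OutputCollector, but the logic is the same):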

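import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Both classes are meant to be nested inside the driver class.
public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Constant value emitted once per input line.
    private static final IntWritable countOne = new IntWritable(1);
    private final Text id = new Text();

    @Override
    public void map(LongWritable key, Text value,
        Context context) throws IOException, InterruptedException {
     // regex is the asker's matcher for the id; how it is built is not shown in the question
     id.set(regex.group(1)); // do whatever you do
     context.write(id, countOne);
    }

}

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable totalCount = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
         Context context) throws IOException, InterruptedException {

     // Sum the 1s emitted for this id; summing stays correct if a combiner pre-aggregates.
     int cnt = 0;
     for (IntWritable value : values) {
      cnt += value.get();
     }

     totalCount.set(cnt);
     context.write(key, totalCount);
    }

}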
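
To see what the mapper and reducer above would write for the sample input, here is a small plain-Java trace of the same per-key summation. It is only a sketch: PerIdCountTrace and countPerId are hypothetical names, no Hadoop classes are involved, and the input is assumed to be just the rows visible in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java trace of "emit (id, 1), then sum per id"; not Hadoop code.
class PerIdCountTrace {

    static Map<String, Integer> countPerId(List<String> ids) {
        // TreeMap keeps keys sorted, similar to the order a reducer receives them in.
        Map<String, Integer> totals = new TreeMap<>();
        for (String id : ids) {
            // Equivalent to emitting (id, 1) and adding up the 1s for that id.
            totals.merge(id, 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1e", "2e", "2e", "3e", "4e");
        System.out.println(countPerId(ids)); // prints {1e=1, 2e=2, 3e=1, 4e=1}
    }
}
```

Note that this yields one record per distinct id rather than a single total: for the sample, the figure of 4 that the question expects corresponds to the number of output records, not to any one value in them.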


2015-02-24 04:03:04 whitfin
