2011-12-29 25 views

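In Hadoop, when I save the value of each key-value pair into an ArrayList, why do all the added elements end up the same?

I am trying to store the values of the key-value pairs passed to the Map function and use them later. Given the following input: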

Hello hadoop goodbye hadoop 
Hello world goodbye world 
Hello thinker goodbye thinker 

and the following code:

Note: the map is a simple word count example.

public class Inception extends Configured implements Tool{ 

public Path workingPath; 

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
    private final static IntWritable one = new IntWritable(1); 
    private Text word = new Text(); 

    // initialising the arrays that contain the values and the keys 
    public ArrayList<LongWritable> keyBuff = new ArrayList<LongWritable>(); 
    public ArrayList<Text> valueBuff = new ArrayList<Text>(); 


    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
     String line = value.toString(); 
     StringTokenizer tokenizer = new StringTokenizer(line); 

     while (tokenizer.hasMoreTokens()) { 
      word.set(tokenizer.nextToken()); 
      context.write(word, one); 
      System.out.println(word + "/" + one); 
     } 
    } 

    public void innerMap(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 

      // adding the value to the buffer 
     valueBuff.add(value); 
     System.out.println("ArrayList addValue -> " + value); 
     // print every buffered element (v), not the value that was just added 
     for (Text v : valueBuff){ 
      System.out.println("ArrayList containedValue -> " + v); 
     } 

     keyBuff.add(key); 

     } 

    public void run(Context context) throws IOException, InterruptedException { 
     setup(context); 

     // going over the key-value pairs and storing them into the arrays 
     while(context.nextKeyValue()){ 
      innerMap(context.getCurrentKey(), context.getCurrentValue(), context); 
     } 


     Iterator itrv = valueBuff.iterator(); 
     Iterator itrk = keyBuff.iterator(); 
     while(itrv.hasNext()){ 
      LongWritable nextk = (LongWritable) itrk.next(); 
      Text nextv = (Text) itrv.next(); 
      System.out.println("Value iterator -> " + nextv); 
      System.out.println("Key iterator -> " + nextk); 

      // iterating over the values and running the map on them. 

      map(nextk, nextv, context); 
     } 

     cleanup(context); 
    } 
} 

public int run(String[] args) throws Exception { ... } 

public static void main (..) { ... } 

Here is the log output:

stdout log

ArrayList addValue -> Hello hadoop goodbye hadoop 
ArrayList containedValue -> Hello hadoop goodbye hadoop 
ArrayList addValue -> Hello world goodbye world 
ArrayList containedValue -> Hello world goodbye world 
ArrayList containedValue -> Hello world goodbye world 
ArrayList addValue -> Hello thinker goodbye thinker 
ArrayList containedValue -> Hello thinker goodbye thinker 
ArrayList containedValue -> Hello thinker goodbye thinker 
ArrayList containedValue -> Hello thinker goodbye thinker 
Value iterator -> Hello thinker goodbye thinker 
Key iterator -> 84 
Hello/1 
thinker/1 
goodbye/1 
thinker/1 
Value iterator -> Hello thinker goodbye thinker 
Key iterator -> 84 
Hello/1 
thinker/1 
goodbye/1 
thinker/1 
Value iterator -> Hello thinker goodbye thinker 
Key iterator -> 84 
Hello/1 
thinker/1 
goodbye/1 
thinker/1 

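As you can see, every time I add a new value to the ArrayList valueBuff, all the values already in the list are overwritten. Does anyone know why this happens and why the values are not added to the list correctly?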

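The code is not readable at all; at the very least you could remove the dead code. – 2011-12-29 15:20:06

I updated the code and removed everything except the Map and what I am trying to do. Sorry, you are right, I should not have posted all of it. – inquire 2011-12-29 15:51:24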

Answer


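TextInputFormat uses LineRecordReader. When Context#nextKeyValue is called, LineRecordReader#nextKeyValue is called in turn.

LineRecordReader uses the same key and value objects on every call to nextKeyValue and only changes their contents. That is why valueBuff ends up holding several references to one Text object, which contains the last line that was read. If the key and value data need to be retained, the user code must create copies of the objects.

This makes sense as an optimization: if a new key and value object were created for every record, the system could easily run out of memory (OOM).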
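The effect described above can be reproduced without Hadoop. The following is only a minimal sketch: ReusingReader and ReuseDemo are hypothetical names made up for this illustration, and a StringBuilder stands in for the reused Text object. It shows that buffering the reference yields identical elements, while buffering a copy keeps each record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record reader that reuses one mutable value
// object, as the answer describes for LineRecordReader. This is not Hadoop code.
class ReusingReader {
    private final String[] lines;
    private final StringBuilder current = new StringBuilder(); // the single reused value object
    private int next = 0;

    ReusingReader(String... lines) {
        this.lines = lines;
    }

    // Overwrites the contents of the shared object instead of allocating a new one.
    boolean nextKeyValue() {
        if (next >= lines.length) {
            return false;
        }
        current.setLength(0);
        current.append(lines[next++]);
        return true;
    }

    StringBuilder getCurrentValue() {
        return current;
    }
}

class ReuseDemo {
    // Problematic: stores the same reference on every iteration.
    static List<String> bufferByReference(String... lines) {
        ReusingReader reader = new ReusingReader(lines);
        List<StringBuilder> buffer = new ArrayList<>();
        while (reader.nextKeyValue()) {
            buffer.add(reader.getCurrentValue());
        }
        return render(buffer);
    }

    // Fixed: copies the current contents into a new object before storing it.
    static List<String> bufferByCopy(String... lines) {
        ReusingReader reader = new ReusingReader(lines);
        List<StringBuilder> buffer = new ArrayList<>();
        while (reader.nextKeyValue()) {
            buffer.add(new StringBuilder(reader.getCurrentValue()));
        }
        return render(buffer);
    }

    // Converts the buffered objects to strings so the results can be compared.
    private static List<String> render(List<StringBuilder> buffer) {
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : buffer) {
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {
            "Hello hadoop goodbye hadoop",
            "Hello world goodbye world",
            "Hello thinker goodbye thinker"
        };
        System.out.println(bufferByReference(input)); // three times the last line
        System.out.println(bufferByCopy(input));      // the three distinct lines
    }
}
```

In the mapper from the question, the corresponding change would presumably be to buffer copies in innerMap, for example valueBuff.add(new Text(value)) and keyBuff.add(new LongWritable(key.get())), so that each list element owns its own data.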
