In Hadoop, when saving the value of each key-value pair into an array, why are all the added elements the same?

I am trying to store the values of the key-value pairs received by the Map function and use them later. Given the following input:

Hello hadoop goodbye hadoop 
Hello world goodbye world 
Hello thinker goodbye thinker

and the following code:

Note: the Map is a simple word-count example.

public class Inception extends Configured implements Tool{ 

public Path workingPath; 

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
    private final static IntWritable one = new IntWritable(1); 
    private Text word = new Text(); 

    // initialising the arrays that contain the values and the keys 
    public ArrayList<LongWritable> keyBuff = new ArrayList<LongWritable>(); 
    public ArrayList<Text> valueBuff = new ArrayList<Text>(); 


    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
     String line = value.toString(); 
     StringTokenizer tokenizer = new StringTokenizer(line); 

     while (tokenizer.hasMoreTokens()) { 
      word.set(tokenizer.nextToken()); 
      context.write(word, one); 
      System.out.println(word + "/" + one); 
     } 
    } 

    public void innerMap(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 

      // adding the value to the buffer 
     valueBuff.add(value); 
     System.out.println("ArrayList addValue -> " + value); 
     for (Text v : valueBuff){ 
      System.out.println("ArrayList containedValue -> " + v); 
     } 

     keyBuff.add(key); 

     } 

    public void run(Context context) throws IOException, InterruptedException { 
     setup(context); 

     // going over the key-value pairs and storing them into the arrays 
     while(context.nextKeyValue()){ 
      innerMap(context.getCurrentKey(), context.getCurrentValue(), context); 
     } 


     Iterator itrv = valueBuff.iterator(); 
     Iterator itrk = keyBuff.iterator(); 
     while(itrv.hasNext()){ 
      LongWritable nextk = (LongWritable) itrk.next(); 
      Text nextv = (Text) itrv.next(); 
      System.out.println("Value iterator -> " + nextv); 
      System.out.println("Key iterator -> " + nextk); 

      // iterating over the values and running the map on them. 

      map(nextk, nextv, context); 
     } 

     cleanup(context); 
    } 
} 

public int run(String[] args) throws Exception { ... } 

public static void main (..) { ... }

Now the log output:

stdout log

ArrayList addValue -> Hello hadoop goodbye hadoop 
ArrayList containedValue -> Hello hadoop goodbye hadoop 
ArrayList addValue -> Hello world goodbye world 
ArrayList containedValue -> Hello world goodbye world 
ArrayList containedValue -> Hello world goodbye world 
ArrayList addValue -> Hello thinker goodbye thinker 
ArrayList containedValue -> Hello thinker goodbye thinker 
ArrayList containedValue -> Hello thinker goodbye thinker 
ArrayList containedValue -> Hello thinker goodbye thinker 
Value iterator -> Hello thinker goodbye thinker 
Key iterator -> 84 
Hello/1 
thinker/1 
goodbye/1 
thinker/1 
Value iterator -> Hello thinker goodbye thinker 
Key iterator -> 84 
Hello/1 
thinker/1 
goodbye/1 
thinker/1 
Value iterator -> Hello thinker goodbye thinker 
Key iterator -> 84 
Hello/1 
thinker/1 
goodbye/1 
thinker/1

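As you can see, every time I add a new value to the ArrayList valueBuff, all the values already in the list are overwritten. Does anyone know why this happens and why the values are not added to the list correctly?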

2011-12-29 inquire

The code is not readable at all; you could at least remove the dead code. – 2011-12-29 15:20:06

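I updated the code and removed everything except the Map and the part showing what I am trying to do. Sorry, you are right, I should not have posted all of it. – inquire 2011-12-29 15:51:24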

TextInputFormat uses LineRecordReader. When Context#nextKeyValue is called, LineRecordReader#nextKeyValue is invoked.

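In LineRecordReader, the same key and value objects are reused on every call to nextKeyValue; only their contents change. So every entry you add to valueBuff (and keyBuff) is a reference to the same object, which ends up holding the last record read. If the key and value data need to be retained, user code must create copies of the objects.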

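This makes sense as an optimization: if a new key and value object were created for every record, the system could easily run out of memory (OOM).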
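To illustrate the reuse described above without needing a Hadoop installation, here is a minimal sketch in plain Java. The class ReusingReader is hypothetical: it only imitates a record reader that hands out one mutable object, with StringBuilder standing in for Hadoop's Text. In the mapper from the question, the equivalent change should be along the lines of valueBuff.add(new Text(value)) and keyBuff.add(new LongWritable(key.get())).

```java
import java.util.ArrayList;
import java.util.List;

class ReuseDemo {
    /** Hypothetical stand-in for a record reader that reuses one mutable value object. */
    static final class ReusingReader {
        private final String[] lines;
        private final StringBuilder current = new StringBuilder(); // the single reused instance
        private int next = 0;

        ReusingReader(String... lines) {
            this.lines = lines;
        }

        /** Overwrites the contents of the shared object with the next record. */
        boolean nextKeyValue() {
            if (next >= lines.length) {
                return false;
            }
            current.setLength(0);
            current.append(lines[next++]);
            return true;
        }

        StringBuilder getCurrentValue() {
            return current;
        }
    }

    /** Buffers the reference itself, as the code in the question does. */
    static List<String> bufferReferences(String... lines) {
        ReusingReader reader = new ReusingReader(lines);
        List<StringBuilder> buffer = new ArrayList<>();
        while (reader.nextKeyValue()) {
            buffer.add(reader.getCurrentValue()); // same object added every time
        }
        return render(buffer);
    }

    /** Buffers an independent copy of each record. */
    static List<String> bufferCopies(String... lines) {
        ReusingReader reader = new ReusingReader(lines);
        List<StringBuilder> buffer = new ArrayList<>();
        while (reader.nextKeyValue()) {
            buffer.add(new StringBuilder(reader.getCurrentValue())); // copy of the current contents
        }
        return render(buffer);
    }

    private static List<String> render(List<StringBuilder> buffer) {
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : buffer) {
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {
            "Hello hadoop goodbye hadoop",
            "Hello world goodbye world",
            "Hello thinker goodbye thinker"
        };
        System.out.println(bufferReferences(input)); // last line repeated three times
        System.out.println(bufferCopies(input));     // three distinct lines
    }
}
```

Buffering references reproduces the symptom in the question (three entries that all show the last line), while buffering copies keeps each record. Copying costs one extra object per record, so it is worth doing only for the records you actually need to retain.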

2011-12-30 01:53:54
