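In Hadoop, when saving the value of each key-value pair into an array, why are all the added elements the same?
I am trying to store the values of the key-value pairs that the Map function receives so that I can use them later. Given the following input: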
Hello hadoop goodbye hadoop
Hello world goodbye world
Hello thinker goodbye thinker
and the following code:
Note: Map is a simple word count example.
public class Inception extends Configured implements Tool {
    public Path workingPath;

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        // initialising the arrays that contain the values and the keys
        public ArrayList<LongWritable> keyBuff = new ArrayList<LongWritable>();
        public ArrayList<Text> valueBuff = new ArrayList<Text>();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
                System.out.println(word + "/" + one);
            }
        }

        public void innerMap(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // adding the value to the buffer
            valueBuff.add(value);
            System.out.println("ArrayList addValue -> " + value);
            for (Text v : valueBuff) {
                System.out.println("ArrayList containedValue -> " + value);
            }
            keyBuff.add(key);
        }

        public void run(Context context) throws IOException, InterruptedException {
            setup(context);
            // going over the key-value pairs and storing them into the arrays
            while (context.nextKeyValue()) {
                innerMap(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            Iterator itrv = valueBuff.iterator();
            Iterator itrk = keyBuff.iterator();
            while (itrv.hasNext()) {
                LongWritable nextk = (LongWritable) itrk.next();
                Text nextv = (Text) itrv.next();
                System.out.println("Value iterator -> " + nextv);
                System.out.println("Key iterator -> " + nextk);
                // iterating over the values and running the map on them.
                map(nextk, nextv, context);
            }
            cleanup(context);
        }
    }

    public int run(String[] args) throws Exception { ... }
    public static void main (..) { ... }
}
Here is the stdout log output:
ArrayList addValue -> Hello hadoop goodbye hadoop
ArrayList containedValue -> Hello hadoop goodbye hadoop
ArrayList addValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList addValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello/1
thinker/1
goodbye/1
thinker/1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello/1
thinker/1
goodbye/1
thinker/1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello/1
thinker/1
goodbye/1
thinker/1
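As you can see, every time I add a new value to the ArrayList valueBuff, all the values already in the list are overwritten. Does anyone know why this happens and why the values are not added to the array correctly?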
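The code is unreadable; you could at least remove the dead code. – 2011-12-29 15:20:06
I updated the code and removed everything except Map and the part I am trying to get working. Sorry, you are right, I should not have posted all of it. – inquire 2011-12-29 15:51:24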
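A likely explanation, consistent with the log above (every entry ends up as the last line, and every key is 84): Hadoop's record reader reuses the same key and value objects between records, so context.getCurrentKey() and context.getCurrentValue() hand back the same LongWritable and Text instances each time with new contents. Adding that reference to an ArrayList repeatedly stores one object several times. In the mapper, copying before buffering, for example valueBuff.add(new Text(value)) and keyBuff.add(new LongWritable(key.get())), should avoid this. The sketch below reproduces the effect without Hadoop; MutableText and ReuseDemo are hypothetical stand-ins written for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for a mutable Hadoop Writable such as Text.
    static final class MutableText {
        private String content = "";

        MutableText() {}

        // Copy constructor: takes a snapshot of the other object's current content.
        MutableText(MutableText other) {
            this.content = other.content;
        }

        void set(String s) {
            content = s;
        }

        @Override
        public String toString() {
            return content;
        }
    }

    // Mimics the question: one instance is reused for every record and the
    // list stores a reference to it, so all slots show the last record.
    static List<String> bufferByReference(String[] lines) {
        MutableText reused = new MutableText();
        List<MutableText> buff = new ArrayList<>();
        for (String line : lines) {
            reused.set(line);   // the "framework" overwrites the shared object
            buff.add(reused);   // same reference added again
        }
        return render(buff);
    }

    // The suggested fix: store a copy of the current contents instead.
    static List<String> bufferByCopy(String[] lines) {
        MutableText reused = new MutableText();
        List<MutableText> buff = new ArrayList<>();
        for (String line : lines) {
            reused.set(line);
            buff.add(new MutableText(reused)); // independent snapshot
        }
        return render(buff);
    }

    // Converts the buffered objects to strings after all records were read.
    static List<String> render(List<MutableText> buff) {
        List<String> out = new ArrayList<>();
        for (MutableText t : buff) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Hello hadoop goodbye hadoop",
            "Hello world goodbye world",
            "Hello thinker goodbye thinker"
        };
        System.out.println(bufferByReference(lines));
        System.out.println(bufferByCopy(lines));
    }
}
```

With the three input lines from the question, bufferByReference returns the last line three times, while bufferByCopy returns the three lines in their original order.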