Edit: Problem solved. I had made a very silly mistake.
Hadoop ArrayWritable gives me a ClassCastException
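I have a MapReduce pipeline consisting of a map, a reduce, a map, and a reduce. I use SequenceFileOutputFormat for the first reduce and SequenceFileInputFormat for the second map. I have looked at how these are used, and it seems I am using them correctly. The types I put into the sequence file are IntWritable and IntPairArrayWritable (a custom ArrayWritable subclass that holds IntPairWritable from Mahout). The problem is that when I read the IntPairArrayWritable in the second map and try to get at the individual IntPairWritable elements, I get a ClassCastException. I am not sure whether the mistake is in how I use the ArrayWritable class or in how I use the SequenceFile{Input,Output}Format classes. I have looked at a bunch of examples here and elsewhere, and it looks like I am doing both correctly, but I still get the error. Any help?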
Details:
Here is my first reducer class:
public static class WalkIdReducer extends MapReduceBase implements
        Reducer<IntWritable, IntPairWritable, IntWritable, IntPairArrayWritable> {
    @Override
    public void reduce(IntWritable walk_id, Iterator<IntPairWritable> values,
            OutputCollector<IntWritable, IntPairArrayWritable> output,
            Reporter reporter) throws IOException {
        ArrayList<IntPairWritable> value_array = new ArrayList<IntPairWritable>();
        while (values.hasNext()) {
            value_array.add(values.next());
        }
        output.collect(walk_id, IntPairArrayWritable.fromArrayList(value_array));
    }
}
And the second mapper class:
public static class NodePairMapper extends MapReduceBase implements
        Mapper<IntWritable, IntPairArrayWritable, IntPairWritable, Text> {
    @Override
    public void map(IntWritable key, IntPairArrayWritable value,
            OutputCollector<IntPairWritable, Text> output,
            Reporter reporter) throws IOException {
        // The following line gives a ClassCastException;
        // See IntPairArrayWritable.toArrayList(), below
        ArrayList<IntPairWritable> values = value.toArrayList();
        // other unimportant stuff
    }
}
The relevant parts of the job configuration for the first MapReduce job:
conf.setReducerClass(WalkIdReducer.class);
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(IntPairArrayWritable.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
And for the second MapReduce job:
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setMapperClass(NodePairMapper.class);
Finally, my ArrayWritable subclass:
public static class IntPairArrayWritable extends ArrayWritable
{
    // These two methods are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairArrayWritable.class);
    }
    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairArrayWritable.class, values);
    }
    // Some convenience methods, so I can use ArrayLists in
    // other parts of the code
    public static IntPairArrayWritable fromArrayList(
            ArrayList<IntPairWritable> array) {
        IntPairArrayWritable writable = new IntPairArrayWritable();
        IntPairWritable[] values = new IntPairWritable[array.size()];
        for (int i=0; i<array.size(); i++) {
            values[i] = array.get(i);
        }
        writable.set(values);
        return writable;
    }
    public ArrayList<IntPairWritable> toArrayList() {
        ArrayList<IntPairWritable> array = new ArrayList<IntPairWritable>();
        for (Writable pair : this.get()) {
            // This line is what kills it. I get a ClassCastException here.
            IntPairWritable int_pair = (IntPairWritable) pair;
            array.add(int_pair);
        }
        return array;
    }
}
The specific error I get is the following:
java.lang.ClassCastException: WalkAnalyzer$IntPairArrayWritable cannot be cast to org.apache.mahout.common.IntPairWritable
at WalkAnalyzer$IntPairArrayWritable.toArrayList(WalkAnalyzer.java:231)
at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:84)
at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:77)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
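I am very confused as to why what I get from ArrayWritable's get() method is an instance of WalkAnalyzer$IntPairArrayWritable. I expected get() to return an array of the contained elements, as described in the API.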
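Edit
I found the problem. It was in how I wrote the constructors of IntPairArrayWritable. I was calling super(IntPairArrayWritable.class); when I should have been calling super(IntPairWritable.class);. The code should actually look like this: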
public static class IntPairArrayWritable extends ArrayWritable
{
    // These two methods are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairWritable.class);
    }
    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairWritable.class, values);
    }
}
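I suppose it would have been a good idea to give the ArrayWritable subclass a less confusingly similar name, so that this mistake would have been easier to spot.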
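To illustrate why the wrong class in the constructor leads to this exception, here is a minimal, dependency-free sketch. It assumes a simplified model of ArrayWritable in which only the element count and the elements are written, and each element is created on read from the class that was passed to the constructor. Writable, IntPair, SimpleArrayWritable, BrokenPairArray, and FixedPairArray are hypothetical stand-ins written for this example, not Hadoop or Mahout classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

public class ArrayWritableSketch {

    /** Hypothetical stand-in for Hadoop's Writable interface. */
    public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical stand-in for Mahout's IntPairWritable. */
    public static class IntPair implements Writable {
        int first, second;
        public IntPair() {}
        public IntPair(int first, int second) { this.first = first; this.second = second; }
        @Override public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }
        @Override public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }
    }

    /** Simplified model of ArrayWritable: the element class is never written to the stream. */
    public static class SimpleArrayWritable implements Writable {
        private final Class<? extends Writable> valueClass;
        private Writable[] values = new Writable[0];

        public SimpleArrayWritable(Class<? extends Writable> valueClass) { this.valueClass = valueClass; }
        public void set(Writable[] values) { this.values = values; }
        public Writable[] get() { return values; }

        @Override public void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (Writable value : values) {
                value.write(out);
            }
        }
        @Override public void readFields(DataInput in) throws IOException {
            values = new Writable[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                try {
                    // Each element is created from valueClass, whatever was written.
                    values[i] = valueClass.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IOException("cannot instantiate " + valueClass, e);
                }
                values[i].readFields(in);
            }
        }
    }

    /** Mirrors the bug: the subclass passes its own class as the element class. */
    public static class BrokenPairArray extends SimpleArrayWritable {
        public BrokenPairArray() { super(BrokenPairArray.class); }
    }

    /** Mirrors the fix: the element class is the pair type. */
    public static class FixedPairArray extends SimpleArrayWritable {
        public FixedPairArray() { super(IntPair.class); }
    }

    /** Writes one pair, reads it back into a fresh instance, and names the element's class. */
    public static String elementClassAfterRoundTrip(Supplier<SimpleArrayWritable> factory) {
        try {
            SimpleArrayWritable written = factory.get();
            // The first int is 0 so that the broken variant's nested misread terminates;
            // with other data this model can fail earlier with an EOFException.
            written.set(new Writable[] { new IntPair(0, 5) });
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            written.write(new DataOutputStream(bytes));

            SimpleArrayWritable read = factory.get();
            read.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return read.get()[0].getClass().getSimpleName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementClassAfterRoundTrip(BrokenPairArray::new)); // BrokenPairArray
        System.out.println(elementClassAfterRoundTrip(FixedPairArray::new));  // IntPair
    }
}
```

In this model the broken variant hands back an element that is itself a BrokenPairArray, so casting it to IntPair would throw a ClassCastException of the same shape as the one in the stack trace above. A round-trip check like this, run against the real classes, is one way such a constructor mistake could be caught before it reaches a job.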
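Hmm... all of the code above is in a single file, WalkAnalyzer.java. There is only one import statement for IntPairWritable, and it is 'import org.apache.mahout.common.IntPairWritable;'. I have a hard time seeing how that could be the problem... – mattg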
Aha! I think trying to understand your answer led me to the problem. My constructors in IntPairArrayWritable are wrong. Double-checking that now... – mattg
So it works now? –