2012-10-19 26 views

Hadoop ArrayWritable gives me a ClassCastException

Edit: the problem is solved. I had made a very silly mistake.

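I have a MapReduce pipeline consisting of a map, a reduce, a second map, and a second reduce. I use SequenceFileOutputFormat for the first reduce and SequenceFileInputFormat for the second map. I have read up on how these are used, and as far as I can tell I am using them correctly. The types I write are IntWritable and IntPairArrayWritable (a custom ArrayWritable subclass that holds IntPairWritable from Mahout). The problem is that when I read the IntPairArrayWritable in the second map and try to get the individual IntPairWritable elements, I get a ClassCastException. I am not sure whether this comes from a mistake in how I use the ArrayWritable class or from a mistake in how I use SequenceFile{Input,Output}Format. I have looked at a bunch of examples here and elsewhere, and it looks like I am doing both correctly, but I still get the error. Any help?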

Details:

Here is my first reducer class:

public static class WalkIdReducer extends MapReduceBase implements
        Reducer<IntWritable, IntPairWritable, IntWritable, IntPairArrayWritable> {

    @Override
    public void reduce(IntWritable walk_id, Iterator<IntPairWritable> values,
            OutputCollector<IntWritable, IntPairArrayWritable> output,
            Reporter reporter) throws IOException {
        ArrayList<IntPairWritable> value_array = new ArrayList<IntPairWritable>();
        while (values.hasNext()) {
            // Hadoop reuses the value object between iterations,
            // so store a copy rather than the instance itself
            IntPairWritable pair = values.next();
            value_array.add(new IntPairWritable(pair.getFirst(), pair.getSecond()));
        }
        output.collect(walk_id, IntPairArrayWritable.fromArrayList(value_array));
    }
}

And the second mapper class:

public static class NodePairMapper extends MapReduceBase implements
        Mapper<IntWritable, IntPairArrayWritable, IntPairWritable, Text> {

    @Override
    public void map(IntWritable key, IntPairArrayWritable value,
            OutputCollector<IntPairWritable, Text> output,
            Reporter reporter) throws IOException {
        // The following line gives a ClassCastException;
        // see IntPairArrayWritable.toArrayList(), below
        ArrayList<IntPairWritable> values = value.toArrayList();
        // other unimportant stuff
    }
}

The relevant part of the job configuration for the first MapReduce job:

conf.setReducerClass(WalkIdReducer.class);
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(IntPairArrayWritable.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);

And for the second MapReduce job:

conf.setInputFormat(SequenceFileInputFormat.class);
conf.setMapperClass(NodePairMapper.class);

Finally, my ArrayWritable subclass:

public static class IntPairArrayWritable extends ArrayWritable
{
    // These two constructors are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairArrayWritable.class);
    }

    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairArrayWritable.class, values);
    }

    // Some convenience methods, so I can use ArrayLists in
    // other parts of the code
    public static IntPairArrayWritable fromArrayList(
            ArrayList<IntPairWritable> array) {
        IntPairArrayWritable writable = new IntPairArrayWritable();
        IntPairWritable[] values = new IntPairWritable[array.size()];
        for (int i = 0; i < array.size(); i++) {
            values[i] = array.get(i);
        }
        writable.set(values);
        return writable;
    }

    public ArrayList<IntPairWritable> toArrayList() {
        ArrayList<IntPairWritable> array = new ArrayList<IntPairWritable>();
        for (Writable pair : this.get()) {
            // This line is what kills it. I get a ClassCastException here.
            IntPairWritable int_pair = (IntPairWritable) pair;
            array.add(int_pair);
        }
        return array;
    }
}

The specific error I get is the following:

java.lang.ClassCastException: WalkAnalyzer$IntPairArrayWritable cannot be cast to org.apache.mahout.common.IntPairWritable 
at WalkAnalyzer$IntPairArrayWritable.toArrayList(WalkAnalyzer.java:231) 
at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:84) 
at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:77) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) 
at org.apache.hadoop.mapred.Child.main(Child.java:170) 

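I am confused as to why what I get back from ArrayWritable's get() method is an instance of WalkAnalyzer$IntPairArrayWritable. I expected get() to return an array of the contained elements, as described in the API.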
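One way to see what get() is really handing back is to check each element's runtime class before casting. The following is a minimal sketch using only the standard library; `TypedArrays` and `toTypedList` are hypothetical names, not part of Hadoop or Mahout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Mahout.
class TypedArrays {
    // Converts an array to a typed list. If an element is not of the
    // expected type, the exception message names the element's actual
    // runtime class instead of surfacing as a bare ClassCastException.
    static <T> List<T> toTypedList(Object[] items, Class<T> expected) {
        List<T> result = new ArrayList<>();
        for (Object item : items) {
            if (!expected.isInstance(item)) {
                String actual = (item == null) ? "null" : item.getClass().getName();
                throw new IllegalStateException(
                        "expected " + expected.getName() + " but element is " + actual);
            }
            result.add(expected.cast(item));
        }
        return result;
    }
}
```

In the code above this would presumably be called as `TypedArrays.toTypedList(this.get(), IntPairWritable.class)` inside toArrayList(). It does not fix anything, but the message shows immediately which class the elements actually are.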

Edit

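I found the problem. It was in how I wrote the constructors of IntPairArrayWritable. I was calling super(IntPairArrayWritable.class); when I should have been calling super(IntPairWritable.class);. The code should actually look like this: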

public static class IntPairArrayWritable extends ArrayWritable
{
    // These two constructors are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairWritable.class);
    }

    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairWritable.class, values);
    }
}

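I suppose it would have been a good idea to use a less confusingly similar name for the ArrayWritable subclass, so that a mistake like this would have been easier to spot.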
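For anyone wondering why the class passed to super() matters so much: as far as I understand it, ArrayWritable creates every element from that class when it reads the array back, so the element type comes from the constructor argument rather than from the data. Below is a simplified, self-contained sketch of that behaviour. All of the `Mini*`, `BuggyPairArray`, `FixedPairArray`, and `ElementClassDemo` names are hypothetical stand-ins written for illustration; this is not Hadoop's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's Writable interface.
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for IntPairWritable: just two ints.
class MiniIntPair implements MiniWritable {
    int first, second;
    public MiniIntPair() {}
    public MiniIntPair(int first, int second) { this.first = first; this.second = second; }
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }
    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }
}

// Hypothetical stand-in for ArrayWritable: the class passed to the
// constructor decides what gets instantiated for each element on read.
class MiniArrayWritable implements MiniWritable {
    private final Class<? extends MiniWritable> valueClass;
    private MiniWritable[] values = new MiniWritable[0];

    MiniArrayWritable(Class<? extends MiniWritable> valueClass) { this.valueClass = valueClass; }
    void set(MiniWritable[] values) { this.values = values; }
    MiniWritable[] get() { return values; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (MiniWritable value : values) {
            value.write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        values = new MiniWritable[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            try {
                // Elements are created from valueClass, not from the data.
                values[i] = valueClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
            values[i].readFields(in);
        }
    }
}

// Mirrors the mistake: the subclass passes itself as the element class.
class BuggyPairArray extends MiniArrayWritable {
    public BuggyPairArray() { super(BuggyPairArray.class); }
}

// Mirrors the fix: the element class is the pair type.
class FixedPairArray extends MiniArrayWritable {
    public FixedPairArray() { super(MiniIntPair.class); }
}

class ElementClassDemo {
    // Writes one pair through source, reads it back into target, and
    // returns the simple class name of the first element that comes out.
    static String roundTrip(MiniArrayWritable source, MiniArrayWritable target) {
        // first == 0 so that the buggy read, which misreads it as a nested
        // array length, stops at once instead of running off the end.
        source.set(new MiniWritable[] { new MiniIntPair(0, 7) });
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            target.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.get()[0].getClass().getSimpleName();
    }
}
```

With these stand-ins, `ElementClassDemo.roundTrip(new BuggyPairArray(), new BuggyPairArray())` returns "BuggyPairArray", while the same call with two `FixedPairArray` instances returns "MiniIntPair". Casting the buggy result to the pair type is what would produce a ClassCastException like the one in the stack trace above.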

Answers


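Check the import statement for IntPairWritable. It looks like you are picking up the wrong package in the Mapper, so even though the class is named IntPairWritable, the cast is to a different class.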
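To illustrate the kind of mix-up meant here, the sketch below shows two classes that share a simple name but are still unrelated types, so a cast between them fails. `PackageA`, `PackageB`, and `SameNameCheck` are hypothetical names that use nested classes to stand in for two different packages.

```java
// Two unrelated classes with the same simple name, standing in for
// same-named classes imported from two different packages.
class PackageA {
    static class Pair {}
}

class PackageB {
    static class Pair {}
}

// Hypothetical helper for checking the two conditions separately.
class SameNameCheck {
    // True when both classes print the same short name.
    static boolean sameSimpleName(Class<?> a, Class<?> b) {
        return a.getSimpleName().equals(b.getSimpleName());
    }

    // True only when a cast of the object to the target class would succeed.
    static boolean castable(Object value, Class<?> target) {
        return target.isInstance(value);
    }
}
```

Here `sameSimpleName(PackageA.Pair.class, PackageB.Pair.class)` is true while `castable(new PackageA.Pair(), PackageB.Pair.class)` is false. Comparing `getClass().getName()` of the object against the fully qualified name you expect is a quick way to rule this case in or out.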


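Hmm... all of the code above is in a single file, WalkAnalyzer.java. There is only one import statement for IntPairWritable, and that is 'import org.apache.mahout.common.IntPairWritable;'. I have a hard time seeing how this could be the problem... – mattg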


Aha! But I think trying to understand your answer led me to the problem. My constructors in IntPairArrayWritable were wrong. Double-checking that now... – mattg


So does it work now?
