2016-06-08

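SetWritable in Hadoop?

I want to create a SetWritable in Hadoop. Below is my implementation. I have just started using MapReduce and can't figure out how I should do this. I wrote the following code, but it doesn't work.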

Custom Writable (this needs to be a set):

public class TextPair implements Writable { 

    private Text first; 
    public HashSet<String> valueSet = new HashSet<String>(); 
    public TextPair() { 

    } 

    @Override 
    public void write(DataOutput out) throws IOException { 
     out.writeInt(valueSet.size()); 
     Iterator<String> it = valueSet.iterator(); 
     while (it.hasNext()) { 
      this.first = new Text(it.next()); 
      first.write(out); 
     } 
    } 

    @Override 
    public void readFields(DataInput in) throws IOException { 
     Iterator<String> it = valueSet.iterator(); 
     while (it.hasNext()) { 
      this.first = new Text(it.next()); 
      first.readFields(in); 
     } 
    } 

} 

Mapper code:

public class TokenizerMapper extends Mapper<Object, Text, Text, TextPair> { 

    ArrayList<String> al = new ArrayList<String>(); 
    TextPair tp = new TextPair(); 

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException { 

     String [] val = value.toString().substring(2,value.toString().length()).split(" "); 

     for(String v: val) { 
      tp.valueSet.add(v); 
     } 
     String [] vals = value.toString().split(" "); 

     for(int i=0; i<vals.length-1; i++) { 
      setKey(vals[0],vals[i+1]); 
      System.out.println(getKey()); 
      context.write(new Text(getKey()), tp); 
     } 
    } 

    public void setKey(String first,String second) { 

     al.clear(); 
     al.add(first); 
     al.add(second); 

     java.util.Collections.sort(al); 
    } 

    public String getKey() { 

     String tp = al.get(0)+al.get(1); 
     return tp; 
    } 
} 

I am basically trying to emit a SetWritable as the value from the mapper. Please suggest what changes I need to make. Thanks!

Answer

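I would say the problem is in how you read and write the data. Your readFields never reads the size that write puts on the stream, and it iterates over the current contents of the set instead. You need to read the size of the Set first and use it to read the correct number of Text objects.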

I changed your version to use a Set of Text objects, since they can be read and written easily.

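import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TextWritable implements Writable { 

    private Set<Text> values; 

    // Hadoop creates Writables reflectively, so keep a no-argument constructor
    public TextWritable() { 
     values = new HashSet<Text>(); 
    } 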

    @Override 
    public void write(DataOutput out) throws IOException { 

     // Write out the size of the Set 
     out.writeInt(values.size()); 

     // Write out each Text object 
     for(Text t : values) { 
      t.write(out); 
     } 
    } 

    @Override 
    public void readFields(DataInput in) throws IOException { 

     // Make sure we have a HashSet to fill up 
     values = new HashSet<Text>(); 

     // Get the number of elements in the set 
     int size = in.readInt(); 

     // Read the correct number of Text objects 
     for(int i=0; i<size; i++) { 
      Text t = new Text(); 
      t.readFields(in); 
      values.add(t); 
     } 
    } 
} 

You should also add some helper methods to the class for adding elements to the Set.
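
As a rough illustration of what such helpers could look like, here is a minimal sketch. To keep it runnable without Hadoop on the classpath, it is a stand-in rather than a real Writable: the class name `StringSetSketch`, the helper names `add`, `clear` and `get`, and the use of `String` with `writeUTF`/`readUTF` instead of `Text` are all assumptions made for this example. The size-then-elements layout is the same as in the class above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Writable above: same write/readFields shape,
// but it stores Strings so that it runs with the JDK alone.
class StringSetSketch {

    private final Set<String> values = new HashSet<>();

    // Helper: add one element; returns false if it was already present.
    boolean add(String value) {
        return values.add(value);
    }

    // Helper: empty the set so the same instance can be reused.
    void clear() {
        values.clear();
    }

    // Helper: read-only view, so callers cannot bypass add/clear.
    Set<String> get() {
        return Collections.unmodifiableSet(values);
    }

    // Write the size first, then each element.
    void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // Drop any old contents, then read exactly `size` elements.
    void readFields(DataInput in) throws IOException {
        values.clear();
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            values.add(in.readUTF());
        }
    }

    // Serialize to a byte array and read it back into a fresh instance.
    static StringSetSketch roundTrip(StringSetSketch source) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        source.write(new DataOutputStream(buffer));
        StringSetSketch copy = new StringSetSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        StringSetSketch set = new StringSetSketch();
        set.add("a");
        set.add("b");
        set.add("a"); // duplicate, ignored by the set
        StringSetSketch copy = roundTrip(set);
        System.out.println(copy.get().equals(set.get())); // prints "true"
    }
}
```

In the real class the same helpers would wrap the `Set<Text>` field, and the mapper would call them instead of touching a public field directly.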

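I also can't see where you clear the Set in the map method. If you don't clear it, it will probably keep getting bigger each time map is called, because the same TextPair instance is reused across calls.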
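
The effect is easy to reproduce outside Hadoop. The sketch below is only a simulation: `ReusedSetSketch` and its two methods are hypothetical names, and each call stands in for one invocation of `map` on a mapper that keeps a single set as a field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a mapper that reuses one set across map() calls.
class ReusedSetSketch {

    private final Set<String> valueSet = new HashSet<>();

    // Without clear(): tokens from earlier records stay in the set.
    int mapWithoutClear(String line) {
        for (String token : line.split(" ")) {
            valueSet.add(token);
        }
        return valueSet.size();
    }

    // With clear(): the set only holds tokens from the current record.
    int mapWithClear(String line) {
        valueSet.clear();
        for (String token : line.split(" ")) {
            valueSet.add(token);
        }
        return valueSet.size();
    }

    public static void main(String[] args) {
        ReusedSetSketch leaky = new ReusedSetSketch();
        System.out.println(leaky.mapWithoutClear("a b c")); // prints 3
        System.out.println(leaky.mapWithoutClear("d e"));   // prints 5

        ReusedSetSketch clean = new ReusedSetSketch();
        System.out.println(clean.mapWithClear("a b c"));    // prints 3
        System.out.println(clean.mapWithClear("d e"));      // prints 2
    }
}
```

In the mapper from the question, the equivalent change would be to call `tp.valueSet.clear()` at the top of `map`, before the loop that fills the set.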

Take a look at Hadoop's ArrayWritable for reference.