2010-12-08 77 views

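Implementing an ArrayWritable for a custom Hadoop type

How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.

I have an IndividualPosting class that stores the term frequency, the document ID, and the list of byte offsets of the term in the document.

I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPosting objects.

For the list of byte offsets in IndividualPosting, I have defined a LongArrayWritable that extends the ArrayWritable class.

When I defined a custom ArrayWritable for IndividualPosting, I ran into problems after deploying locally (using Karmasphere and Eclipse).

All the IndividualPosting instances in the Posting class's list are identical, even though I receive different values in the reduce method.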


Can you explain what exactly the problem is? Maybe post some code for your custom ArrayWritable? – bajafresh4life 2010-12-08 15:04:19

Answers


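From the ArrayWritable documentation:

A Writable for arrays containing instances of a class. The elements in this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example: public class IntArrayWritable extends ArrayWritable { public IntArrayWritable() { super(IntWritable.class); } }

You have already mentioned doing this with a WritableComparable type that Hadoop defines. Here is what I assume your implementation looks like for LongWritable: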

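import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;

// Written as a nested class of the job class; drop `static` if it lives in its own file.
public static class LongArrayWritable extends ArrayWritable
{
    // No-argument constructor: tells ArrayWritable which element class to instantiate when reading.
    public LongArrayWritable() {
        super(LongWritable.class);
    }

    // Convenience constructor that wraps an existing array of values.
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}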

You should be able to do the same with any type that implements WritableComparable, as described in the documentation. Using their example:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Order instances by counter only.
    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
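
Applied to the question, the same write/readFields pattern might look roughly like the sketch below. The field names are guesses based on the question's description (term frequency, document ID, byte offsets), not the asker's actual code, and the Writable interface here is a local stand-in for org.apache.hadoop.io.Writable so that the sketch compiles without Hadoop on the classpath. PostingRoundTrip is a hypothetical helper that only checks that a value survives serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Assumption: local stand-in for org.apache.hadoop.io.Writable (same two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical layout of the asker's IndividualPosting; field names are assumptions.
class IndividualPosting implements Writable {
    int termFrequency;
    long documentId;
    long[] byteOffsets = new long[0];

    // No-argument constructor, needed so instances can be created before readFields is called.
    IndividualPosting() {}

    IndividualPosting(int termFrequency, long documentId, long[] byteOffsets) {
        this.termFrequency = termFrequency;
        this.documentId = documentId;
        this.byteOffsets = byteOffsets.clone();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(termFrequency);
        out.writeLong(documentId);
        out.writeInt(byteOffsets.length); // length prefix so readFields knows how many offsets follow
        for (long offset : byteOffsets) {
            out.writeLong(offset);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        termFrequency = in.readInt();
        documentId = in.readLong();
        // Allocate a fresh array so nothing from a previously read record survives.
        byteOffsets = new long[in.readInt()];
        for (int i = 0; i < byteOffsets.length; i++) {
            byteOffsets[i] = in.readLong();
        }
    }
}

class PostingRoundTrip {
    // Serializes a posting to bytes and reads it back into a new instance.
    static IndividualPosting roundTrip(IndividualPosting original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            IndividualPosting copy = new IndividualPosting();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IndividualPosting copy = roundTrip(new IndividualPosting(2, 42L, new long[] {10L, 250L}));
        System.out.println(copy.termFrequency + " " + copy.documentId + " "
                + Arrays.toString(copy.byteOffsets));
    }
}
```

In a real job you would implement org.apache.hadoop.io.Writable instead of the stand-in, and the array wrapper would follow the LongArrayWritable pattern above: a subclass of ArrayWritable whose no-argument constructor calls super(IndividualPosting.class).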

And that should be all there is to it. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.
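
As for the symptom in the question (every list element identical even though reduce sees different values), one possible cause, which nothing in this thread confirms, is that Hadoop reuses the same value object while iterating in reduce, so storing the reference instead of a copy leaves a list full of the last value. The sketch below imitates that behaviour in plain Java; PostingValue, ReuseDemo, and reusedValues are hypothetical names made up for illustration, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified value type standing in for the asker's IndividualPosting.
class PostingValue {
    long documentId;

    PostingValue() {}

    // Copy constructor used to detach a value from the shared instance.
    PostingValue(PostingValue other) {
        this.documentId = other.documentId;
    }
}

class ReuseDemo {
    // Imitates object reuse: a single shared instance is refilled on every next().
    static Iterable<PostingValue> reusedValues(long[] documentIds) {
        PostingValue shared = new PostingValue();
        return () -> new Iterator<PostingValue>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < documentIds.length;
            }

            @Override
            public PostingValue next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.documentId = documentIds[index++];
                return shared;
            }
        };
    }

    // Problematic: stores the same reference each time, so all elements end up equal.
    static List<PostingValue> collectByReference(Iterable<PostingValue> values) {
        List<PostingValue> result = new ArrayList<>();
        for (PostingValue value : values) {
            result.add(value);
        }
        return result;
    }

    // Safer: copies each value before storing it.
    static List<PostingValue> collectByCopy(Iterable<PostingValue> values) {
        List<PostingValue> result = new ArrayList<>();
        for (PostingValue value : values) {
            result.add(new PostingValue(value));
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        for (PostingValue p : collectByReference(reusedValues(ids))) {
            System.out.print(p.documentId + " ");
        }
        System.out.println();
        for (PostingValue p : collectByCopy(reusedValues(ids))) {
            System.out.print(p.documentId + " ");
        }
        System.out.println();
    }
}
```

If this is what is happening in your reducer, copying each value before adding it to the list (with a copy constructor as above, or with Hadoop's WritableUtils.clone) should give you distinct entries.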