2013-11-25

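How can a reducer (with Text keys and an Iterable of MapWritable values) write all of its maps for a key to a sequence file so that the grouping by key is preserved? For example, suppose the records the mappers send to the reducer look like: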

<"dog", {<"name", "Fido">, <"pure bred?", "false">, <"type", "mutt">}>
<"cat", {<"name", "Felix">, <"color", "black">, <"origin", "film">, <"date", "1919">}>
<"dog", {<"name", "Lassie">, <"type", "collie">, <"origin", "short story">}>

I would like the sequence file to be written as:

key = "dog" 
value = { 
      {<"name", "Fido">, <"pure bred?", "false">, <"type", "mutt">}, 
      {<"name", "Lassie">, <"type", "collie">, <"origin", "short story">} 
     } 

key = "cat" 
value = { 
      {<"name", "Felix">, <"color", "black">, <"origin", "film">, <"date", "1919">} 
     } 

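I guess I need to create a custom output value class that implements Writable, but I'm not sure how, because as far as I know collections don't really work with sequence files. I want this so that the next map/reduce stage reads all of the maps associated with each key as a single unit.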

TIA,

Answer


As you note, you can create a custom Writable by extending ArrayWritable:

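import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.MapWritable;

// An ArrayWritable whose elements are always MapWritable, so Hadoop can
// instantiate it through the no-arg constructor when reading it back.
public class MapWritableArray extends ArrayWritable {
    public MapWritableArray() {
        super(MapWritable.class);
    }
}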
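Why this makes a collection usable in a sequence file: a Writable is simply an object that can write itself to a DataOutput and read itself back from a DataInput, and ArrayWritable does this by writing the element count followed by each element. The sketch below imitates that contract in plain Java so it runs without Hadoop. `StringMapArray`, `toBytes` and `fromBytes` are hypothetical names, and it stores `Map<String, String>` instead of real `MapWritable` objects, so treat it as an illustration of the length-prefixed layout rather than Hadoop's actual implementation.

```java
import java.io.*;
import java.util.*;

// Hypothetical plain-Java stand-in for a Writable: same write/readFields
// contract, but holding String maps so it runs without Hadoop on the classpath.
class StringMapArray {
    private List<Map<String, String>> maps = new ArrayList<>();

    StringMapArray() {} // no-arg constructor, as Hadoop requires for Writables

    StringMapArray(List<Map<String, String>> maps) {
        this.maps = maps;
    }

    List<Map<String, String>> get() {
        return maps;
    }

    // Length-prefixed layout: number of maps, then for each map its size
    // followed by its key/value pairs.
    void write(DataOutput out) throws IOException {
        out.writeInt(maps.size());
        for (Map<String, String> m : maps) {
            out.writeInt(m.size());
            for (Map.Entry<String, String> e : m.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    // Reads the fields back in exactly the order write() produced them.
    void readFields(DataInput in) throws IOException {
        int count = in.readInt();
        List<Map<String, String>> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int size = in.readInt();
            Map<String, String> m = new LinkedHashMap<>();
            for (int j = 0; j < size; j++) {
                String key = in.readUTF();
                String value = in.readUTF();
                m.put(key, value);
            }
            result.add(m);
        }
        maps = result;
    }

    // Serializes to a byte array (in-memory streams, so IOException is wrapped).
    static byte[] toBytes(StringMapArray a) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            a.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserializes into a fresh instance created with the no-arg constructor.
    static StringMapArray fromBytes(byte[] bytes) {
        try {
            StringMapArray a = new StringMapArray();
            a.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return a;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The no-arg constructor plus `readFields` is the part that matters for the next map/reduce stage: the framework creates an empty instance and fills it from the stream, which is why `MapWritableArray` has to fix its element class in its own constructor.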

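Then in your reducer, you need to accumulate the iterable of MapWritable values into an array. Remember to copy each value, because Hadoop reuses the same object and its contents change with every iteration. Something like (completely untested, not compile-checked, and not optimized):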

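// Inside reduce(Text key, Iterable<MapWritable> values, Context context):
MapWritableArray mapWritableArray = new MapWritableArray();
List<MapWritable> valList = new ArrayList<MapWritable>();
for (MapWritable value : values) {
    // Hadoop reuses the same MapWritable instance, so store a deep copy
    MapWritable copy = ReflectionUtils.newInstance(MapWritable.class, context.getConfiguration());
    ReflectionUtils.copy(context.getConfiguration(), value, copy);
    valList.add(copy);
}
mapWritableArray.set(valList.toArray(new MapWritable[0]));
// One output record per key: the key and all of its maps
context.write(key, mapWritableArray);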
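The copying step matters because the reduce-side value iterator hands back the same object on every iteration and only refills its contents. The plain-Java simulation below (hypothetical `ReusingIterable` and `Accumulate` classes, no Hadoop involved) reproduces that behaviour: storing the references leaves a list whose entries all show the last record, while copying each value keeps the records distinct.

```java
import java.util.*;

// Hypothetical simulation of Hadoop's value reuse: every call to next()
// overwrites and returns the SAME mutable map instance.
class ReusingIterable implements Iterable<Map<String, String>> {
    private final List<Map<String, String>> records;

    ReusingIterable(List<Map<String, String>> records) {
        this.records = records;
    }

    @Override
    public Iterator<Map<String, String>> iterator() {
        Map<String, String> shared = new HashMap<>();
        Iterator<Map<String, String>> source = records.iterator();
        return new Iterator<Map<String, String>>() {
            public boolean hasNext() {
                return source.hasNext();
            }

            public Map<String, String> next() {
                shared.clear();              // wipe the previous record
                shared.putAll(source.next()); // refill the same object
                return shared;
            }
        };
    }
}

class Accumulate {
    // Wrong: keeps references to the shared instance, so every element
    // ends up showing the contents of the last record.
    static List<Map<String, String>> withoutCopy(Iterable<Map<String, String>> values) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> v : values) {
            out.add(v);
        }
        return out;
    }

    // Right: copies each value before storing it, which is what the
    // ReflectionUtils.copy call in the reducer is for.
    static List<Map<String, String>> withCopy(Iterable<Map<String, String>> values) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> v : values) {
            out.add(new HashMap<>(v));
        }
        return out;
    }
}
```

Feeding the Fido and Lassie records through `ReusingIterable`, `withoutCopy` returns two references to a single map holding Lassie, while `withCopy` returns one map per dog.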