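How can a reducer (with a Text key and Iterable<MapWritable> values) write all of its maps to a sequence file so that the grouping by key is preserved? For example, suppose the mapper sends the reducer records that look like: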
<"dog", {<"name", "Fido">, <"pure bred?", "false">, <"type", "mutt">}>
<"cat", {<"name", "Felix">, <"color", "black">, <"origin", "film">, <"date", "1919">}>
<"dog", {<"name", "Lassie">, <"type", "collie">, <"origin", "short story">}>
I would like the sequence file to be written as:
key = "dog"
value = {
{<"name", "Fido">, <"pure bred?", "false">, <"type", "mutt">},
{<"name", "Lassie">, <"type", "collie">, <"origin", "short story">}
}
key = "cat"
value = {
{<"name", "Felix">, <"color", "black">, <"origin", "film">, <"date", "1919">}
}
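I am guessing I need to create a custom output value class that implements Writable, but I am not sure how to do that, since as far as I know collections do not really work with sequence files. I want to do this so that the next map/reduce phase reads all of the maps associated with each key as a single unit.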
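To make the idea concrete, here is a rough sketch of the kind of value class I have in mind. It is only an assumption of how this might look, not something I have run inside a job. The class name MapListValue and the helpers toBytes/fromBytes are made up for illustration. To keep it runnable without Hadoop on the classpath it uses plain String maps and only java.io, but write and readFields have the same signatures as the methods of org.apache.hadoop.io.Writable, so a real version would presumably declare "implements Writable" and hold MapWritable elements instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class: holds every map that was grouped under one key.
// Assumption: in a Hadoop job this would implement org.apache.hadoop.io.Writable
// and store MapWritable elements; String maps stand in for them here.
class MapListValue {
    private final List<Map<String, String>> maps = new ArrayList<>();

    // Stores a copy, so the caller may reuse the map it passed in.
    void add(Map<String, String> m) {
        maps.add(new LinkedHashMap<>(m));
    }

    List<Map<String, String>> getMaps() {
        return maps;
    }

    // Same signature as Writable.write: the number of maps, then each map
    // as an entry count followed by its key/value pairs.
    public void write(DataOutput out) throws IOException {
        out.writeInt(maps.size());
        for (Map<String, String> m : maps) {
            out.writeInt(m.size());
            for (Map.Entry<String, String> e : m.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    // Same signature as Writable.readFields: clears any old state first,
    // because the framework may reuse one instance for many records.
    public void readFields(DataInput in) throws IOException {
        maps.clear();
        int mapCount = in.readInt();
        for (int i = 0; i < mapCount; i++) {
            int entryCount = in.readInt();
            Map<String, String> m = new LinkedHashMap<>();
            for (int j = 0; j < entryCount; j++) {
                String key = in.readUTF();
                String value = in.readUTF();
                m.put(key, value);
            }
            maps.add(m);
        }
    }

    // Illustration-only helper: serialize to a byte array.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustration-only helper: rebuild a value from a byte array.
    static MapListValue fromBytes(byte[] data) {
        try {
            MapListValue value = new MapListValue();
            value.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The two "dog" records from the example, reduced to two fields each.
        MapListValue dogs = new MapListValue();
        Map<String, String> fido = new LinkedHashMap<>();
        fido.put("name", "Fido");
        fido.put("type", "mutt");
        dogs.add(fido);
        Map<String, String> lassie = new LinkedHashMap<>();
        lassie.put("name", "Lassie");
        lassie.put("type", "collie");
        dogs.add(lassie);

        // Round trip through bytes, as a sequence file would do on write and read.
        MapListValue restored = MapListValue.fromBytes(dogs.toBytes());
        System.out.println(restored.getMaps());
    }
}
```

The reducer would then presumably add a copy of each incoming map to one such object and emit it once per key. The copy matters because, as I understand it, Hadoop reuses the value instance while iterating.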
TIA,