How do I read and process a file as key-value pairs in Hadoop?
I am trying to read the following data as key-value pairs in Hadoop:
name: "Clooney, George", release: "2013", movie: "Gravity",
name: "Pitt, Brad", release: "2004", movie: "Ocean's 12",
name: "Clooney, George", release: "2004", movie: "Ocean's 12",
name: "Pitt, Brad", release: "1999", movie: "Fight Club"
The output I need is as follows:
name: "Clooney, George", movie: "Gravity, Ocean's 12",
name: "Pitt, Brad", movie: "Ocean's 12, Fight Club",
I have written a mapper and a reducer as follows:
public static class MyMapper
        extends Mapper<Text, Text, Text, Text> {
    private Text word = new Text();

    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ",");
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(key, word);
        }
    }
}
public static class MyReducer
        extends Reducer<Text, Text, Text, Text> {
    private Text result = new Text();

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String actors = "";
        for (Text val : values) {
            actors += val.toString();
        }
        result.set(actors);
        context.write(key, result);
    }
}
I have also added the following configuration:
Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
I got the following output:
name: "Clooney George" release: "2013" movie: "Gravity" George" release: "2004" movie: "Ocean's 12"
name: "Pitt Brad" release: "2004" movie: "Ocean's 12" Brad" release: "1999" movie: "Fight Club"
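It seems I cannot even get the basic key-value reading right. How does key-value processing work in Hadoop? Can someone explain this in detail and point out where I am going wrong?
Thanks. TM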
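The output above is consistent with each line being split at the first occurrence of the configured separator, which is how the key/value line reader behaves. With `,` as the separator, the key becomes `name: "Clooney` and everything after the first comma becomes the value. The following is a minimal plain-Java sketch of that behaviour, without any Hadoop classes. The class and method names (`FirstSeparatorSplit`, `splitAtFirst`, `concatTokens`) are made up for illustration; `concatTokens` only imitates what the posted mapper and reducer do to a single value.

```java
import java.util.StringTokenizer;

class FirstSeparatorSplit {
    // Split a line at the FIRST occurrence of the separator only.
    // If the separator is absent, the whole line is the key and the value is empty.
    static String[] splitAtFirst(String line, char sep) {
        int pos = line.indexOf(sep);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    // Imitates the posted mapper (tokenize on ",") followed by the posted
    // reducer (concatenate the tokens with no delimiter) for one value.
    static String concatTokens(String value) {
        StringBuilder sb = new StringBuilder();
        StringTokenizer itr = new StringTokenizer(value, ",");
        while (itr.hasMoreTokens()) {
            sb.append(itr.nextToken());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "name: \"Clooney, George\", release: \"2013\", movie: \"Gravity\",";
        String[] kv = splitAtFirst(line, ',');
        System.out.println("key   = [" + kv[0] + "]");   // key   = [name: "Clooney]
        System.out.println("value = [" + kv[1] + "]");
        System.out.println("joined= [" + concatTokens(kv[1]) + "]");
    }
}
```

Because the comma also appears inside the quoted name, it does not work as a key/value separator for this data. The key never contains the full actor name, and the commas between fields are lost when the tokens are concatenated.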
Thanks for your reply. But doesn't TextInputFormat read the input line by line? Can it be used to process the record on each line individually? – visakh
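With TextInputFormat, each call to the mapper receives one whole line as its value (the key is the line's byte offset), so the mapper can parse the fields of that line itself. The sketch below shows only the parsing and grouping logic in plain Java, with no Hadoop classes. The names `MoviesByActor`, `parseLine` and `group` are hypothetical, and the regular expression assumes every field has the form `field: "value"`. In a real job, `parseLine` would be called inside `map` to emit (name, movie), and the joining would happen in `reduce`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoviesByActor {
    // Matches field: "value" pairs; a quoted value may contain commas.
    private static final Pattern FIELD = Pattern.compile("(\\w+):\\s*\"([^\"]*)\"");

    // "Map" side: extract the fields of one full input line.
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    // "Reduce" side: collect the movie titles per actor and join them with ", ".
    // Lines without a properly quoted name or movie field are skipped.
    static Map<String, String> group(List<String> lines) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String line : lines) {
            Map<String, String> f = parseLine(line);
            if (f.containsKey("name") && f.containsKey("movie")) {
                byName.computeIfAbsent(f.get("name"), k -> new ArrayList<>())
                      .add(f.get("movie"));
            }
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : byName.entrySet()) {
            out.put(e.getKey(), String.join(", ", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "name: \"Clooney, George\", release: \"2013\", movie: \"Gravity\",",
            "name: \"Pitt, Brad\", release: \"2004\", movie: \"Ocean's 12\",",
            "name: \"Clooney, George\", release: \"2004\", movie: \"Ocean's 12\",",
            "name: \"Pitt, Brad\", release: \"1999\", movie: \"Fight Club\"");
        group(lines).forEach((name, movies) ->
            System.out.println("name: \"" + name + "\", movie: \"" + movies + "\","));
    }
}
```

This sketch keeps the titles in input order. In an actual MapReduce job, the order in which values reach the reducer for a given key is not guaranteed, so the order of titles in the joined string may differ.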
One more question: how can I display the Mapper's output? I would like to see how the Mapper processes the records. Can I do this with standard Java commands? – visakh
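@user295338 To display the mapper output, either disable the reducer or use [MultipleOutputs](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html). With MultipleOutputs, you write the mapper data to a separate file and also send it on to the reducer. This lets you store the mapper's data without disabling the reducer. –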