Saving JSON data to HDFS in Hadoop

I have the following Reducer class for saving JSON data to HDFS in Hadoop:

public static class TokenCounterReducer extends Reducer<Text, Text, Text, Text> { 
    public void reduce(Text key, Iterable<Text> values, Context context) 
      throws IOException, InterruptedException { 

     JSONObject jsn = new JSONObject(); 

     for (Text value : values) { 
      String[] vals = value.toString().split("\t"); 
      String[] targetNodes = vals[0].toString().split(",",-1); 
      jsn.put("source",vals[1]); 
      jsn.put("target",targetNodes); 

     } 
     // context.write(key, new Text(sum)); 
    } 
}

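Going through the examples (disclaimer: newbie here), I can see that the usual output type looks like a key/value store.

But what if I don't have any key in my output? Or what if I want my output in some other format (JSON in my case)?

Anyway, from the code above: how do I write the JSON object to HDFS?

This is trivial in Hadoop Streaming, but how do I do it in Hadoop Java?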


2013-06-04 Fraz

If you just want to write a list of JSON objects to HDFS without caring about the key/value notion, you can use NullWritable as your Reducer's output value:

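public static class TokenCounterReducer extends Reducer<Text, Text, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            JSONObject jsn = new JSONObject();
            ....
            // Emit the JSON string as the key and an empty placeholder as the value,
            // so the default TextOutputFormat writes one JSON document per line.
            context.write(new Text(jsn.toString()), NullWritable.get());
        }
    }
}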

Note that you will also need to change your job configuration:

job.setOutputValueClass(NullWritable.class);
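For context, here is a minimal sketch of how the output types might be declared together in the driver. The class name `JsonJobConfig` and the method `configureOutputTypes` are hypothetical. The sketch assumes the mapper emits `Text` keys and `Text` values, as the `Reducer<Text, Text, ...>` signature suggests. In that case the map output types need to be set explicitly, because they otherwise default to the job's final output types.

```java
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper, not part of the original answer.
final class JsonJobConfig {
    static void configureOutputTypes(Job job) {
        // Assumption: the mapper emits Text/Text pairs.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // The reducer emits the JSON string as the key and no value.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
    }
}
```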

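By "writing JSON objects to HDFS" I understood that you want to store the string representation of the JSON, which is what I described above. If you wanted to store a binary representation of your JSON in HDFS, you would need to use a SequenceFile. You could of course write your own Writable for this, but if a simple string representation is all you need, I find the approach above simpler.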
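To make the string representation concrete, here is a small sketch that runs without Hadoop or a JSON library. It assumes the record layout from the question (comma-separated target nodes, a tab, then the source node). The class `JsonLineSketch` and its helpers are hypothetical, and the hand-rolled escaping covers only quotes and backslashes, so a real job would more likely rely on a JSON library.

```java
import java.util.StringJoiner;

final class JsonLineSketch {
    // Escapes backslashes and double quotes only; this is not a full JSON escaper.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Turns "t1,t2<TAB>src" into a single-line JSON document.
    static String toJsonLine(String value) {
        String[] vals = value.split("\t");
        StringJoiner targets = new StringJoiner(",", "[", "]");
        for (String target : vals[0].split(",", -1)) {
            targets.add(quote(target));
        }
        return "{\"source\":" + quote(vals[1]) + ",\"target\":" + targets + "}";
    }

    public static void main(String[] args) {
        // prints {"source":"a","target":["b","c"]}
        System.out.println(toJsonLine("b,c\ta"));
    }
}
```

With one such line emitted per record, each line of the output file is a standalone JSON document.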


2013-06-04 19:19:10

Hi @Charles, I am new to Hadoop. Once the JSON data file is stored in HDFS, how do we retrieve the data if we don't apply any key/value concept? – u449355

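You can use Hadoop's OutputFormat interface to create a custom format that writes the data however you want. For example, if you need the data to be written as a JSON object, you could do this: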

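import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class JsonOutputFormat extends TextOutputFormat<Text, IntWritable> {
    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(
            TaskAttemptContext context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Path path = getOutputPath(context);
        FileSystem fs = path.getFileSystem(conf);
        // Every task writes to the same file name, so this only works with a single reducer.
        FSDataOutputStream out = fs.create(new Path(path, context.getJobName()));
        return new JsonRecordWriter(out);
    }

    private static class JsonRecordWriter extends LineRecordWriter<Text, IntWritable> {
        boolean firstRecord = true;

        public JsonRecordWriter(DataOutputStream out) throws IOException {
            super(out);
            // Open the enclosing JSON object before any record is written.
            out.writeBytes("{");
        }

        @Override
        public synchronized void write(Text key, IntWritable value) throws IOException {
            // Separate entries with a comma, but not before the first one.
            if (!firstRecord) {
                out.writeBytes(",\r\n");
            }
            firstRecord = false;
            // Keys are not escaped here, so they must not contain quotes or backslashes.
            String entry = "\"" + key.toString() + "\":\"" + value.toString() + "\"";
            // Write UTF-8 bytes; writeChars would emit two bytes per character.
            out.write(entry.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public synchronized void close(TaskAttemptContext context) throws IOException {
            // Close the JSON object, then let the parent close the stream.
            out.writeBytes("}");
            super.close(context);
        }
    }
}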
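The framing logic such a record writer needs (an opening brace, comma-separated entries, a closing brace) can be tried without a cluster. The sketch below uses only the JDK and writes into an in-memory buffer. The class `JsonObjectWriterSketch` and its `demo` method are hypothetical names, and keys are assumed to need no escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class JsonObjectWriterSketch {
    private final DataOutputStream out;
    private boolean firstRecord = true;

    JsonObjectWriterSketch(DataOutputStream out) throws IOException {
        this.out = out;
        emit("{"); // open the object once, up front
    }

    void write(String key, int value) throws IOException {
        if (!firstRecord) {
            emit(",\r\n"); // comma before every entry except the first
        }
        firstRecord = false;
        emit("\"" + key + "\":\"" + value + "\"");
    }

    void close() throws IOException {
        emit("}"); // close the object, then the stream
        out.close();
    }

    // Writes UTF-8 bytes; DataOutputStream.writeChars would emit two bytes per char.
    private void emit(String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
    }

    // Writes two entries into a buffer and returns the resulting text.
    static String demo() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            JsonObjectWriterSketch writer =
                    new JsonObjectWriterSketch(new DataOutputStream(buffer));
            writer.write("hadoop", 3);
            writer.write("hdfs", 5);
            writer.close();
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Setting `firstRecord = false` outside the `if` matters: if it were only cleared inside the branch, the flag would never change and no separator would ever be written.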

If you don't want a key in your output, just emit a null key, like this:

context.write(NullWritable.get(), new IntWritable(sum));

HTH


2013-06-04 19:17:58 Tariq
