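I'm facing the following situation and would appreciate some help. I'm using Hadoop MapReduce to process XML files, and I want to handle them with a single input format in Hadoop.
By referring to this site I'm able to split my records: https://gist.github.com/sritchie/808035. But when an XML file is larger than the block size, I don't get the correct values, so I need to read the whole file. For that I found this link: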
https://github.com/pyongjoo/MapReduce-Example/blob/master/mysrc/XmlInputFormat.java
Now the problem is: how do I combine these two input formats into a single InputFormat?
Please help me soon. Thanks.
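To make the idea concrete, here is a minimal sketch (plain Java, no Hadoop dependency) of the start-tag/end-tag scanning that an XML record reader like the ones linked above performs. It treats the whole input as one split, which is what you get when isSplitable returns false, so no record can be cut at a block boundary. Assumptions: the names XmlTagScanner, readUntilMatch and readAllRecords are hypothetical, tags are not nested, and the first character of a tag does not reappear inside the tag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of start/end-tag record scanning over one whole stream. */
public class XmlTagScanner {

    /**
     * Reads bytes until `match` has been seen. If `buffer` is non-null, every
     * byte read (including the match itself) is appended to it.
     * Returns false if the stream ends before a full match.
     */
    static boolean readUntilMatch(InputStream in, byte[] match, ByteArrayOutputStream buffer)
            throws IOException {
        int i = 0; // number of bytes of `match` matched so far
        while (true) {
            int b = in.read();
            if (b == -1) {
                return false; // end of stream before a full match
            }
            if (buffer != null) {
                buffer.write(b);
            }
            if (b == (match[i] & 0xff)) {
                i++;
                if (i == match.length) {
                    return true;
                }
            } else {
                // Restart, but let the current byte begin a new match (e.g. "<<page>").
                i = (b == (match[0] & 0xff)) ? 1 : 0;
            }
        }
    }

    /** Returns every startTag...endTag block in the stream, tags included. */
    static List<String> readAllRecords(InputStream in, String startTag, String endTag)
            throws IOException {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        // Skip ahead (without buffering) to the next start tag.
        while (readUntilMatch(in, start, null)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            buffer.write(start, 0, start.length); // keep the start tag in the record
            if (!readUntilMatch(in, end, buffer)) {
                break; // unterminated record at end of input: drop it
            }
            records.add(buffer.toString(StandardCharsets.UTF_8.name()));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<root><page>a</page>noise<page>b</page></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAllRecords(in, "<page>", "</page>"));
        // prints [<page>a</page>, <page>b</page>]
    }
}
```

Matching is exact, so a start tag configured as `<page>` will not match `<page id="1">`; if your elements carry attributes, you may need to configure the start tag without the closing `>` (for example `<page`).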
UPDATE
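import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class XmlParser11
{
    public static class XmlInputFormat1 extends TextInputFormat {
        public static final String START_TAG_KEY = "xmlinput.start";
        public static final String END_TAG_KEY = "xmlinput.end";

        // Never split the file: one mapper reads the whole XML file,
        // so a record can never be cut at an HDFS block boundary.
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }

        @Override
        public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
            return new XmlRecordReader();
        }

        /**
         * XmlRecordReader reads through a given XML document and outputs
         * XML blocks as records, delimited by the start tag and end tag.
         */
        public static class XmlRecordReader extends RecordReader<LongWritable, Text> {
            private byte[] startTag;
            private byte[] endTag;
            private long start;
            private long end;
            private FSDataInputStream fsin;
            private DataOutputBuffer buffer = new DataOutputBuffer();
            private LongWritable key = new LongWritable();
            private Text value = new Text();

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context)
                    throws IOException, InterruptedException {
                Configuration conf = context.getConfiguration();
                // Both tags must be set in the job configuration (xmlinput.start /
                // xmlinput.end); otherwise conf.get returns null and this throws an NPE.
                startTag = conf.get(START_TAG_KEY).getBytes("utf-8");
                endTag = conf.get(END_TAG_KEY).getBytes("utf-8");
                FileSplit fileSplit = (FileSplit) split;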
But it's not working.
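For comparison with the isSplitable(false) approach, here is a simplified model of the boundary rule that lets a start/end-tag reader work across block boundaries, using the start and end offsets a reader like the one above keeps: a split [start, end) owns every record whose start tag begins inside it, and it keeps reading past end to finish that record. This is an assumption about how such a reader should behave, not the posted code. The names SplitBoundaryDemo and recordsForSplit are hypothetical, and String.indexOf on an in-memory string stands in for a seekable stream. If the rule holds, cutting the file at any offset returns the same records as reading it whole.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how records are assigned to splits by start-tag position. */
public class SplitBoundaryDemo {

    /** Records owned by the split [splitStart, splitEnd) of `file`. */
    static List<String> recordsForSplit(String file, int splitStart, int splitEnd,
                                        String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        while (true) {
            int s = file.indexOf(startTag, pos);
            if (s < 0 || s >= splitEnd) {
                break; // next record starts in a later split (or there is none)
            }
            int e = file.indexOf(endTag, s + startTag.length());
            if (e < 0) {
                break; // unterminated record
            }
            e += endTag.length();
            records.add(file.substring(s, e)); // may extend past splitEnd
            pos = e;
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "<r><page>aaaa</page><page>bbbb</page><page>cccc</page></r>";
        int mid = file.length() / 2; // an arbitrary "block boundary" inside the second record
        List<String> whole = recordsForSplit(file, 0, file.length(), "<page>", "</page>");
        List<String> joined = new ArrayList<>(recordsForSplit(file, 0, mid, "<page>", "</page>"));
        joined.addAll(recordsForSplit(file, mid, file.length(), "<page>", "</page>"));
        System.out.println(whole.equals(joined)); // prints true
    }
}
```

The trade-off: keeping isSplitable(false) sends each file to a single mapper, while a boundary-aware reader keeps parallelism across blocks.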
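But we need to write the RecordReader, right? I already have a RecordReader for reading XML, so how can I merge the whole-file reader into it? – Backtrack
Please take a look at this post. I've edited it. –
+1. I've updated the post, please take a look – Backtrack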