2014-11-25 119 views

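I want to create a custom load function as a Pig UDF. I have been following the SimpleTextLoader example in the Pig UDF documentation:

https://pig.apache.org/docs/r0.11.0/udf.html

I built a jar from this code, registered it in Pig, and ran the Pig script. The job produces empty output. I don't know how to fix this; any help would be appreciated.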

Here is my Java code:

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.PigException;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SimpleTextLoader extends LoadFunc {
    protected RecordReader in = null;
    // Field delimiter; defaults to tab when the no-argument constructor is used.
    private byte fieldDel = '\t';
    private ArrayList<Object> mProtoTuple = null;
    private TupleFactory mTupleFactory = TupleFactory.getInstance();
    private static final int BUFFER_SIZE = 1024;

    public SimpleTextLoader() {
    }

    // Accepts a single character, or an escaped form such as "\t" or "\x2c".
    public SimpleTextLoader(String delimiter) {
        this();
        if (delimiter.length() == 1) {
            this.fieldDel = (byte) delimiter.charAt(0);
        } else if (delimiter.length() > 1 && delimiter.charAt(0) == '\\') {
            switch (delimiter.charAt(1)) {
            case 't':
                this.fieldDel = (byte) '\t';
                break;

            case 'x':
                fieldDel =
                    Integer.valueOf(delimiter.substring(2), 16).byteValue();
                break;

            case 'u':
                this.fieldDel =
                    Integer.valueOf(delimiter.substring(2)).byteValue();
                break;

            default:
                throw new RuntimeException("Unknown delimiter " + delimiter);
            }
        } else {
            throw new RuntimeException("PigStorage delimiter must be a single character");
        }
    }

    // Appends the bytes in [start, end) to the tuple under construction.
    private void readField(byte[] buf, int start, int end) {
        if (mProtoTuple == null) {
            mProtoTuple = new ArrayList<Object>();
        }

        if (start == end) {
            // NULL value
            mProtoTuple.add(null);
        } else {
            mProtoTuple.add(new DataByteArray(buf, start, end));
        }
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            boolean notDone = in.nextKeyValue();
            if (notDone) {
                return null;
            }
            Text value = (Text) in.getCurrentValue();
            System.out.println("printing value" + value);
            byte[] buf = value.getBytes();
            int len = value.getLength();
            int start = 0;

            // Cut a field at every delimiter byte.
            for (int i = 0; i < len; i++) {
                if (buf[i] == fieldDel) {
                    readField(buf, start, i);
                    start = i + 1;
                }
            }
            // pick up the last field
            readField(buf, start, len);

            Tuple t = mTupleFactory.newTupleNoCopy(mProtoTuple);
            mProtoTuple = null;
            System.out.println(t);
            return t;
        } catch (InterruptedException e) {
            int errCode = 6018;
            String errMsg = "Error while reading input";
            throw new ExecException(errMsg, errCode,
                    PigException.REMOTE_ENVIRONMENT, e);
        }
    }

    @Override
    public void setLocation(String string, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, string);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit ps) throws IOException {
        in = reader;
    }
}

Here is my Pig script:

REGISTER /home/hadoop/netbeans/sampleloader/dist/sampleloader.jar;
a = load '/input.txt' using sampleloader.SimpleTextLoader();
store a into 'output';

Did you get any error logs? – MarHserus 2014-11-25 12:04:21

I did not get any error logs. The job ran successfully but created an empty file. – Mallieswari 2014-11-26 06:25:10

Answer

Calling sampleloader.SimpleTextLoader() will not do anything, because it is just an empty constructor.
Use sampleloader.SimpleTextLoader(String delimiter) instead, which is what actually performs the split.
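
For reference, the posted class initialises fieldDel to '\t' at its declaration, so the no-argument constructor appears to leave a working tab delimiter in place rather than none at all. The following is a minimal stand-alone sketch of that delimiter handling, not Pig code: the class DelimiterSketch and its members parseDelimiter, split, and DEFAULT_DELIMITER are hypothetical names introduced only for illustration, and plain strings stand in for DataByteArray.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the posted delimiter handling; not part of Pig.
class DelimiterSketch {
    // Same default as the posted field: private byte fieldDel = '\t';
    static final byte DEFAULT_DELIMITER = (byte) '\t';

    // Mirrors the single-character, "\t" and "\xNN" branches of the posted constructor.
    static byte parseDelimiter(String delimiter) {
        if (delimiter.length() == 1) {
            return (byte) delimiter.charAt(0);
        }
        if (delimiter.length() > 1 && delimiter.charAt(0) == '\\') {
            switch (delimiter.charAt(1)) {
                case 't':
                    return (byte) '\t';
                case 'x':
                    return Integer.valueOf(delimiter.substring(2), 16).byteValue();
                default:
                    throw new IllegalArgumentException("Unknown delimiter " + delimiter);
            }
        }
        throw new IllegalArgumentException("Delimiter must be a single character");
    }

    // Same scan as the loop in getNext(): cut at each delimiter, then take the last field.
    // An empty field becomes null, as in the posted readField().
    static List<String> split(String line, byte fieldDel) {
        byte[] buf = line.getBytes(StandardCharsets.UTF_8);
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == fieldDel) {
                fields.add(field(buf, start, i));
                start = i + 1;
            }
        }
        fields.add(field(buf, start, buf.length));
        return fields;
    }

    private static String field(byte[] buf, int start, int end) {
        return start == end ? null : new String(buf, start, end - start, StandardCharsets.UTF_8);
    }
}
```

Under these assumptions, DelimiterSketch.split("a\tb\tc", DelimiterSketch.DEFAULT_DELIMITER) yields three fields without any delimiter argument, which suggests the choice of constructor alone may not explain an empty result.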

Hi, thanks for your reply. I also tried it with a delimiter, and I still get empty output. – Mallieswari 2014-11-26 04:33:26

Please refer to this link for a step-by-step walkthrough of loading data with a Pig custom load function: [**Here**](http://shrikantbang.wordpress.com/2013/11/02/apache-pig-custom-load-function-2/) – 2014-11-26 05:07:32
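
One detail in the posted getNext() may be worth checking: it returns null when in.nextKeyValue() returns true. Hadoop's RecordReader.nextKeyValue() returns true when a record was read, and a null from getNext() tells Pig there are no more tuples, so the loader would stop at the first record. The example in the Pig documentation tests `!notDone` instead. The sketch below models only that control flow with no Hadoop or Pig dependency; GetNextSketch, LineReader, getNextAsPosted, getNextAsDocumented, and loadAll are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the read loop; LineReader stands in for Hadoop's RecordReader.
class GetNextSketch {
    static class LineReader {
        private final Iterator<String> lines;
        private String current;

        LineReader(List<String> input) {
            this.lines = input.iterator();
        }

        // Like RecordReader.nextKeyValue(): true when a record was read, false at end of input.
        boolean nextKeyValue() {
            if (!lines.hasNext()) {
                return false;
            }
            current = lines.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    // The check as posted: returns null (end of input) exactly when a record IS available.
    static String getNextAsPosted(LineReader in) {
        boolean notDone = in.nextKeyValue();
        if (notDone) {
            return null;
        }
        return in.getCurrentValue();
    }

    // The check as in the Pig documentation example: null only when no record is left.
    static String getNextAsDocumented(LineReader in) {
        boolean notDone = in.nextKeyValue();
        if (!notDone) {
            return null;
        }
        return in.getCurrentValue();
    }

    // Drains a reader the way a caller would: keep asking until the first null.
    static List<String> loadAll(List<String> input, boolean asPosted) {
        LineReader in = new LineReader(input);
        List<String> out = new ArrayList<>();
        String record;
        while ((record = asPosted ? getNextAsPosted(in) : getNextAsDocumented(in)) != null) {
            out.add(record);
        }
        return out;
    }
}
```

With a two-line input, loadAll(input, true) comes back empty while loadAll(input, false) returns both lines, which matches the symptom of a job that succeeds but writes an empty file. If that is the cause here, changing the condition to `if (!notDone)` in getNext() and rebuilding the jar would be the first thing to try.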