2015-02-07

Converting JSON data into a specific table format using Pig

I have a JSON file in the following format:

"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}] 
"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}] 

I want to use Pig to extract the data from the JSON above into a table format:

Expected format (originally shown as an image): a table with one row per record and columns A, B, C and D.

Note: in the first record, column C should be empty or null, because the first record has no value for C.

I tried JsonLoader and the Elephant Bird jar but did not get the expected output. Please suggest a suitable way to get it.
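What is being asked for is essentially a pivot: each record's K/V pairs are spread over a fixed set of columns, and a key that does not appear leaves its column null. A minimal plain-Java sketch of that idea, independent of Pig; the class name PivotSketch and the hard-coded column list A, B, C, D are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotSketch {
    // Assumed fixed column order, taken from the keys seen in the question
    static final List<String> COLUMNS = Arrays.asList("A", "B", "C", "D");

    // Turns one record's (K, V) pairs into a row ordered by COLUMNS;
    // columns whose key is absent stay null, unknown keys are ignored
    static List<String> toRow(String[][] pairs) {
        Map<String, String> byKey = new LinkedHashMap<String, String>();
        for (String column : COLUMNS) {
            byKey.put(column, null);
        }
        for (String[] kv : pairs) {
            if (byKey.containsKey(kv[0])) {
                byKey.put(kv[0], kv[1]);
            }
        }
        return new ArrayList<String>(byKey.values());
    }

    public static void main(String[] args) {
        String[][] first = {{"A", "M"}, {"B", "N"}, {"D", "O"}};
        String[][] second = {{"A", "W"}, {"B", "X"}, {"C", "Y"}, {"D", "Z"}};
        System.out.println(toRow(first));   // [M, N, null, O]
        System.out.println(toRow(second));  // [W, X, Y, Z]
    }
}
```

The LinkedHashMap keeps the column order stable, which is the same trick the UDF in the answer relies on.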

Answer

Can you try this custom UDF?

Sample input 1 (input.json):

{"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]} 
{"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]} 

Pig script:

REGISTER jsonparse.jar 
A= LOAD 'input.json' Using JsonLoader('Properties2:{(K:chararray,T:chararray,V:chararray)}'); 
B= FOREACH A GENERATE FLATTEN(STRSPLIT(mypackage.JSONPARSE(BagToString(Properties2)),'_',4)); 
STORE B INTO 'output' USING PigStorage(); 
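BagToString joins every field of every tuple with its default "_" delimiter, so the first record reaches the UDF as A_String_M _B_String_N_D_String_O and comes back as M _N__O. The limit of 4 passed to STRSPLIT matters when the trailing columns are empty. Assuming STRSPLIT behaves like Java's String.split(regex, limit) (my understanding of how it is implemented, not verified here), this hypothetical SplitLimitDemo class shows the difference:

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Assumed equivalent of STRSPLIT(str, '_', 4): split with a limit of 4
    static String[] toFields(String joined) {
        return joined.split("_", 4);
    }

    public static void main(String[] args) {
        // Without a limit, Java drops trailing empty strings: only one field
        System.out.println(Arrays.toString("J___".split("_")));  // [J]
        // With a limit of 4, the empty B, C and D positions are kept
        System.out.println(Arrays.toString(toFields("J___")));   // [J, , , ]
        System.out.println(Arrays.toString(toFields("M_N__O"))); // [M, N, , O]
    }
}
```

Empty strings in the middle survive either way; only trailing ones need the limit.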

Output:

M  N    O 
W  X  Y  Z 

Sample input 2:

{"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]} 
{"Properties2":[{"K":"A","T":"String","V":"M"},{"K":"B","T":"String","V":"N"},{"K":"D","T":"String","V":"O"}]} 
{"Properties2":[{"K":"A","T":"String","V":"J"}]} 
{"Properties2":[{"K":"B","T":"String","V":"X"}]} 
{"Properties2":[{"K":"C","T":"String","V":"Y"}]} 
{"Properties2":[{"K":"D","T":"String","V":"Z"}]} 

Output 2:

W  X  Y  Z 
M  N    O 
J 
     X 
       Y 
         Z 

UDF code: the following Java file is compiled and packaged as jsonparse.jar (this is only rough Java code; optimize or modify it to suit your needs).

JSONPARSE.java

package mypackage;

import java.io.IOException;
import java.util.LinkedHashMap;

import org.apache.commons.lang.StringUtils;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class JSONPARSE extends EvalFunc<String> {
    @Override
    public String exec(Tuple arg0) throws IOException {
        // Return null for a missing or empty input instead of failing the job
        if (arg0 == null || arg0.size() == 0 || arg0.get(0) == null) {
            return null;
        }
        try {
            // Get the input, e.g. "A_String_M_B_String_N_D_String_O"
            String input = (String) arg0.get(0);

            // Split the input using "_" as the delimiter
            String[] parts = input.split("_");

            // Init the map with keys (A, B, C, D) and empty strings as values;
            // LinkedHashMap keeps the column order
            LinkedHashMap<String, String> mymap = new LinkedHashMap<String, String>();
            mymap.put("A", "");
            mymap.put("B", "");
            mymap.put("C", "");
            mymap.put("D", "");

            // Each (K, T, V) triple: key at index i, value at index j = i + 2.
            // Checking j (not i) avoids an out-of-bounds read on a short last triple.
            for (int i = 0, j = 2; j < parts.length; i += 3, j += 3) {
                // Find each key from the input and update the respective value
                if (mymap.containsKey(parts[i])) {
                    mymap.put(parts[i], parts[j]);
                }
            }

            // Append each value followed by "_" as the delimiter
            StringBuilder output = new StringBuilder();
            for (String value : mymap.values()) {
                output.append(value).append("_");
            }

            // Remove the extra trailing delimiter "_" from the output
            return StringUtils.removeEnd(output.toString(), "_");
        } catch (Exception e) {
            throw new IOException("Caught exception while processing the input row ", e);
        }
    }
}
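To check the parsing logic without a Pig cluster, the core of JSONPARSE can be copied into a plain Java method. JsonParseHarness below is a hypothetical test helper, not part of the answer; it assumes the UDF receives BagToString's "_"-joined output and feeds it the rows of sample input 2 in that form:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonParseHarness {
    // Mirrors the logic of JSONPARSE.exec: "K_T_V_K_T_V..." -> "A_B_C_D" values
    static String parse(String input) {
        // LinkedHashMap keeps the column order A, B, C, D
        Map<String, String> columns = new LinkedHashMap<String, String>();
        for (String key : new String[] {"A", "B", "C", "D"}) {
            columns.put(key, "");
        }
        String[] parts = input.split("_");
        // Each (K, T, V) triple: key at index i, value at index i + 2
        for (int i = 0; i + 2 < parts.length; i += 3) {
            if (columns.containsKey(parts[i])) {
                columns.put(parts[i], parts[i + 2]);
            }
        }
        // Missing keys keep "" and show up as empty fields between delimiters
        return String.join("_", columns.values());
    }

    public static void main(String[] args) {
        // BagToString-style strings for the rows of sample input 2
        String[] rows = {
            "A_String_W_B_String_X_C_String_Y_D_String_Z",
            "A_String_M_B_String_N_D_String_O",
            "A_String_J",
            "B_String_X",
            "C_String_Y",
            "D_String_Z"
        };
        for (String row : rows) {
            System.out.println(parse(row));
        }
    }
}
```

It prints W_X_Y_Z, M_N__O, J___, _X__, __Y_ and ___Z, which STRSPLIT then spreads over four columns. The same harness exposes a limitation of the "_" encoding: a value that itself contains "_" is split too, so parse("A_String_a_b") returns a___ and the b is lost.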

How to compile and build the jar file:

$ ls 
    JSONPARSE.java input.json 
$ javac JSONPARSE.java 
$ mkdir mypackage 
$ mv JSONPARSE.class mypackage/ 
$ jar -cvf jsonparse.jar mypackage/ 
$ ls 
    JSONPARSE.java input.json jsonparse.jar mypackage 

1. Download the two jar files (apache-commons-lang.jar, piggybank.jar) from the links below:
    http://www.java2s.com/Code/Jar/a/Downloadapachecommonslangjar.htm
    http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm

2. Add the two jar files to your classpath:
    >> export CLASSPATH=/tmp/piggybank.jar:/tmp/apache-commons-lang.jar

3. Create a directory named mypackage:
    >>mkdir mypackage

4. Compile JSONPARSE.java (make sure both jars are on the classpath, otherwise compilation fails; EvalFunc and Tuple are core Pig classes, so if javac cannot find them, also add your Pig installation's core jar to CLASSPATH):
    >>javac JSONPARSE.java

5. Move the class file into the mypackage directory:
    >>mv JSONPARSE.class mypackage/

6. Create a jar file named jsonparse.jar:
    >>jar -cvf jsonparse.jar mypackage/

7. This creates jsonparse.jar; include it in your Pig script with the REGISTER command.

Example from the command line: