2012-02-24 40 views
0

Error reading JSON with elephantbird - Pig

I am unable to read a JSON file using ElephantBird and Pig. I would like to know what mistake I am making.

Data:

{ "nrcpts": "1", 
    "src": "[email protected]", 
    "sendmailid": "p6D0r0u1006229", 
    "relay": "app03.example.com", 
    "classnumber": "0", 
    "msgid": "WARQZCXAEMSSVWPPOOYZXR 
[email protected]", 
    "pid": "6229", 
    "month": "Jul", 
    "time": "20:53:00", 
    "day": "12", 
    "mailserver": "mail5", 
    "size": "57395" 
} 

Code:

json1 = load '/user/hdetl/funnel/uetsample.dat' using com.twitter.elephantbird.pig.load.JsonLoader(); 

dat = FOREACH json1 GENERATE $0#'mailserver' AS mailserver; 
dump dat; 

Error:

Input(s): 
Failed to read data from "/user/hdetl/funnel/uetsample.dat" 

detailed error : 
Pig Stack Trace 
--------------- 
ERROR 2997: Unable to recreate exception from backed error: Error: in 

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias dat. Backend error : Unable to recreate exception from backed error: Error: in 
     at org.apache.pig.PigServer.openIterator(PigServer.java:891) 
     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655) 
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303) 
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188) 
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164) 
     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69) 
     at org.apache.pig.Main.run(Main.java:495) 
     at org.apache.pig.Main.main(Main.java:111) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
     at java.lang.reflect.Method.invoke(Method.java:597) 
     at org.apache.hadoop.util.RunJar.main(RunJar.java:186) 
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: in 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:344) 
     at org.apache.pig.PigServer.launchPlan(PigServer.java:1314) 
     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299) 
     at org.apache.pig.PigServer.storeEx(PigServer.java:996) 
     at org.apache.pig.PigServer.store(PigServer.java:963) 
     at org.apache.pig.PigServer.openIterator(PigServer.java:876) 
+0

export JAVA_HOME=/usr/java/jdk1.6.0_22 export PIG_CLASSPATH=/etc/hadoop/conf export PATH=$PATH:/local/hdetl/pig-0.9.2/bin REGISTER /local/hdetl/funnel/pig-jars/json-simple-1.1.jar; REGISTER /local/hdetl/funnel/pig-jars/google-collect-1.0.jar; REGISTER '/local/hdetl/funnel/pig-jars/elephant-bird-1.2.1-SNAPSHOT.jar'; – Bharathi 2012-02-24 22:25:59

+0

What is your question? – 2012-02-24 22:44:57

+0

I cannot read the JSON file using elephantbird and Pig. I would like to know where I am going wrong. – Bharathi 2012-02-24 22:55:03

Answers

0

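I haven't used the JSON loader, but I think you should be able to drop the $0 in your foreach. My guess is that the loader simply turns everything between { and } into a single record (Tuple).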

 
dat = FOREACH json1 GENERATE mailserver; 
0

This is quite an old post, but someone may have a similar problem.

I created an input file from the data provided in the question.
I could not load it because of an unnecessary line break in the file:

"msgid": "WARQZCXAEMSSVWPPOOYZXR 
[email protected]", 

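But fixing that alone did not give the expected result. I then removed all line breaks from the file, so that in the end I had only one line.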
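That manual clean-up could also be scripted. The following is only a minimal sketch, assuming the file holds a single JSON object and that line breaks inside string values (such as the one in "msgid") can simply be dropped. The function name `to_single_line` is hypothetical and not part of Pig or Elephant Bird.

```python
import json


def to_single_line(raw: str) -> str:
    """Collapse one pretty-printed JSON object into a single line."""
    # Drop every line break, including a stray one inside a string value,
    # mirroring the manual fix of removing all line breaks from the file.
    joined = raw.replace("\r", "").replace("\n", "")
    # Parse to confirm the result is valid JSON, then re-serialize on one line.
    record = json.loads(joined)
    return json.dumps(record)


if __name__ == "__main__":
    sample = '{ "nrcpts": "1",\n    "msgid": "ABC\nDEF",\n    "mailserver": "mail5"\n}'
    print(to_single_line(sample))
```

The output of such a script would be written back to the input file before running the Pig load statement.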

The file loads:

dump json1 
([time#20:53:00,msgid#WARQZCXAEMSSVWPPOOYZXRLQIKMFU[email protected],relay#app03.example.com,mailserver#mail5,month#Jul,pid#6229,classnumber#0,day#12,src#[email protected],sendmailid#p6D0r0u1006229,nrcpts#1,size#57395]) 

And your foreach works:

dat = FOREACH json1 GENERATE $0#'mailserver' AS mailserver; 
dump dat 

(mail5)