2015-01-16 45 views

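Pig fails mysteriously when I give the root of a large directory tree as the LOAD input in my script. The backend exception it raises gives no insight into what is going on. The same script works perfectly when there are fewer files. How many files can I submit to a single Pig job at once?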

It is a very simple script, as you can see below:

SET pig.noSplitCombination true; 
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(','); 
filtered = FILTER raw_record by $1 == 251068; 
filtered_data = FOREACH filtered GENERATE (chararray)$0, (chararray)$1, (chararray)$2; 
STORE filtered_data INTO '/data/output/directory/' USING PigStorage(); 

Here is the error message I see:

ERROR 2244: Job scope-594 failed, hadoop does not return any error message 
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job scope-594 failed, hadoop does not return any error message 
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:178) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:232) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203) 
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81) 
    at org.apache.pig.Main.run(Main.java:608) 
    at org.apache.pig.Main.main(Main.java:156) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 

How many files can Pig process at once?


It looks like it is already failing in the front end. Can you access the job settings on the job server? Why are you setting pig.noSplitCombination? – LiMuBei

Answer


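Pig can handle any number of files; Pig itself places no limit on how many it processes. In your case, try declaring a data type for each field in the LOAD statement, and try putting the constant in quotes in the FILTER statement: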

raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',') AS (col1:chararray, col2:chararray);

filtered = FILTER raw_record BY $1 == '251068';
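Putting both suggestions together with the original script gives roughly the sketch below. It has not been run against the asker's data: the column names col1 to col3 are hypothetical placeholders, and it assumes each record has at least three comma-separated fields with the key in the second one. As an assumption, the `SET pig.noSplitCombination true;` line is left out. That setting turns off Pig's combining of small input files into larger splits, so with it enabled a large directory tree yields at least one map task per file, which is presumably why the commenter asked about it.

```pig
-- Sketch only: col1..col3 are hypothetical column names; adjust to the real layout.
-- pig.noSplitCombination is not set, so Pig's default split combination stays on
-- and many small files can be packed into fewer map tasks.
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',')
    AS (col1:chararray, col2:chararray, col3:chararray);

-- col2 is declared as chararray, so compare it against a quoted string literal.
filtered = FILTER raw_record BY col2 == '251068';

-- The fields are already typed by the LOAD schema, so no casts are needed here.
filtered_data = FOREACH filtered GENERATE col1, col2, col3;

STORE filtered_data INTO '/data/output/directory/' USING PigStorage();
```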

If you still get the error, try providing some sample data.
