2013-04-28

DUMP is not working with PigRunner. Below is my code, which runs a script through PigRunner and reads the results from PigStats. abc.pig:

A = load 'Courses' using PigStorage(' '); 
B = foreach A generate $0 as id; 
dump B; 

I get the correct output, but it is followed by the exception below. The Java code that runs the script:

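import java.util.Iterator;
import org.apache.pig.PigRunner;
import org.apache.pig.data.Tuple;
import org.apache.pig.tools.pigstats.OutputStats;
import org.apache.pig.tools.pigstats.PigStats;

// Run abc.pig without a progress listener
String[] args = {"abc.pig"};
PigStats stats = PigRunner.run(args, null);

System.out.println("Stats : " + stats.getReturnCode());

// Read back the tuples of alias B; the exception is thrown from os.iterator()
OutputStats os = stats.result("B");
Iterator<Tuple> it = os.iterator();

while (it.hasNext()) {
    Tuple t = it.next();
    System.out.println(t.getAll());
}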

Stack trace with root cause:

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/tmp/temp-221133443/tmp1478461116 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252) 
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:154) 
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:116) 
    at org.apache.pig.tools.pigstats.OutputStats.iterator(OutputStats.java:148) 
    at org.apache.jsp.result_jsp._jspService(result_jsp.java:86) 
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) 
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) 
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:419) 
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391) 
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334) 
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) 
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304) 
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) 
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) 
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) 
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) 
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) 
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) 
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562) 
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) 
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:395) 
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:250) 
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188) 
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166) 
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) 
    at java.lang.Thread.run(Thread.java:662) 

Now, if I replace DUMP with STORE, the same code works without any errors.

Can someone please explain what is going on?

Thanks,
Ravi

Answer

3

With DUMP, Pig stores the output in a temporary location such as hdfs://localhost/tmp/temp797130848/tmp1101984728 (have a look at pig.map.output.dirs in your job's config XML).

At some point, PigRunner.run() calls GruntParser.processDump(String alias), which iterates over the result tuples and prints them to the console:

Iterator<Tuple> result = mPigServer.openIterator(alias); 
while (result.hasNext()) 
{ 
    Tuple t = result.next(); 
    System.out.println(TupleFormat.format(t)); 
} 

After that, but before returning, it also calls FileLocalizer.deleteTempFiles(), which deletes this temporary directory.

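Then you ask for the results of the alias. The OutputStats iterator tries to open the temporary file again to iterate over the tuples, just as PigRunner.run() did earlier. The problem is that this file no longer exists, hence the exception.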
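The sequence described above (write results to a temporary location, print them once, delete the location, then try to re-open it later) can be reproduced without Pig. The following is a minimal simulation in plain Java, assuming the behavior described in this answer; DumpLifecycleDemo, OutputHandle and runWithDump are hypothetical stand-ins for PigRunner.run() and OutputStats, not Pig APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

// Simplified model of the DUMP flow (hypothetical classes, NOT Pig's real ones):
// results go to a temp file, are printed once, then the temp file is deleted.
public class DumpLifecycleDemo {

    // Hypothetical stand-in for OutputStats: it only remembers where the output was.
    static final class OutputHandle {
        private final Path location;

        OutputHandle(Path location) {
            this.location = location;
        }

        // Mirrors the answer's description of OutputStats.iterator():
        // the stored location is opened again on every call.
        Iterator<String> iterator() throws IOException {
            return Files.readAllLines(location).iterator();
        }
    }

    // Hypothetical stand-in for PigRunner.run() on a script that ends in DUMP.
    static OutputHandle runWithDump(List<String> rows) throws IOException {
        Path tmp = Files.createTempFile("pig-dump", ".out"); // temporary output location
        Files.write(tmp, rows);
        for (String row : Files.readAllLines(tmp)) {         // "dump" to the console
            System.out.println(row);
        }
        Files.delete(tmp);                                   // cleanup before returning
        return new OutputHandle(tmp);                        // handle now points at a deleted file
    }

    public static void main(String[] args) throws IOException {
        OutputHandle handle = runWithDump(List.of("(1)", "(2)"));
        try {
            handle.iterator();
            System.out.println("unexpected: output still readable");
        } catch (NoSuchFileException e) {
            System.out.println("second read failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Running main prints the two rows once and then reports a NoSuchFileException, the local-filesystem analogue of the InvalidInputException in the question.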

So, since DUMP has already printed the output, I suggest you remove the code after System.out.println("Stats : " + stats.getReturnCode());.

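Thanks for the great explanation. In my case, I solved the problem by passing a PigProgressNotificationListener to PigRunner: when the job finished, I got the OutputStats object containing the output. – 2013-04-30 19:47:13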
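The listener approach works because the results are read while the temporary output still exists, before the runner cleans it up. The sketch below models only that ordering in plain Java; OutputListener, CollectingListener and runWithDump are hypothetical stand-ins. In Pig itself the interface is PigProgressNotificationListener; the comment does not say which callback was used, and it is an assumption here that outputCompletedNotification (which receives an OutputStats) is the relevant one and fires before the temp files are deleted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Simplified model of the listener-based workaround (hypothetical, NOT Pig's API):
// the runner notifies a listener while the temporary output still exists,
// and deletes it only afterwards.
public class ListenerWorkaroundDemo {

    // Hypothetical stand-in for PigProgressNotificationListener.
    interface OutputListener {
        void outputCompleted(Path location) throws IOException;
    }

    // Copies the rows as soon as the output is reported complete.
    static final class CollectingListener implements OutputListener {
        final List<String> rows = new ArrayList<>();

        @Override
        public void outputCompleted(Path location) throws IOException {
            rows.addAll(Files.readAllLines(location)); // the file still exists here
        }
    }

    // Hypothetical stand-in for PigRunner.run(args, listener) on a script ending in DUMP.
    static void runWithDump(List<String> rows, OutputListener listener) throws IOException {
        Path tmp = Files.createTempFile("pig-dump", ".out");
        try {
            Files.write(tmp, rows);
            listener.outputCompleted(tmp);             // notify before cleanup
        } finally {
            Files.deleteIfExists(tmp);                 // temp output removed afterwards
        }
    }

    public static void main(String[] args) throws IOException {
        CollectingListener listener = new CollectingListener();
        runWithDump(List.of("(1)", "(2)"), listener);
        System.out.println(listener.rows);             // prints [(1), (2)]
    }
}
```

The rows survive the cleanup because they were copied into memory during the callback, rather than re-read from the deleted location afterwards.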