2016-12-16

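Strange error running the pigunit sample test

I want to set up unit tests for Pig, and I am following the documentation they provide. It seems a bit outdated, so I switched to the svn trunk. The first odd thing is that it actually needs more libraries than just pigunit, pig and hadoop-commons to work (I had to add hadoop-hdfs, hadoop-mapreduce-client-core and hadoop-mapreduce-client-jobclient). I am not sure it is appropriate to have these in the dependency manager, but that is not the main problem. Here is the test I am trying to run: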

@Test 
public void testNtoN() throws ParseException, IOException { 
    String[] args = { 
        "n=3", 
        "reducers=1", 
        "input=top_queries_input_data.txt", 
        "output=top_3_queries", 
    }; 
    test = new PigTest("script dir", args); 

    String[] output = { 
        "(yahoo,25)", 
        "(facebook,15)", 
        "(twitter,7)", 
    }; 

    test.assertOutput("queries_limit", output); 
} 

And here is the actual script:

data = 
    LOAD '$input' 
    AS (query:CHARARRAY, count:INT); 

queries_group = 
    GROUP data 
    BY query 
    PARALLEL $reducers; 

queries_sum = 
    FOREACH queries_group 
    GENERATE 
     group AS query, 
     SUM(data.count) AS count; 

queries_ordered = 
    ORDER queries_sum 
    BY count DESC 
    PARALLEL $reducers; 

queries_limit = LIMIT queries_ordered $n; 

STORE queries_limit INTO '$output'; 

Here is the stack trace:

STORE queries_limit INTO 'top_3_queries'; 
--> none 

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias queries_limit 

at org.apache.pig.PigServer.openIterator(PigServer.java:1019) 
at org.apache.pig.pigunit.PigTest.getAliasFromCache(PigTest.java:224) 
at org.apache.pig.pigunit.PigTest.getActualResults(PigTest.java:319) 
at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:409) 
at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:400) 
at BlaUnitTest.testBla(BlaUnitTest.java:24) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) 
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) 
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) 
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) 
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) 
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) 
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) 
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) 
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) 
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) 
at org.junit.runners.ParentRunner.run(ParentRunner.java:309) 
at org.mockito.internal.runners.JUnit45AndHigherRunnerImpl.run(JUnit45AndHigherRunnerImpl.java:37) 
at org.mockito.runners.MockitoJUnitRunner.run(MockitoJUnitRunner.java:62) 
at org.junit.runner.JUnitCore.run(JUnitCore.java:160) 
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117) 
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42) 
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:262) 
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 
Caused by: java.io.IOException: Couldn't retrieve job. 
at org.apache.pig.PigServer.store(PigServer.java:1083) 
at org.apache.pig.PigServer.openIterator(PigServer.java:994) 
... 34 more 

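I tried debugging it to see what actually happens: it fails when it tries to build the query plan and obtain the ExecJob. I even simplified the script, removing everything except the code that loads and stores the data. The result is the same.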


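By default Pig reads with PigStorage, and \t is the default delimiter. Are you sure top_queries_input_data.txt is delimited accordingly? WRT your question about including the hadoop-mapreduce-* libraries, you can include them for tests only. If you use Gradle for dependency management, you can use testCompile. I believe Maven and other dependency managers have a way to do something similar. – coder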
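One way to check the delimiter question above is a small standalone Java sanity check. This is only a sketch: the class name TabDelimitedCheck and the method badLines are made up for illustration, and it assumes every line should look like query<TAB>count, matching the (query:CHARARRAY, count:INT) schema in the script. It does not use Pig at all, so it says nothing about how PigStorage itself treats a malformed line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig or PigUnit.
class TabDelimitedCheck {

    /** Returns the 1-based numbers of the lines that are not query<TAB>count. */
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Limit -1 keeps trailing empty fields, so "yahoo\t" is seen as two fields.
            String[] fields = lines.get(i).split("\t", -1);
            boolean ok = fields.length == 2
                    && !fields[0].isEmpty()
                    && isInt(fields[1]);
            if (!ok) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    /** True if the string parses as a Java int. */
    static boolean isInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // With a path argument the file is checked; otherwise a tiny made-up sample is used.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("yahoo\t10", "twitter 7", "facebook\t15");
        System.out.println("Lines not matching query<TAB>count: " + badLines(lines));
    }
}
```

Running it with the path to top_queries_input_data.txt as the only argument lists the offending line numbers; an empty list means every line has exactly two tab-separated fields with an integer count.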


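Yes, I am using the exact data from svn (http://svn.apache.org/viewvc/pig/trunk/test/data/pigunit/). I think the problem might be related to some configuration of PigServer (I am running it in LOCAL mode, but it may need some extra configuration), or it might be OS-related (I am on Ubuntu 14.04).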


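Does it give you any stack trace that could help narrow down the problem? Also, this all runs in memory, so I doubt any configuration is needed. – coder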

Answer


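I managed to solve the problem. The issue was that I had included some dependencies on the classpath that apparently interfered with correct execution. The only dependencies needed are hadoop-core (I am using hadoop-aws, because that is what I work with), hadoop-client, pig and pigunit. Now everything runs correctly.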
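For anyone hitting a similar classpath conflict, a generic JVM trick is to print which jar each relevant class was actually loaded from. The sketch below is not from the original thread: ClassOrigin and locationOf are hypothetical names, and the list of classes to inspect is only a guess at what is worth checking (PigServer and PigTest appear in the stack trace above; the two Hadoop classes are common entry points).

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Pig, PigUnit or Hadoop.
class ClassOrigin {

    static final String NOT_LOADABLE = "(not loadable from this classpath)";
    static final String NO_SOURCE = "(no code source, e.g. a JDK bootstrap class)";

    /** Returns the jar or directory a class was loaded from, or a marker string. */
    static String locationOf(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return NO_SOURCE;
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return NOT_LOADABLE;
        }
    }

    public static void main(String[] args) {
        // Assumed list of classes to inspect; adjust to the libraries in question.
        String[] names = {
            "org.apache.pig.PigServer",
            "org.apache.pig.pigunit.PigTest",
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Run with the same classpath as the failing test, this shows whether Hadoop classes come from the jars you expect or from an extra dependency that happens to be earlier on the classpath.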