我試圖結合Hadoop,Pig和Cassandra來通過簡單的Pig查詢來處理Cassandra中存儲的數據。問題是我無法讓Pig創建實際與CassandraStorage配合使用的Map/Reduce作業。通過Pig提交地圖/縮小作業時捆綁罐子?
我所做的是從contrib/pig(Cassandra的源代碼發行版)頂部的一個集羣機器上覆制了storage-conf.xml文件,然後將其編譯到cassandra_loadfun.jar文件中。
接下來,我適應的例子,script.pig包括所有的罐子:
register /opt/pig/pig-0.7.0-core.jar;
register /tmp/apache-cassandra-0.6.3-src/lib/libthrift-r917130.jar;
REGISTER /tmp/apache-cassandra-0.6.3-src/contrib/pig/build/cassandra_loadfunc.jar;
rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
cols = FOREACH rows GENERATE flatten($1);
colnames = FOREACH cols GENERATE $0;
namegroups = GROUP colnames BY $0;
namecounts = FOREACH namegroups GENERATE COUNT($1), group;
orderednames = ORDER namecounts BY $0;
topnames = LIMIT orderednames 50;
dump topnames;
所以,如果我沒有記錯的罐子應該捆綁到提交到Hadoop的工作。 但在運行作業時,它只是拋出了我的異常:
2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
at org.apache.pig.PigServer.openIterator(PigServer.java:521)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
at org.apache.pig.PigServer.store(PigServer.java:577)
at org.apache.pig.PigServer.openIterator(PigServer.java:504)
... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
at org.apache.pig.PigServer.store(PigServer.java:569)
... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
我不理解,因爲節儉庫明確列出,並應捆綁在一起,不是嗎?
對於在尋找[錯誤1066:無法打開別名的迭代器]時發現此帖子的人(http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for- alias-in-pig-generic-solution)這裏是一個[通用解決方案](http://stackoverflow.com/a/34495086/983722)。 – 2015-12-28 14:43:07