2012-12-03

Cassandra Pig example fails when wide row input is enabled

Using Cassandra 1.1.6, Pig 0.10.0, and Hadoop 1.1.0, I can successfully run the pig_cassandra example script that ships with Cassandra in examples/pig.

But when I change

rows = LOAD 'cassandra://PigTest/SomeApp' USING CassandraStorage(); 

to:

rows = LOAD 'cassandra://PigTest/SomeApp?widerows=true' USING CassandraStorage(); 

I get the following error:

java.lang.IndexOutOfBoundsException: Index: 8, Size: 2 
    at java.util.ArrayList.rangeCheck(ArrayList.java:604) 
    at java.util.ArrayList.get(ArrayList.java:382) 
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:156) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:579) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:126) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290) 
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)                                                                           
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) 

This happens when running in both local and MapReduce mode, and also if I set PIG_WIDEROW_INPUT=true instead.

The following Pig Latin script fails when the "widerows=true" parameter is present.

rows = LOAD 'cassandra://PigTest/SomeApp?widerows=true' USING CassandraStorage(); 
cols = FOREACH rows GENERATE flatten(columns.name); 
DUMP cols; 

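I can't seem to get past this: with wide row input enabled, the static columns in the SomeApp column family are not read. The same problem occurs with other column families.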
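
One possible reading of the exception, offered as a sketch rather than a confirmed diagnosis: the projection asks the tuple for field 8, but the tuple it receives has only 2 fields. The plain-Python model below reproduces that mismatch. It is not Pig or Cassandra code. The assumptions are that the declared schema still places the columns bag after the key and seven static columns, and that the wide-row loader emits only (key, columns). Both are inferred from "Index: 8, Size: 2". The names project, declared_schema, static_1 to static_7, and the sample column values are hypothetical.

```python
# Illustrative model only: plain Python, not Pig or Cassandra code.

def project(tup, index):
    """Return field `index` of `tup`, failing with a message shaped like the trace."""
    if index >= len(tup):
        raise IndexError(f"Index: {index}, Size: {len(tup)}")
    return tup[index]

# ASSUMPTION: the schema lists the key, seven static columns, then the `columns` bag.
declared_schema = ["key"] + [f"static_{i}" for i in range(1, 8)] + ["columns"]
bag_position = declared_schema.index("columns")

# ASSUMPTION: in wide-row mode the tuple holds only the key and a bag of
# (name, value) pairs, with no separate fields for the static columns.
wide_row_tuple = ("row1", [("col_a", "1"), ("col_b", "2")])

try:
    project(wide_row_tuple, bag_position)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the failure comes from the schema and the emitted tuple disagreeing about where the columns bag lives, which would also explain why other column families with static columns fail the same way.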

Answers

I had a similar problem. It may be caused by a bug in get_paged_slices that was fixed in a later 1.1.x release. The solution would be to upgrade Cassandra to 1.1.8 or 1.1.9.

See:

Thanks Justen, but the problem still seems to occur with Cassandra 1.1.8. – Rob

@Rob - there is another bug in 1.1.8, in CassandraStorage, that will probably be fixed in 1.1.9/1.2.1. [CASSANDRA-5098](https://issues.apache.org/jira/browse/CASSANDRA-5098). – Justen