2013-08-31

CASSANDRA + PIG + CQL + Counter Column error

I am using Pig to access a Cassandra column family that has a counter column. When I try to dump the data, I get the error below:

cqlsh:pollkan> CREATE TABLE votes_count_period_1 (
      ... period int, 
      ... poll text, 
      ... votes counter, 
      ... PRIMARY KEY (period, poll) 
      ...); 

cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 

cqlsh:pollkan> select * from votes_count_period_1; 

 period   | poll                                 | votes
----------+--------------------------------------+-------
 20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a |     5
 20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a |     2
 20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a |     3


[email protected]:/usr/share/cassandra# pig -x local 
2013-08-31 23:02:06,135 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38 
2013-08-31 23:02:06,136 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log 
2013-08-31 23:02:06,154 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found 
2013-08-31 23:02:06,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// 
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar 
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar 
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar 
grunt> A = LOAD 'cql://pollkan/votes_count_period_1' USING org.apache.cassandra.hadoop.pig.CqlStorage(); 
grunt> DUMP A; 

This results in:

2013-08-31 23:01:35,397 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed ColumnFamilySplit((-69569900416187863, '-54603788994328078] @[cassandra001, cassandra002, cassandra003]) 
2013-08-31 23:01:35,417 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 
2013-08-31 23:01:35,418 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[2,4] C: R: 
2013-08-31 23:01:35,424 [Thread-10] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete. 
2013-08-31 23:01:35,428 [Thread-10] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local712790083_0002 
java.lang.Exception: java.lang.IndexOutOfBoundsException 
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) 
Caused by: java.lang.IndexOutOfBoundsException 
     at java.nio.Buffer.checkIndex(Buffer.java:538) 
     at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410) 
     at org.apache.cassandra.db.context.CounterContext.total(CounterContext.java:477) 
     at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:34) 
     at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:25) 
     at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.columnToTuple(AbstractCassandraStorage.java:137) 
     at org.apache.cassandra.hadoop.pig.CqlStorage.getNext(CqlStorage.java:110) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) 
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531) 
     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) 
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:166) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:722) 

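I read https://issues.apache.org/jira/browse/CASSANDRA-5234, which covers this problem with CQL3 tables and counter columns, but I still have the problem.

By the way, I tried recreating the table with the old-style compact storage. That got me a little further, but now I am stuck on a new problem with the error below: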

cqlsh:pollkan> CREATE TABLE votes_count_period_2 (
      ... period int, 
      ... poll text, 
      ... votes counter, 
      ... PRIMARY KEY (period, poll) 
      ...) WITH COMPACT STORAGE; 
cqlsh:pollkan> 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a'; 
cqlsh:pollkan> 
cqlsh:pollkan> select * from votes_count_period_2; 

 period   | poll                                 | votes
----------+--------------------------------------+-------
 20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a |     5
 20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a |     2
 20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a |     3

[email protected]:/usr/share/cassandra# pig -x local 
2013-08-31 23:02:06,135 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38 
2013-08-31 23:02:06,136 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log 
2013-08-31 23:02:06,154 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found 
2013-08-31 23:02:06,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// 
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar 
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar 
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar 
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage(); 
grunt> DUMP A; 
2013-08-31 23:05:59,454 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! 
2013-08-31 23:05:59,458 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 
2013-08-31 23:05:59,465 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 
2013-08-31 23:05:59,466 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 
((period,20130830),(poll,605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a),(votes,5)) 
((period,20130831),(poll,405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a),(votes,2)) 
((period,20130831),(poll,505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a),(votes,3)) 

grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage(); 
grunt> B = FOREACH A GENERATE poll, votes; 
grunt> describe B; 
B: {poll: chararray,votes: long} 
grunt> C = GROUP B BY poll; 
grunt> describe C; 
C: {group: chararray,B: {(poll: chararray,votes: long)}} 
grunt> D = FOREACH C GENERATE group AS pollgroup, SUM(B.votes); 
grunt> describe D; 
D: {pollgroup: chararray,long} 
grunt> dump D; 

2013-08-31 23:53:32,577 [pool-33-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[13,4],B[14,4],D[18,4],C[17,4] C: D[18,4],C[17,4] R: D[18,4] 
2013-08-31 23:53:32,586 [pool-33-thread-1] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output 
2013-08-31 23:53:32,589 [Thread-65] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete. 
2013-08-31 23:53:32,591 [Thread-65] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local814297309_0018 
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String 
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) 
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String 
     at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:76) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278) 
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) 
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:166) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:722) 

My versions are Pig 0.11.1 and Cassandra 1.2.9.

Any help?

Thanks

Answers


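I ran into the same problem earlier today while testing the latest Pig CQL3 integration against a similar data structure.

The JIRA issue you mentioned, https://issues.apache.org/jira/browse/CASSANDRA-5234, does contain a patch that one of the commenters has verified as working. However, a quick look through the Cassandra git repository shows that it has not been applied to the 1.2 branch or to trunk. I have added a comment to the JIRA issue.

Until the patch is committed and a new stable version is released, the workaround is to apply the patch to a fresh checkout of 1.2.9, recompile, and deploy the result to your Hadoop nodes (if that is an option for you).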
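
The steps above might look roughly like the sketch below. It only prints the commands unless RUN=1 is set. The repository URL, the tag name cassandra-1.2.9, the checkout directory, the patch file name, and the ant "jar" target are my assumptions, not details from this thread, and the patch itself has to be downloaded from the JIRA issue by hand. The git apply --check step is there to show whether the patch still applies cleanly to this checkout before anything is modified.

```shell
#!/bin/sh
# Sketch only: prints the workaround steps; run with RUN=1 to execute them.
# Everything configurable below is an assumption, not taken from the thread.
set -eu

REPO_URL="${REPO_URL:-https://github.com/apache/cassandra.git}"  # assumed mirror URL
TAG="${TAG:-cassandra-1.2.9}"                                    # assumed release tag name
SRC_DIR="${SRC_DIR:-cassandra-1.2.9-src}"                        # hypothetical checkout directory
PATCH_FILE="${PATCH_FILE:-/tmp/CASSANDRA-5234.patch}"            # hypothetical name; save the JIRA attachment here

step() {
    # Print the command, and run it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

apply_workaround() {
    # 1. Fresh checkout of the 1.2.9 sources.
    step git clone --branch "$TAG" --depth 1 "$REPO_URL" "$SRC_DIR"
    # 2. Check that the patch applies cleanly, then apply it.
    step git -C "$SRC_DIR" apply --check "$PATCH_FILE"
    step git -C "$SRC_DIR" apply "$PATCH_FILE"
    # 3. Rebuild the jars (assumes the build file provides a "jar" target).
    step ant -f "$SRC_DIR/build.xml" jar
}

apply_workaround
```

Copying the rebuilt jar to the Hadoop nodes is left out because it depends on how your cluster is laid out. Whatever jar you deploy is also the one the register statements in the grunt session need to point at.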


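I looked at the source, and the patch mentioned there appears to be in 1.2.9 already, though I may have overlooked something. I also added a comment on the JIRA issue some time ago. I will keep following the JIRA issue, thanks! – marcostrama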