2015-04-03 56 views

Copying a Cassandra table to Hive

I have been stuck on this problem for several days, so any help would be greatly appreciated.

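I am trying to copy a Cassandra table into Hive (so that I can register it in the Hive metastore and then access it from Tableau). The Hive -> Tableau part works, but the Cassandra-to-Hive part does not: the data is not being copied into the Hive metastore.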

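Here are the steps I have taken:

I followed the instructions in the README of this project: https://github.com/tuplejump/cash/tree/master/cassandra-handler

I built hive-cassandra-*.jar and copied it, along with cassandra-all-*.jar and cassandra-thrift-*.jar, into Hive's lib folder.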

Then I started Hive and tried the following:

hive> add jar /usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar; 
Added [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar] to class path 
Added resources: [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar] 
hive> list jars; 
/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar 
hive> create temporary function tmp as 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' 
    > ; 
FAILED: Class org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler not found 

I don't know why Hive can't see CqlStorageHandler...

Thanks!

Answers


Another approach to consider is to write a simple Java program that writes the data to a file, and then load that file into Hive.

package com.company.cassandra;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Cluster.Builder;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraExport {

    // Shared session, opened by connect() and closed at the end of main().
    public static Session session;

    // Connects to the cluster and opens a session on the given keyspace.
    // Credentials are only applied when both username and password are set.
    public static void connect(String username, String password, String host, int port, String keyspace) {
        Builder builder = Cluster.builder().addContactPoint(host);
        builder.withPort(port);
        if (username != null && password != null) {
            builder.withCredentials(username, password);
        }

        Cluster cluster = builder.build();
        session = cluster.connect(keyspace);
    }

    public static void main(String[] args) {
        // Prod. All connection values and the table name below are
        // placeholders; replace them with your own.
        connect("user", "password", "server", 9042, "keyspace");

        ResultSetFuture future = session.executeAsync("SELECT * FROM table;");
        ResultSet results = future.getUninterruptibly();
        for (Row row : results) {
            // Print the columns, tab-separated, in the order of the Hive table's columns.
            String out = row.getString("col1") + "\t" +
                    String.valueOf(row.getInt("col2")) + "\t" +
                    String.valueOf(row.getLong("col3")) + "\t" +
                    String.valueOf(row.getLong("col4"));
            System.out.println(out);
        }

        // Release the session first, then the cluster it belongs to.
        session.close();
        session.getCluster().close();
    }
}

Write the output to a file, then load it into Hive.

hive -e "use schema; load data local inpath '/tmp/cassandra-table' overwrite into table mytable;"
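
For the "write the output to a file" step, one option is to have the program write the tab-separated lines itself rather than redirecting stdout. The following is only a minimal sketch: the class `TsvExport` and its methods are hypothetical names that are not part of the answer's code, and the sample rows stand in for values read from Cassandra `Row` objects. It assumes the Hive table is a plain text table with tab-delimited fields, so nulls are written as `\N` (the default NULL marker for Hive text tables), and tabs or line breaks inside a value are replaced with spaces so they cannot shift columns.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original answer): formats values as
// tab-separated lines suitable for a tab-delimited Hive text table.
class TsvExport {

    // Joins the values with tabs. Nulls become \N; tabs and line breaks
    // inside a value are replaced with spaces to keep one row per line.
    static String toTsvLine(Object... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            Object v = values[i];
            sb.append(v == null ? "\\N" : String.valueOf(v).replaceAll("[\\t\\r\\n]", " "));
        }
        return sb.toString();
    }

    // Writes one line per row to the target file, overwriting any existing file.
    static void writeRows(Path target, List<Object[]> rows) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Object[] row : rows) {
                w.write(toTsvLine(row));
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample rows standing in for (col1, col2, col3, col4) read from Cassandra.
        List<Object[]> rows = Arrays.asList(
                new Object[] {"alice", 1, 10L, 100L},
                new Object[] {"bob\tsmith", 2, null, 200L});
        // Same path as the one used in the hive load command above.
        writeRows(Paths.get("/tmp/cassandra-table"), rows);
    }
}
```

In the export loop, each `Row` would be turned into a line with something like `toTsvLine(row.getString("col1"), row.getInt("col2"), ...)`. Replacing embedded tabs with spaces is a lossy shortcut; if the original characters matter, an escaping scheme that matches the Hive table definition would be needed instead.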