
Spark (Java): HBase class not found exception

I'm trying to use Spark to communicate with HBase, using the following piece of code:

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.hbase.TableName; 
import org.apache.hadoop.hbase.client.Result; 
import org.apache.hadoop.hbase.client.Scan; 
import org.apache.hadoop.hbase.io.ImmutableBytesWritable; 
import org.apache.hadoop.hbase.spark.JavaHBaseContext; 
import org.apache.spark.SparkConf; 
import org.apache.spark.api.java.JavaRDD; 
import org.apache.spark.api.java.JavaSparkContext; 
import scala.Tuple2; 

SparkConf sparkConf = new SparkConf().setAppName("HBaseRead"); 
JavaSparkContext jsc = new JavaSparkContext(sparkConf); 
Configuration conf = HBaseConfiguration.create(); 
conf.addResource(new Path("/etc/hbase/conf/core-site.xml")); 
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml")); 
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf); 

Scan scan = new Scan(); 
scan.setCaching(100); 

// Full scan of the "climate" table as an RDD of (row key, Result) pairs 
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> hbaseRdd = hbaseContext.hbaseRDD(TableName.valueOf("climate"), scan); 

System.out.println("Number of Records found : " + hbaseRdd.count()); 

When I run it, I get the following error:

Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/StoreFileWriter 
    at java.lang.Class.getDeclaredMethods0(Native Method) 
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) 
    at java.lang.Class.getDeclaredMethod(Class.java:2128) 
    at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475) 
    at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72) 
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498) 
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472) 
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369) 
    ... 

I haven't found any solution via Google. Does anyone have an idea?

-------- EDIT --------

I'm using Maven. My pom looks like this:

<dependencies> 
    <dependency> 
     <groupId>org.apache.hbase</groupId> 
     <artifactId>hbase-server</artifactId> 
     <version>1.3.0</version> 
    </dependency>   

    <dependency> 
     <groupId>org.sharegov</groupId> 
     <artifactId>mjson</artifactId> 
     <version>1.4.1</version> 
    </dependency> 

    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.10</artifactId> 
     <version>1.5.2</version> 
    </dependency> 

    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-sql_2.10</artifactId> 
     <version>1.5.2</version> 
    </dependency> 

    <dependency> 
     <groupId>com.databricks</groupId> 
     <artifactId>spark-csv_2.10</artifactId> 
     <version>1.5.0</version> 
    </dependency> 

    <dependency> 
     <groupId>com.databricks</groupId> 
     <artifactId>spark-xml_2.10</artifactId> 
     <version>0.3.5</version> 
    </dependency>   

    <dependency> 
     <groupId>org.apache.hbase</groupId> 
     <artifactId>hbase-spark</artifactId> 
     <version>2.0.0-SNAPSHOT</version>        
    </dependency> 

</dependencies> 

For building I use the maven-assembly-plugin.

Answer


You are getting the NoClassDefFoundError because Spark cannot find the HBase jars on its classpath. You need to provide the required jars explicitly with the --jars parameter when submitting the job:

${SPARK_HOME}/bin/spark-submit \ 
--jars ${..add hbase jars comma separated...} \ 
--class .... 
......... 
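
For instance, a submit command might look like the following sketch; the jar paths, main class, and application jar name are illustrative assumptions and depend on where HBase is installed on your cluster:

# Illustrative paths only -- point --jars at the HBase jars on your cluster 
${SPARK_HOME}/bin/spark-submit \ 
 --master yarn \ 
 --class com.example.HBaseRead \ 
 --jars /usr/lib/hbase/lib/hbase-client.jar,/usr/lib/hbase/lib/hbase-common.jar,/usr/lib/hbase/lib/hbase-server.jar,/usr/lib/hbase/lib/hbase-protocol.jar,/usr/lib/hbase/lib/hbase-spark.jar \ 
 target/HBaseRead-jar-with-dependencies.jar 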

I'm building my Spark application with Maven (with dependencies). Shouldn't that already include every required library? Which HBase libraries do I need? I'm confused. Do I have to add the complete hbase artifact (https://mvnrepository.com/artifact/org.apache.hbase/hbase/1.3.0)? Why? – monti


No, Maven won't do that for you unless you use an additional Maven plugin (maven-assembly-plugin) - http://stackoverflow.com/questions/8425453/maven-build-with-dependencies –
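
For reference, a minimal sketch of the jar-with-dependencies setup described in the linked question (plugin version omitted; details vary by project):

<build> 
 <plugins> 
  <plugin> 
   <groupId>org.apache.maven.plugins</groupId> 
   <artifactId>maven-assembly-plugin</artifactId> 
   <configuration> 
    <descriptorRefs> 
     <!-- Bundle all compile/runtime dependencies into one fat jar --> 
     <descriptorRef>jar-with-dependencies</descriptorRef> 
    </descriptorRefs> 
   </configuration> 
   <executions> 
    <execution> 
     <id>make-assembly</id> 
     <phase>package</phase> 
     <goals> 
      <goal>single</goal> 
     </goals> 
    </execution> 
   </executions> 
  </plugin> 
 </plugins> 
</build> 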


Yes, I do build my jar with dependencies using Maven. I will try adding the HBase jars. – monti