I am trying to load data from HDFS into an HBase table using Spark Streaming. I place the data into an HDFS directory at runtime and read it with the textFileStream function. Because Spark does not have the HBase jars on its classpath, it gives me an error even when I import the HBase classes in the spark-shell: the Spark shell is unable to find the HBase classes.

scala> import org.apache.hadoop.hbase.mapred.TableOutputFormat 
<console>:10: error: object hbase is not a member of package org.apache.hadoop 
     import org.apache.hadoop.hbase.mapred.TableOutputFormat 

But if I add the HBase jars to the classpath when starting the spark-shell, I no longer get that error. However, it is still unable to find certain classes further down the line.

bin/spark-shell --jars /hbase/hbase-0.94.13/hbase-0.94.13-mapr-1401.jar 

scala> import org.apache.hadoop.hbase.{ HBaseConfiguration, HColumnDescriptor, HTableDescriptor } 
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor} 

scala> import org.apache.hadoop.hbase.client.{ HBaseAdmin, Put } 
import org.apache.hadoop.hbase.client.{HBaseAdmin, Put} 

scala> import org.apache.hadoop.hbase.io.ImmutableBytesWritable 
import org.apache.hadoop.hbase.io.ImmutableBytesWritable 

scala> import org.apache.hadoop.hbase.mapred.TableOutputFormat 
import org.apache.hadoop.hbase.mapred.TableOutputFormat 

scala> import org.apache.hadoop.hbase.mapreduce.TableInputFormat 
import org.apache.hadoop.hbase.mapreduce.TableInputFormat 

scala> import org.apache.hadoop.hbase.util.Bytes 
import org.apache.hadoop.hbase.util.Bytes 

scala> import org.apache.hadoop.mapred.JobConf 
import org.apache.hadoop.mapred.JobConf 

scala> import org.apache.spark.SparkContext 
import org.apache.spark.SparkContext 

scala> import org.apache.spark.rdd.{ PairRDDFunctions, RDD } 
import org.apache.spark.rdd.{PairRDDFunctions, RDD} 

scala> import org.apache.spark.streaming._ 
import org.apache.spark.streaming._ 

scala> import org.apache.spark.streaming.StreamingContext._ 
import org.apache.spark.streaming.StreamingContext._ 

scala> import org.apache.hadoop.hbase.client.mapr.{BaseTableMappingRules} 
import org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules 

scala> val conf = HBaseConfiguration.create() 
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, hbase-default.xml, hbase-site.xml 

scala> val hbaseTableName = "/app/dev/MarketingIt/hbasetables/spark_test" 
hbaseTableName: String = /app/dev/MarketingIt/hbasetables/spark_test 

scala> val admin = new HBaseAdmin(conf) 
java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.MapRTableMappingRules. 
==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules. 
     at org.apache.hadoop.hbase.client.HBaseAdmin.commonInit(HBaseAdmin.java:356) 
     at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:156) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42) 
     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44) 
     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46) 
     at $iwC$$iwC$$iwC.<init>(<console>:48) 
     at $iwC$$iwC.<init>(<console>:50) 
     at $iwC.<init>(<console>:52) 
     at <init>(<console>:54) 
     at .<init>(<console>:58) 
     at .<clinit>(<console>) 
     at .<init>(<console>:7) 
     at .<clinit>(<console>) 
     at $print(<console>) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788) 
     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056) 
     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614) 
     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645) 
     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609) 
     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796) 
     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841) 
     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753) 
     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601) 
     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608) 
     at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611) 
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936) 
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884) 
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884) 
     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) 
     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884) 
     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982) 
     at org.apache.spark.repl.Main$.main(Main.scala:31) 
     at org.apache.spark.repl.Main.main(Main.scala) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.io.IOException: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.MapRTableMappingRules. 
==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules. 
     at org.apache.hadoop.hbase.client.mapr.TableMappingRulesFactory.create(TableMappingRulesFactory.java:65) 
     at org.apache.hadoop.hbase.client.HBaseAdmin.commonInit(HBaseAdmin.java:348) 
     ... 47 more 
Caused by: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.MapRTableMappingRules. 
==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules. 
     at org.apache.hadoop.hbase.client.mapr.GenericHFactory.getImplementorInstance(GenericHFactory.java:40) 
     at org.apache.hadoop.hbase.client.mapr.TableMappingRulesFactory.create(TableMappingRulesFactory.java:47) 
     ... 48 more 
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules 
     at java.lang.ClassLoader.defineClass1(Native Method) 
     at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 
     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 
     at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 
     at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:425) 
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:412) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:358) 
     at java.lang.Class.forName0(Native Method) 
     at java.lang.Class.forName(Class.java:190) 
     at org.apache.hadoop.hbase.client.mapr.GenericHFactory.getImplementorInstance(GenericHFactory.java:30) 
     ... 49 more 
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules 
     at java.net.URLClassLoader$1.run(URLClassLoader.java:366) 
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:425) 
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:358) 
     ... 65 more 

Here, as you can see, I have added all the HBase jars, and Spark is able to find some of the HBase classes but not others, even though all of the classes are in the same jar I added. Since it says Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules, I have imported that class specifically, but I still get the same error.
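
For reference, this is roughly what I am trying to run from the spark-shell once the classpath issue is solved (a minimal sketch only; the input directory, line format, column family, and qualifier below are placeholders, and the table name is the one from the session above):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Point the old-API (mapred) TableOutputFormat at the target table
val jobConf = new JobConf(HBaseConfiguration.create())
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "/app/dev/MarketingIt/hbasetables/spark_test")

// Watch an HDFS directory for new files; each line is assumed to be "rowkey,value"
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream("/path/to/hdfs/input")   // placeholder directory

lines.foreachRDD { rdd =>
  val puts = rdd.map { line =>
    val Array(rowKey, value) = line.split(",", 2)
    val put = new Put(Bytes.toBytes(rowKey))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))   // placeholder family/qualifier
    (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
  }
  puts.saveAsHadoopDataset(jobConf)
}

ssc.start()
ssc.awaitTermination()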

Answers


If you are using Spark 1+, try setting the extra classpath property in the Spark configuration.

Add this line to spark-defaults.conf:

spark.executor.extraClassPath /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar

If you are using a different distribution, find the paths that apply to your jar files.

In addition to the configuration change, add the driver classpath when launching spark-shell or submitting your Spark job:

--driver-class-path /opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar
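
Once these classpath settings are in place, a quick sanity check from the spark-shell (a hypothetical snippet, not part of the original answer) is to try resolving an HBase class on both the driver and the executors:

// Driver side: resolves against --driver-class-path
Class.forName("org.apache.hadoop.hbase.client.HBaseAdmin")

// Executor side: resolves against spark.executor.extraClassPath
sc.parallelize(1 to 4).map { _ =>
  Class.forName("org.apache.hadoop.hbase.client.HBaseAdmin").getName
}.collect().foreach(println)

If either call throws ClassNotFoundException, the corresponding classpath entry is wrong or the jar is missing.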

You can also add the jars to the Spark classpath in spark-env.sh, to avoid specifying the full paths every time you launch spark-shell or submit a Spark job, but I ran into other issues with that approach; I found the options above worked better for me.

export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar

Nothing more is needed for Spark 1+.

If you are using Spark 0.9, take a look at this link. Yes, links can break, but I have not tested this on Spark 0.9 and this blog has useful information: http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html


Add this line to conf/spark-env.sh, replacing ${HBASE_HOME} with the full path:

export SPARK_CLASSPATH=${HBASE_HOME}/lib/*