Hadoop 2.7.3, Spark 2.1.0 and Hive 2.1.1: setting Spark as the default execution engine for Hive

I want to set Spark as the default execution engine for Hive. I uploaded all the jar files from $SPARK_HOME/jars to an HDFS folder, and copied the scala-library, spark-core, and spark-network-common jars to HIVE_HOME/lib. Then I configured hive-site.xml with the following properties:

<property> 
    <name>hive.execution.engine</name> 
    <value>spark</value> 
    </property> 
    <property> 
    <name>spark.master</name> 
    <value>spark://master:7077</value> 
    <description>Spark Master URL</description> 
    </property> 
    <property> 
    <name>spark.eventLog.enabled</name> 
    <value>true</value> 
    <description>Spark Event Log</description> 
    </property> 
    <property> 
    <name>spark.eventLog.dir</name> 
    <value>hdfs://master:8020/user/spark/eventLogging</value> 
    <description>Spark event log folder</description> 
    </property> 
    <property> 
    <name>spark.executor.memory</name> 
    <value>512m</value> 
    <description>Spark executor memory</description> 
    </property> 
    <property> 
    <name>spark.serializer</name> 
    <value>org.apache.spark.serializer.KryoSerializer</value> 
    <description>Spark serializer</description> 
    </property> 
    <property> 
    <name>spark.yarn.jars</name> 
    <value>hdfs://master:8020/user/spark/spark-jars/*</value> 
</property> 
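The jar staging described at the start can be sketched as follows (a minimal sketch; the exact jar filenames and the HDFS target folder are assumptions chosen to match the `spark.yarn.jars` value in the config above):

```shell
# Upload every Spark jar to the HDFS folder referenced by spark.yarn.jars
hdfs dfs -mkdir -p /user/spark/spark-jars
hdfs dfs -put "$SPARK_HOME"/jars/*.jar /user/spark/spark-jars/

# Copy the Scala/Spark client jars Hive needs onto its local classpath
cp "$SPARK_HOME"/jars/scala-library-2.11.8.jar \
   "$SPARK_HOME"/jars/spark-core_2.11-2.1.0.jar \
   "$SPARK_HOME"/jars/spark-network-common_2.11-2.1.0.jar \
   "$HIVE_HOME"/lib/
```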

In the Hive shell, I did the following:

hive> add jar ${env:HIVE_HOME}/lib/scala-library-2.11.8.jar; 
Added [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar] to class path 
Added resources: [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar] 
hive> add jar ${env:HIVE_HOME}/lib/spark-core_2.11-2.1.0.jar; 
Added [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar] to class path 
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar] 
hive> add jar ${env:HIVE_HOME}/lib/spark-network-common_2.11-2.1.0.jar; 
Added [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar] to class path 
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar] 
hive> set hive.execution.engine=spark; 

When I try to execute

hive> SELECT COUNT(*) FROM tableName;

I get the following:

Query ID = hduser_20170130230014_6e23dacc-78e8-4bd6-9fad-1344f6d0569e 
Total jobs = 1 
Launching Job 1 out of 1 
In order to change the average load for a reducer (in bytes): 
    set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
    set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
    set mapreduce.job.reduces=<number> 
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask 

The Hive log shows java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener:

ERROR [main] client.SparkClientImpl: Error while waiting for client to connect. 
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000 
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8 
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000 
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256 
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800 
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener 
    at java.lang.ClassLoader.defineClass1(Native Method) 
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763) 
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) 
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    at java.lang.Class.forName0(Native Method) 
    at java.lang.Class.forName(Class.java:348) 
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    ... 19 more 

    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) 
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:106) 
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) 
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:99) 
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:95) 
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:69) 
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62) 
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) 
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136) 
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:89) 
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) 
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) 
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) 
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) 
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) 
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) 
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) 
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) 
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) 
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) 
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
Caused by: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000 
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8 
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000 
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256 
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800 
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener 
    at java.lang.ClassLoader.defineClass1(Native Method) 
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763) 
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) 
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    at java.lang.Class.forName0(Native Method) 
    at java.lang.Class.forName(Class.java:348) 
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    ... 19 more 

Please help me with the Hive 2.1.1 and Spark 2.1.0 integration.

Did you find any workaround for this problem? – chuseuiti

I guess you are trying to run the Spark driver inside the Hive process, which causes this exception. Have you tried making Hive use spark-submit? –

Facing the same problem with Hadoop 2.8.1, Spark 2.2.0 and Hive 2.1.1. Has anyone found a workaround? –

Answer

This is a bug in Spark: the class org.apache.spark.JavaSparkListener was removed in Spark 2.0.0. A fix has been submitted and is under review; if it is approved, it will be available in a future Spark release (probably Spark 2.2.0):

https://issues.apache.org/jira/browse/SPARK-17563
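The JIRA above matches the stack trace: Hive 2.1.1's spark-client still references org/apache/spark/JavaSparkListener, which spark-core 2.x no longer ships. You can confirm this against your own jars with something like the following (a diagnostic sketch; the jar path and filename are assumptions based on the versions in the question):

```shell
# List the spark-core jar's entries and look for the class Hive expects.
# Against a Spark 2.x jar the grep matches nothing, since the class was
# removed in 2.0.0, so the fallback message is printed instead.
unzip -l "$SPARK_HOME"/jars/spark-core_2.11-2.1.0.jar \
  | grep 'org/apache/spark/JavaSparkListener' \
  || echo "JavaSparkListener not found"
```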

So until this is fixed, Hive 2 is incompatible with Spark 2? Or is there a workaround? – chuseuiti

It looks like this issue has been resolved in Hive 2.2.0, which has not been released yet (https://issues.apache.org/jira/browse/HIVE-14029) –
