2016-08-17 295 views
5

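Running YARN with Spark not working with Java 8

I have a cluster of 1 master and 6 slaves that uses the pre-built versions of hadoop 2.6.0 and spark 1.6.2. I was running hadoop MR and spark jobs without any problem while openjdk 7 was installed on all nodes. However, after I upgraded openjdk 7 to openjdk 8 on all nodes, spark-submit and spark-shell with yarn started failing with the errors below.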

16/08/17 14:06:22 ERROR client.TransportClient: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts 
org.apache.spark.SparkException: Exception thrown in awaitResult 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) 
     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcV$sp(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) 
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) 
     at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) 
     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
     at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) 
     at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) 
     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
     ... 1 more 
Caused by: java.nio.channels.ClosedChannelException 
16/08/17 14:06:22 ERROR spark.SparkContext: Error initializing SparkContext. 
java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Traceback (most recent call last): 
    File "/home/hd_spark/spark2/python/pyspark/shell.py", line 49, in <module> 
    spark = SparkSession.builder.getOrCreate() 
    File "/home/hd_spark/spark2/python/pyspark/sql/session.py", line 169, in getOrCreate 
    sc = SparkContext.getOrCreate(sparkConf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 294, in getOrCreate 
    SparkContext(conf=conf or SparkConf()) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 115, in __init__ 
    conf, jsc, profiler_cls) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 168, in _do_init 
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
    File "/home/hd_spark/spark2/python/pyspark/context.py", line 233, in _initialize_context 
    return self._jvm.JavaSparkContext(jconf) 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__ 
    File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.lang.IllegalStateException: Spark context stopped while waiting for backend 
     at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581) 
     at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:549) 
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:236) 
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 

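I have exported JAVA_HOME in .bashrc and set openjdk 8 as the default java using these commands:

sudo update-alternatives --config java 
sudo update-alternatives --config javac 

I have also tried Oracle Java 8, and the same error comes up. The container logs on the slave nodes show the same error, as below.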

SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/filecache/17/__spark_libs__8247267244939901627.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
16/08/17 14:05:11 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: [email protected] 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for TERM 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for HUP 
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for INT 
16/08/17 14:05:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 78 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls to: hd_spark 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/17 14:05:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hd_spark); groups with view permissions: Set(); users with modify permissions: Set(hd_spark); groups with modify permissions: Set() 
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 1 ms (0 ms spent in bootstraps) 
16/08/17 14:05:12 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/appcache/application_1471352972661_0005/blockmgr-d9f23a56-1420-4cd4-abfd-ae9e128c688c 
16/08/17 14:05:12 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 
16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417 
16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM 
16/08/17 14:05:13 INFO storage.DiskBlockManager: Shutdown hook called 
16/08/17 14:05:13 INFO util.ShutdownHookManager: Shutdown hook called 

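I have tried the spark 1.6.2 pre-built version, the spark 2.0 pre-built version, and also spark 2.0 built by myself.

Hadoop jobs work perfectly even after upgrading to java 8. When I switch back to java 7, spark works fine.

My scala version is 2.11 and the OS is Ubuntu 14.04.4 LTS.

It would be great if someone could give me an idea of how to solve this problem.

Thanks!

P.S. I have changed my IP address to xxx.xxx.xxx.xx in the logs.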

+0

Looks like the worker tries to connect to the driver but fails: '16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417 16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM'. What do the driver logs say? –

+0

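Where can I find the driver logs? I found the worker node logs in the hadoop/logs/userlog directory, but I could not find any driver-related logs on the master node. The spark/logs directory contains only history server logs, and hadoop/logs/userlog on the master node is empty. Thanks! – jmoa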

+0

http://spark.apache.org/docs/latest/running-on-yarn.html –

Answers

8

As of September 12, 2016, this is a blocker issue: https://issues.apache.org/jira/browse/YARN-4714

You can overcome this by setting the following properties in yarn-site.xml:

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name> 
    <value>false</value> 
</property> 

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property> 
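
If you would rather apply the change with a script than edit every node by hand, something like the following might work. It is only a rough sketch: the two property names are the ones given above, while the function name `disable_memory_checks` and the sample input are made up for illustration. It assumes the file has the usual `<configuration>` root element, and note that ElementTree does not preserve comments or the XML declaration. Disabling these checks turns off YARN's container memory enforcement, and the NodeManagers most likely need a restart before they pick up the new values.

```python
import xml.etree.ElementTree as ET

# The two property names come from the answer above; everything else here
# (function name, sample input) is a hypothetical illustration.
MEMORY_CHECKS = (
    "yarn.nodemanager.pmem-check-enabled",
    "yarn.nodemanager.vmem-check-enabled",
)


def disable_memory_checks(xml_text):
    """Return yarn-site.xml content with both memory checks set to false."""
    root = ET.fromstring(xml_text)  # assumes a <configuration> root element
    # Index the existing <property> elements by their (stripped) name.
    existing = {
        (p.findtext("name") or "").strip(): p for p in root.findall("property")
    }
    for name in MEMORY_CHECKS:
        prop = existing.get(name)
        if prop is None:
            # Property not present yet: append a new <property> entry.
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
        value = prop.find("value")
        if value is None:
            value = ET.SubElement(prop, "value")
        value.text = "false"
    return ET.tostring(root, encoding="unicode")


# Made-up input: one check already present and enabled, the other missing.
sample = """<configuration>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>
  </property>
</configuration>"""

updated = disable_memory_checks(sample)
print(updated)
```

To use it on a real file, read the file's text, pass it to the function, and write the result back, keeping a backup of the original first.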
+0

Thanks for the reply! I recently went back to Java 7, but I will try it and comment on whether it works. – jmoa

+0

@jmoa Any luck? – simpleJack

+0

This worked perfectly for me. –