Apache Spark: worker fails to connect to master, although ping and ssh from the worker to the master succeed

2016-12-08

I am trying to set up an 8-node cluster on eight RHEL 7.3 x86 machines using Spark 2.0.1. start-master.sh runs fine:

Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host lambda.foo.net --port 7077 --webui-port 8080 
======================================== 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
16/12/08 04:26:46 INFO Master: Started daemon with process name: [email protected] 
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for TERM 
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for HUP 
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for INT 
16/12/08 04:26:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/12/08 04:26:46 INFO SecurityManager: Changing view acls to: root 
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls to: root 
16/12/08 04:26:46 INFO SecurityManager: Changing view acls groups to: 
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls groups to: 
16/12/08 04:26:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 
16/12/08 04:26:46 INFO Utils: Successfully started service 'sparkMaster' on port 7077. 
16/12/08 04:26:46 INFO Master: Starting Spark master at spark://lambda.foo.net:7077 
16/12/08 04:26:46 INFO Master: Running Spark version 2.0.1 
16/12/08 04:26:46 INFO Utils: Successfully started service 'MasterUI' on port 8080. 
16/12/08 04:26:46 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://19.341.11.212:8080 
16/12/08 04:26:46 INFO Utils: Successfully started service on port 6066. 
16/12/08 04:26:46 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066 
16/12/08 04:26:46 INFO Master: I have been elected leader! New state: ALIVE 

But when I try to bring up the workers with start-slaves.sh, what I see in the worker log is:

Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://lambda.foo.net:7077 
======================================== 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
16/12/08 04:30:00 INFO Worker: Started daemon with process name: [email protected] 
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for TERM 
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for HUP 
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for INT 
16/12/08 04:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/12/08 04:30:00 INFO SecurityManager: Changing view acls to: root 
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls to: root 
16/12/08 04:30:00 INFO SecurityManager: Changing view acls groups to: 
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls groups to: 
16/12/08 04:30:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 
16/12/08 04:30:00 INFO Utils: Successfully started service 'sparkWorker' on port 35858. 
16/12/08 04:30:00 INFO Worker: Starting Spark worker 15.242.22.179:35858 with 24 cores, 1510.2 GB RAM 
16/12/08 04:30:00 INFO Worker: Running Spark version 2.0.1 
16/12/08 04:30:00 INFO Worker: Spark home: /usr/local/bin/spark-2.0.1-bin-hadoop2.7 
16/12/08 04:30:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081. 
16/12/08 04:30:00 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://15.242.22.179:8081 
16/12/08 04:30:00 INFO Worker: Connecting to master lambda.foo.net:7077... 
16/12/08 04:30:00 WARN Worker: Failed to connect to master lambda.foo.net:7077 
org.apache.spark.SparkException: Exception thrown in awaitResult 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
     at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
     at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88) 
     at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96) 
     at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Failed to connect to lambda.foo.net/19.341.11.212:7077 
     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) 
     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) 
     at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197) 
     at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191) 
     at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187) 
     ... 4 more 
Caused by: java.net.NoRouteToHostException: No route to host: lambda.foo.net/19.341.11.212:7077 
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) 
     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289) 
     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) 
     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 
     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
     ... 1 more 
16/12/08 04:30:12 INFO Worker: Retrying connection to master (attempt # 1) 
16/12/08 04:30:12 INFO Worker: Connecting to master lambda.foo.net:7077... 
16/12/08 04:30:12 WARN Worker: Failed to connect to master lambda.foo.net:7077 
org.apache.spark.SparkException: Exception thrown in awaitResult 
     at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 

So it says "No route to host". Yet I can successfully ping the master node from the worker node, and ssh from the worker node to the master node also works.

Why does Spark say "No route to host"?
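Since ping tests only ICMP and ssh tests only port 22, neither proves that the master's Spark RPC port is reachable. A minimal diagnostic sketch, run from a worker node, to probe port 7077 specifically (host and port taken from the logs above; `nc` availability is an assumption):

```shell
# Probe the master's Spark RPC port over TCP from a worker node.
# ping (ICMP) and ssh (port 22) succeeding says nothing about port 7077.
nc -zv lambda.foo.net 7077

# Bash-only fallback if nc is not installed:
timeout 5 bash -c '</dev/tcp/lambda.foo.net/7077' \
  && echo "port 7077 reachable" \
  || echo "port 7077 blocked or closed"
```

If this probe fails while ping succeeds, something between the hosts (typically a firewall) is filtering the port rather than the route being genuinely absent.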

Answer

Problem solved: a firewall was blocking the packets.
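On RHEL 7 the default firewalld policy rejects inbound connections with an ICMP host-unreachable message, which Java surfaces as exactly this NoRouteToHostException. A sketch of opening Spark's standalone-mode ports on the master, assuming firewalld is the firewall in use (port numbers are the ones from the logs above):

```shell
# Run as root on the master (lambda.foo.net).
# 7077 = master RPC, 8080 = master web UI, 6066 = REST submission server
firewall-cmd --permanent --add-port=7077/tcp
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --permanent --add-port=6066/tcp
firewall-cmd --reload

# Workers also use ephemeral ports (35858 in the log above) plus 8081 for
# their web UI, so on an isolated cluster network some admins instead
# disable the firewall entirely:
#   systemctl stop firewalld && systemctl disable firewalld
```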
