
I have set up a three-node Spark cluster, which also doubles as a Hadoop cluster.

master/worker1 is also namenode/datanode1
worker2 is also datanode2
worker3 is also datanode3

I am unable to connect pyspark to the master.

The nodes are virtual machines with private IP addresses, but I have also created a static IP address for each of them.

Private IP: 192.168.0.4 - static IP: x.x.x.117
Private IP: 192.168.0.7 - static IP: x.x.x.118
Private IP: 192.168.0.2 - static IP: x.x.x.120
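For reference, a hosts mapping consistent with this layout would look roughly like the sketch below; only the master's hostname (cassandra-spark-1) is confirmed by the master log further down, and the worker names are placeholders:

$ cat /etc/hosts
192.168.0.4  cassandra-spark-1  # master/worker1
192.168.0.7  worker2            # placeholder hostname
192.168.0.2  worker3            # placeholder hostname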

The Hadoop version is hadoop-2.6.3
The Spark version is spark-1.5.2-bin-hadoop2.6
The Java version is 1.7.0_79

When I use this command line:
$ MASTER=spark://x.x.x.117:7077 pyspark --master yarn-client

It does not give any errors, and after verbose messages scroll by on the screen I eventually get the pyspark prompt and can run Spark jobs. It just runs locally.
Also, when I check the Spark WebUI at http://x.x.x.117:8080, the pyspark application does not show up under the "Running Applications" section of the page. I suspect the pyspark shell is not really running in cluster mode.
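One quick way to verify from inside the shell (sc is the SparkContext that pyspark creates on startup) is to print the master it actually attached to:

>>> sc.master

If this prints anything other than spark://x.x.x.117:7077 (for example a yarn-client or local value), the shell is not attached to the standalone master.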

So I tried the following command:
$ MASTER=spark://x.x.x.117:7077 pyspark

The above command gives these messages on the console:

 
Python 2.7.5 (default, Jun 24 2015, 00:41:19) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
16/01/06 20:14:39 INFO spark.SparkContext: Running Spark version 1.5.2 
16/01/06 20:14:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/01/06 20:14:40 INFO spark.SecurityManager: Changing view acls to: centos 
16/01/06 20:14:40 INFO spark.SecurityManager: Changing modify acls to: centos 
16/01/06 20:14:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(centos); users with modify permissions: Set(centos) 
16/01/06 20:14:41 INFO slf4j.Slf4jLogger: Slf4jLogger started 
16/01/06 20:14:41 INFO Remoting: Starting remoting 
16/01/06 20:14:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.4:51079] 
16/01/06 20:14:41 INFO util.Utils: Successfully started service 'sparkDriver' on port 51079. 
16/01/06 20:14:41 INFO spark.SparkEnv: Registering MapOutputTracker 
16/01/06 20:14:41 INFO spark.SparkEnv: Registering BlockManagerMaster 
16/01/06 20:14:41 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0d984b69-ad9c-4ced-ae65-ffd3bc1c79f5 
16/01/06 20:14:41 INFO storage.MemoryStore: MemoryStore started with capacity 2.6 GB 
16/01/06 20:14:41 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-f3b76c61-812b-4413-b86e-f42c1399d548/httpd-3a2c827d-4d6c-4d2e-8625-db27851c143d 
16/01/06 20:14:41 INFO spark.HttpServer: Starting HTTP Server 
16/01/06 20:14:41 INFO server.Server: jetty-8.y.z-SNAPSHOT 
16/01/06 20:14:41 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:42438 
16/01/06 20:14:41 INFO util.Utils: Successfully started service 'HTTP file server' on port 42438. 
16/01/06 20:14:41 INFO spark.SparkEnv: Registering OutputCommitCoordinator 
16/01/06 20:14:41 INFO server.Server: jetty-8.y.z-SNAPSHOT 
16/01/06 20:14:41 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 
16/01/06 20:14:41 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 
16/01/06 20:14:41 INFO ui.SparkUI: Started SparkUI at http://192.168.0.4:4040 
16/01/06 20:14:41 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 
16/01/06 20:14:42 INFO client.AppClient$ClientEndpoint: Connecting to master spark://x.x.x.117:7077... 
16/01/06 20:15:02 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main] 
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@... rejected from java.util.concurrent.ThreadPoolExecutor@...[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] 
     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) 
     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) 
     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) 
     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110) 
     at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96) 
     at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95) 
     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) 
     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
     at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95) 
     at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121) 
     at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132) 
     at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119) 
     at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
16/01/06 20:15:02 INFO storage.DiskBlockManager: Shutdown hook called 
16/01/06 20:15:02 INFO util.ShutdownHookManager: Shutdown hook called 
16/01/06 20:15:02 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f3b76c61-812b-4413-b86e-f42c1399d548 
Traceback (most recent call last): 
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files) 
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/pyspark/context.py", line 113, in __init__
    conf, jsc, profiler_cls) 
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/pyspark/context.py", line 170, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/pyspark/context.py", line 224, in _initialize_context
    return self._jvm.JavaSparkContext(jconf) 
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 699, in __call__
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 369, in send_command
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 362, in send_command
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 318, in _get_connection
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 325, in _create_connection
  File "/opt/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 432, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server 
>>> 

Looking at the log file generated by the master:

 
16/01/06 19:32:30 INFO master.Master: Registered signal handlers for [TERM, HUP, INT] 
16/01/06 19:32:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/01/06 19:32:31 INFO spark.SecurityManager: Changing view acls to: root 
16/01/06 19:32:31 INFO spark.SecurityManager: Changing modify acls to: root 
16/01/06 19:32:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
16/01/06 19:32:31 INFO slf4j.Slf4jLogger: Slf4jLogger started 
16/01/06 19:32:31 INFO Remoting: Starting remoting 
16/01/06 19:32:32 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077. 
16/01/06 19:32:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@cassandra-spark-1:7077] 
16/01/06 19:32:32 INFO master.Master: Starting Spark master at spark://cassandra-spark-1:7077 
16/01/06 19:32:32 INFO master.Master: Running Spark version 1.5.2 
16/01/06 19:32:32 INFO server.Server: jetty-8.y.z-SNAPSHOT 
16/01/06 19:32:33 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080 
16/01/06 19:32:33 INFO util.Utils: Successfully started service 'MasterUI' on port 8080. 
16/01/06 19:32:33 INFO ui.MasterWebUI: Started MasterWebUI at http://192.168.0.4:8080 
16/01/06 19:32:33 INFO server.Server: jetty-8.y.z-SNAPSHOT 
16/01/06 19:32:33 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:6066 
16/01/06 19:32:33 INFO util.Utils: Successfully started service on port 6066. 
16/01/06 19:32:33 INFO rest.StandaloneRestServer: Started REST server for submitting applications on port 6066 
16/01/06 19:32:33 INFO master.Master: I have been elected leader! New state: ALIVE 
16/01/06 19:32:35 INFO master.Master: Registering worker 192.168.0.7:52930 with 2 cores, 6.6 GB RAM 
16/01/06 19:32:35 INFO master.Master: Registering worker 192.168.0.2:48119 with 2 cores, 6.6 GB RAM 
16/01/06 19:32:35 INFO master.Master: Registering worker 192.168.0.4:56830 with 2 cores, 6.6 GB RAM 
16/01/06 19:33:32 ERROR akka.ErrorMonitor: dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://sparkMaster@x.x.x.117:7077/]] arriving at [akka.tcp://sparkMaster@x.x.x.117:7077] inbound addresses are [akka.tcp://sparkMaster@cassandra-spark-1:7077] 
akka.event.Logging$Error$NoCause$ 
16/01/06 19:33:52 INFO master.Master: 192.168.0.4:35598 got disassociated, removing it. 
16/01/06 19:33:52 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.0.4:35598] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/01/06 19:33:52 INFO master.Master: 192.168.0.4:35598 got disassociated, removing it. 
16/01/06 19:38:36 ERROR akka.ErrorMonitor: dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://sparkMaster@x.x.x.117:7077/]] arriving at [akka.tcp://sparkMaster@x.x.x.117:7077] inbound addresses are [akka.tcp://sparkMaster@cassandra-spark-1:7077] 
akka.event.Logging$Error$NoCause$ 
16/01/06 19:38:56 INFO master.Master: 192.168.0.4:36078 got disassociated, removing it. 
16/01/06 19:38:56 INFO master.Master: 192.168.0.4:36078 got disassociated, removing it. 
16/01/06 19:38:56 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.0.4:36078] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/01/06 20:14:42 ERROR akka.ErrorMonitor: dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://sparkMaster@x.x.x.117:7077/]] arriving at [akka.tcp://sparkMaster@x.x.x.117:7077] inbound addresses are [akka.tcp://sparkMaster@cassandra-spark-1:7077] 
akka.event.Logging$Error$NoCause$ 
16/01/06 20:15:02 INFO master.Master: 192.168.0.4:51079 got disassociated, removing it. 
16/01/06 20:15:02 INFO master.Master: 192.168.0.4:51079 got disassociated, removing it. 
16/01/06 20:15:02 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.0.4:51079] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 


I would appreciate any help. Thanks!


Have you tried connecting with spark-shell to see whether you can connect at all? – femibyte

Answers


I had a similar problem; I fixed it by changing my master to match the URL: field you see when you visit http://x.x.x.117:8080.
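In this case the master log above shows it started at spark://cassandra-spark-1:7077, so the shell would have to use exactly that string (assuming cassandra-spark-1 resolves from the client), for example:

$ pyspark --master spark://cassandra-spark-1:7077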


Make sure that conf/spark-defaults.conf in your Spark installation has the master set, e.g. spark.master spark://x.x.x.117:7077.
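A minimal conf/spark-defaults.conf for this cluster might look like the sketch below; the host part must match what the master actually bound to (see the other answers):

# conf/spark-defaults.conf
spark.master   spark://x.x.x.117:7077

With that in place, plain pyspark, with no MASTER variable and no --master flag, picks up the standalone master by default.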


The following command

$ MASTER=spark://x.x.x.117:7077 pyspark --master yarn-client

is something of a race condition: which "master" is going to win?

  1. MASTER=spark://x.x.x.117:7077
  2. --master yarn-client

Why are you specifying both?

If you want the standalone master, use the first one. The second one relies on YARN as the resource manager.
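Concretely, pick one or the other; these are the unambiguous forms (in practice the --master flag takes precedence over the MASTER environment variable, which would explain why the shell in the question never showed up on the standalone WebUI):

$ pyspark --master spark://x.x.x.117:7077   # Spark standalone master
$ pyspark --master yarn-client              # YARN as the resource manager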