How do I set the executor IP in a Docker container? For the last three days I have been trying to build a Dockerized application consisting of three parts: a Spark master, a Spark worker, and a driver (Java).
When the driver is started outside of Docker, everything works fine. But starting all three components inside containers has turned into a port/firewall nightmare on the host.
To keep it simple (for now) I am using docker-compose; this is my docker-compose.yml:
```yaml
driver:
  hostname: driver
  image: driverimage
  command: -Dexec.args="0 192.168.99.100" -Dspark.driver.port=7001 -Dspark.driver.host=driver -Dspark.executor.port=7006 -Dspark.broadcast.port=15001 -Dspark.fileserver.port=15002 -Dspark.blockManager.port=15003 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory
  ports:
    - 10200:10200   # Module REST port
    - 4040:4040     # Web UI (Spark)
    - 7001:7001     # Driver port (Spark)
    - 15001:15001   # Broadcast (Spark)
    - 15002:15002   # File server (Spark)
    - 15003:15003   # Block manager (Spark)
    - 7337:7337     # Shuffle? (Spark)
  extra_hosts:
    - sparkmaster:192.168.99.100
    - sparkworker:192.168.99.100
  environment:
    SPARK_LOCAL_IP: 192.168.99.100
    #SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=15001 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"

sparkmaster:
  extra_hosts:
    - driver:192.168.99.100
  image: gettyimages/spark
  command: /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h sparkmaster
  hostname: sparkmaster
  environment:
    SPARK_CONF_DIR: /conf
    MASTER: spark://sparkmaster:7077
    SPARK_LOCAL_IP: 192.168.99.100
    SPARK_JAVA_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7006
    - 7077
    - 6066
  ports:
    - 6066:6066
    - 7077:7077   # Master (main port)
    - 8080:8080   # Web UI
    #- 7006:7006  # Executor

sparkworker:
  extra_hosts:
    - driver:192.168.99.100
  image: gettyimages/spark
  command: /usr/spark/bin/spark-class org.apache.spark.deploy.worker.Worker -h sparkworker spark://sparkmaster:7077
  # volumes:
  #   - ./spark/logs:/log/spark
  hostname: sparkworker
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 4
    SPARK_WORKER_MEMORY: 4g
    SPARK_WORKER_PORT: 8881
    SPARK_WORKER_WEBUI_PORT: 8081
    SPARK_LOCAL_IP: 192.168.99.100
    #SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_JAVA_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=15003 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
  links:
    - sparkmaster
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7006
    - 7012
    - 7013
    - 7014
    - 7015
    - 7016
    - 8881
  ports:
    - 8081:8081   # Web UI
    #- 15003:15003 # Block manager
    - 7005:7005   # Executor
    - 7006:7006   # Executor
```
I don't even know which ports are actually used, and so on. What I do know is my current problem: the driver can talk to the master, the master can talk to the worker, and I think the driver can talk to the worker as well. But the driver cannot talk to the executor. I also found out why: when I open the application UI and go to the Executors tab, it shows "Executor 0 - Address 172.17.0.1:7005".
So the problem is that the driver addresses the executor via the Docker gateway address, which does not work. I have tried several things (SPARK_LOCAL_IP, explicit hostnames, and so on), but the driver always tries to talk to the Docker gateway... Any idea how to get the driver to communicate with the executors/workers?
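For reference, the driver-side port pinning that the compose file attempts via `command` and `SPARK_JAVA_OPTS` can also be kept in a `spark-defaults.conf` on the driver. This is only a hedged sketch restating the Spark 1.x-era properties already used above (the concrete values are the ones from the compose file, not a verified fix):

```properties
# Address and ports the driver advertises to the cluster
# (property names as used in the compose file above; Spark 1.x era).
spark.driver.host        192.168.99.100
spark.driver.port        7001
spark.broadcast.port     15001
spark.fileserver.port    15002
spark.blockManager.port  15003
spark.executor.port      7006
```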
Do all of these components need ports exposed on the host? If not, why not use links between containers that only need to talk to each other locally? That would cut down the number of exposed ports considerably. It sounds like the master needs to listen on all interfaces (0.0.0.0), but I was able to dig up https://issues.apache.org/jira/browse/SPARK-4389, which makes it sound like that is not possible. –
Right now this is just a test, and even with links it doesn't seem to work. But I don't want to use links anyway, because later I want to distribute these components across different VMs. With this test I just want to figure out which ports need to be exposed and so on. But I don't get it; I think the Spark documentation is a mess here. There are lots of options for overriding ports, and they are not well documented :/ – M156
Until Spark can bind to all interfaces with 0.0.0.0, I don't think this will work the way you intend. –
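If a newer Spark release is an option: to my knowledge, the bind-vs-advertise limitation discussed in SPARK-4389 was later addressed by the `spark.driver.bindAddress` property introduced in Spark 2.1, which separates the interface the driver binds to inside the container from the address it advertises to the master and executors. A hedged sketch for `spark-defaults.conf`, assuming Spark 2.1+ and the host address from the compose file above:

```properties
# Bind on all interfaces inside the container, but advertise the
# reachable host address to the master and executors (Spark 2.1+).
spark.driver.bindAddress  0.0.0.0
spark.driver.host         192.168.99.100
```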