
I am new to Spark. I am trying to run Spark on YARN in yarn-client mode, but SPARK_EXECUTOR_INSTANCES for YARN is not working in spark-shell (YARN client mode).

Spark version = 1.0.2, Hadoop version = 2.2.0

The cluster has 3 live nodes.

Properties set in spark-env.sh:

SPARK_EXECUTOR_MEMORY=1G
SPARK_EXECUTOR_INSTANCES=3
SPARK_EXECUTOR_CORES=1
SPARK_DRIVER_MEMORY=2G

Command: ./bin/spark-shell --master yarn-client

But after spark-shell comes up, it registers only 1 executor, with some default memory allocation assigned to it.

I confirmed through the Spark web UI that there is only 1 executor, and that it is running on the master node (the YARN ResourceManager node) only.

INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-fc6383cc-0904-4af9-8abd-3b66b3f0f461\", -Dspark.yarn.secondary.jars=\"\", -Dspark.home=\"/home/impadmin/spark-1.0.2-bin-hadoop2\", -Dspark.repl.class.uri=\"http://master_node:46823\", -Dspark.driver.host=\"master_node\", -Dspark.app.name=\"Spark shell\", -Dspark.jars=\"\", -Dspark.fileserver.uri=\"http://master_node:46267\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"41209\", -Dspark.httpBroadcast.uri=\"http://master_node:36965\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar, null, --args 'master_node:41209', --executor-memory, 1024, --executor-cores, 1, --num-executors, 3, 1>, /stdout, 2>, /stderr)

... 

... 

... 

14/09/10 22:21:24 INFO cluster.YarnClientSchedulerBackend: Registered executor: 

Actor[akka.tcp://sparkExecutor@master_node:53619/user/Executor#1075999905] with ID 1
14/09/10 22:21:24 INFO storage.BlockManagerInfo: Registering block manager master_node:40205 with 589.2 MB RAM
14/09/10 22:21:25 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/09/10 22:21:25 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

And after running any parallelized Spark action, it just runs all of those tasks serially on this one node!
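For example (a minimal illustration, not taken from the original post), even an explicitly partitioned job like the one below had all of its tasks executed by that single executor:

    scala> // 8 partitions means 8 tasks, but with 1 executor they all run on one node
    scala> sc.parallelize(1 to 100000, 8).count()
    res0: Long = 100000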

Answer


OK, I solved the issue. I used

spark-shell --num-executors 4 --master yarn-client

since my cluster has 4 data nodes.
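In full, with the per-executor settings from spark-env.sh moved onto the command line (a sketch; these are the standard YARN flags accepted by spark-shell/spark-submit in Spark 1.0.x):

    # sketch: one executor per data node, settings matching the question
    ./bin/spark-shell --master yarn-client \
        --num-executors 4 \
        --executor-memory 1G \
        --executor-cores 1 \
        --driver-memory 2G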


Thanks for the solution... I had the same problem, and the cause was that I was using this command: spark-submit --master yarn-client file.py --num-executors 6. The correct syntax is: spark-submit --num-executors 6 --master yarn-client file.py – user1525721 2015-05-08 22:08:28
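The ordering matters because spark-submit treats everything after the primary resource (file.py here) as arguments to the application itself, so any flags placed after it are handed to file.py instead of being parsed by spark-submit:

    # Wrong: --num-executors 6 is passed to file.py as an application argument
    spark-submit --master yarn-client file.py --num-executors 6

    # Right: all spark-submit flags go before the application file
    spark-submit --num-executors 6 --master yarn-client file.py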
