I am new to Spark, trying to run Spark on YARN in yarn-client mode.
SPARK_EXECUTOR_INSTANCES is not working for the Spark shell in YARN client mode.
Spark version = 1.0.2, Hadoop version = 2.2.0
The cluster has 3 live nodes.
Properties in spark-env.sh:
SPARK_EXECUTOR_MEMORY=1G
SPARK_EXECUTOR_INSTANCES=3
SPARK_EXECUTOR_CORES=1
SPARK_DRIVER_MEMORY=2G
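For reference, the same resources can also be requested explicitly on the command line instead of through spark-env.sh; spark-shell forwards these standard YARN-mode flags to spark-submit (a sketch, with values mirroring the settings above; it requires a running YARN cluster):

```shell
# Equivalent explicit flags for the spark-env.sh settings above
# (Spark on YARN, yarn-client mode).
./bin/spark-shell --master yarn-client \
  --num-executors 3 \
  --executor-memory 1g \
  --executor-cores 1 \
  --driver-memory 2g
```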
Command: ./bin/spark-shell --master yarn-client
But after logging into the spark-shell, it registers only 1 executor, with some default memory allocated to it. I confirmed through the Spark web UI that there is only 1 executor, and it is running only on the master node (the YARN ResourceManager node).
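One quick sanity check is whether YARN itself still sees all three NodeManagers; if two of them are lost or unhealthy, containers can only be placed on the remaining node. A standard Hadoop CLI command for this (run on any cluster node):

```shell
# Lists the NodeManagers registered with the ResourceManager;
# a healthy 3-node cluster should report 3 nodes in RUNNING state.
yarn node -list
```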
INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-fc6383cc-0904-4af9-8abd-3b66b3f0f461\", -Dspark.yarn.secondary.jars=\"\", -Dspark.home=\"/home/impadmin/spark-1.0.2-bin-hadoop2\", -Dspark.repl.class.uri=\"http://master_node:46823\", -Dspark.driver.host=\"master_node\", -Dspark.app.name=\"Spark shell\", -Dspark.jars=\"\", -Dspark.fileserver.uri=\"http://master_node:46267\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"41209\", -Dspark.httpBroadcast.uri=\"http://master_node:36965\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar, null, --args 'master_node:41209', --executor-memory, 1024, --executor-cores, 1, --num-executors, 3, 1>, /stdout, 2>, /stderr)
...
14/09/10 22:21:24 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@master_node:53619/user/Executor#1075999905] with ID 1
14/09/10 22:21:24 INFO storage.BlockManagerInfo: Registering block manager master_node:40205 with 589.2 MB RAM
14/09/10 22:21:25 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/09/10 22:21:25 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
And after running any parallelized Spark action, it just runs all the tasks serially on this one node!
Thanks for the solution... I had the same problem; the reason was that I was using this command: spark-submit --master yarn-client file.py --num-executors 6. The correct syntax: spark-submit --num-executors 6 --master yarn-client file.py – user1525721 2015-05-08 22:08:28
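The root cause in that comment is that spark-submit stops parsing its own options at the application file: everything after file.py is handed to the application as its arguments, so a --num-executors placed there is silently ignored. A minimal sketch of that splitting behavior (a simplified model written for illustration, not Spark's actual parser; app_args_for is a hypothetical helper, and it assumes every --flag takes exactly one value):

```shell
#!/bin/sh
# Simplified model of spark-submit argument handling: options (and their
# values) are consumed until the first bare token, which is taken as the
# application file; every token after that is passed to the application.
app_args_for() {
  seen_app=0      # have we hit the application file yet?
  expect_value=0  # was the previous token a --flag expecting a value?
  out=""
  for tok in "$@"; do
    if [ "$seen_app" -eq 1 ]; then
      out="$out $tok"                # after the app file: application args
    elif [ "$expect_value" -eq 1 ]; then
      expect_value=0                 # this token is the flag's value
    else
      case "$tok" in
        --*) expect_value=1 ;;       # assumption: every --flag takes a value
        *)   seen_app=1 ;;           # first bare token = application file
      esac
    fi
  done
  printf '%s\n' "${out# }"
}

# Misordered (as in the comment): --num-executors 6 becomes an argument
# to file.py, so Spark never sees it.
app_args_for --master yarn-client file.py --num-executors 6

# Corrected ordering: nothing is left over for the application.
app_args_for --num-executors 6 --master yarn-client file.py
```

Run as a plain shell script: the first call prints "--num-executors 6" (the flags swallowed by the application), the second prints an empty line.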