Why doesn't Spark allocate new YARN containers when the current memory level is insufficient?

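I have a Cloudera cluster with a YARN capacity of 600 vcores and 3600 GiB of memory. However, the admin team has configured the maximum memory per container as 6 GB. My user is allowed to allocate as many containers as needed.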

When I run a Spark job on a 50 GB dataset, it fails with executor memory overhead errors.

Why can't Spark start new containers when one container runs out of memory?

2017-06-05 kalyan chakravarthy

Why can't Spark start new containers when one container runs out of memory?

...because Spark does not do that by default (and you have not configured it to do otherwise).
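For reference, the closest built-in mechanism I know of is dynamic allocation, which the --num-executors help text further down also mentions. To my understanding it adds or removes executors based on the backlog of pending tasks, not because an executor ran out of memory, and every executor it requests still has the same fixed size. A minimal sketch of enabling it follows; the executor bounds, the class name com.example.MyJob and my-job.jar are hypothetical, and the command is only printed here, not run.

```shell
#!/bin/sh
# Sketch: a spark-submit command line that enables dynamic allocation on YARN.
# The min/max bounds, class name, and jar are hypothetical placeholders.
# spark.shuffle.service.enabled is set because dynamic allocation on YARN
# relies on the external shuffle service.
SUBMIT_CMD="spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=50 \
  --executor-memory 5g \
  --class com.example.MyJob \
  my-job.jar"

# Print the command for review instead of executing it.
echo "$SUBMIT_CMD"
```

Even with this enabled, an executor that exceeds its memory limit is not replaced by a bigger one, so the per-executor size still has to fit the data each task handles.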

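When you submit the application, the number of executors and, more importantly, the total number of CPU cores and the amount of RAM are under your control. That is what --driver-memory, --executor-memory, --driver-cores, --total-executor-cores, --executor-cores, --num-executors and so on are for.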

$ ./bin/spark-submit --help 
... 
    --driver-memory MEM   Memory for driver (e.g. 1000M, 2G) (Default: 1024M). 
    --driver-java-options  Extra Java options to pass to the driver. 
    --driver-library-path  Extra library path entries to pass to the driver. 
    --driver-class-path   Extra class path entries to pass to the driver. Note that 
           jars added with --jars are automatically included in the 
           classpath. 

    --executor-memory MEM  Memory per executor (e.g. 1000M, 2G) (Default: 1G). 
... 
Spark standalone with cluster deploy mode only: 
    --driver-cores NUM   Cores for driver (Default: 1). 
... 
Spark standalone and Mesos only: 
    --total-executor-cores NUM Total cores for all executors. 

Spark standalone and YARN only: 
    --executor-cores NUM  Number of cores per executor. (Default: 1 in YARN mode, 
           or all available cores on the worker in standalone mode) 

YARN-only: 
    --driver-cores NUM   Number of cores used by the driver, only in cluster mode 
           (Default: 1). 
    --queue QUEUE_NAME   The YARN queue to submit to (Default: "default"). 
    --num-executors NUM   Number of executors to launch (Default: 2). 
           If dynamic allocation is enabled, the initial number of 
           executors will be at least NUM. 
...

Some of these options are specific to the deploy mode, while others depend on the cluster manager in use (which is YARN in your case).

To sum up... it is you who decides, through these options, how many resources are assigned to a Spark application.
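As a rough illustration of that sizing, the sketch below checks that one executor's container request stays under the 6 GB limit from the question. It assumes the YARN executor memory overhead defaults to the larger of 384 MB and 10% of the executor memory; the executor memory, executor count, core count, class name and jar are hypothetical values, and the spark-submit command is only printed.

```shell
#!/bin/sh
# Sketch: size executors so each YARN container request fits the 6 GB limit.
MAX_CONTAINER_MB=6144     # 6 GB per-container limit from the question
EXECUTOR_MEMORY_MB=5120   # hypothetical --executor-memory value
NUM_EXECUTORS=20          # hypothetical --num-executors value
EXECUTOR_CORES=4          # hypothetical --executor-cores value

# Assumption: overhead defaults to max(384 MB, 10% of executor memory).
OVERHEAD_MB=$(( $EXECUTOR_MEMORY_MB / 10 ))
if [ "$OVERHEAD_MB" -lt 384 ]; then
  OVERHEAD_MB=384
fi

# Each executor container is requested as executor memory plus overhead.
CONTAINER_MB=$(( $EXECUTOR_MEMORY_MB + $OVERHEAD_MB ))
if [ "$CONTAINER_MB" -gt "$MAX_CONTAINER_MB" ]; then
  echo "request of ${CONTAINER_MB} MB exceeds the ${MAX_CONTAINER_MB} MB limit" >&2
  exit 1
fi

# Total memory requested across all executors (driver not included).
TOTAL_MB=$(( $NUM_EXECUTORS * $CONTAINER_MB ))
echo "per executor: ${CONTAINER_MB} MB, all executors: ${TOTAL_MB} MB"

# Print the resulting command for review instead of executing it.
echo "spark-submit --master yarn --num-executors ${NUM_EXECUTORS} --executor-cores ${EXECUTOR_CORES} --executor-memory ${EXECUTOR_MEMORY_MB}m --class com.example.MyJob my-job.jar"
```

With these numbers each executor asks for 5632 MB, which fits under the 6144 MB limit (YARN may round the request up further). To give the job more memory overall, raise --num-executors rather than --executor-memory, since a single container cannot grow past the limit.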

Read Submitting Applications in the official Spark documentation.


2017-06-06 11:08:04
