Spark: Memory Usage

apache-spark

2016-02-10 140 views 0 likes

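I am measuring the memory usage of an application (WordCount) in Spark with ps -p WorkerPID -o rss. However, the results don't make any sense: for every amount of data (1MB, 10MB, 100MB, 1GB, 10GB) the same amount of memory is used. For 1GB and 10GB of data, the measured value is even less than 1GB. Is the Worker the wrong process for measuring memory usage? Which process in the Spark process model is responsible for memory allocation?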

Source

2016-02-10 lary

Answer

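Contrary to popular belief, Spark doesn't have to load all data into main memory. Moreover, WordCount is a trivial application, and the amount of memory it requires depends only marginally on the input:

The amount of data loaded per partition by SparkContext.textFile depends on configuration, not on input size (see for example: Why does partition parameter of SparkContext.textFile not take effect?).
The size of the key-value pairs is roughly constant for typical input.
Intermediate data can be spilled to disk if needed.
Last but not least, the amount of memory used by executors is limited by configuration.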

Keeping all of that in mind, behavior different from what you see would be troubling at best.
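
To illustrate why input size barely matters for a word count, here is a minimal sketch in plain Python (not Spark itself). It assumes a small fixed vocabulary, and the names VOCAB, lines and word_count are made up for this example. The input is generated lazily and consumed one line at a time, so the only state that stays in memory is one counter per distinct word:

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical fixed vocabulary standing in for "typical" text input.
VOCAB = ["spark", "memory", "worker", "executor", "partition"]

def lines(n):
    """Lazily yield n synthetic three-word lines; nothing is materialized up front."""
    words = cycle(VOCAB)
    for _ in range(n):
        yield " ".join(islice(words, 3))

def word_count(line_iter):
    """Stream over the lines, keeping only the per-word totals in memory."""
    counts = Counter()
    for line in line_iter:
        counts.update(line.split())
    return counts

small = word_count(lines(1_000))    # 3,000 words in total
large = word_count(lines(100_000))  # 300,000 words in total

# The input grew 100x, but the retained state has the same number of keys.
print(len(small), len(large))
```

The retained state grows with the number of distinct words rather than with the number of lines read. That is only an analogy for the key-value point above, and it says nothing about how Spark lays out memory internally.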

Source

2016-02-10 15:43:23 zero323

Thanks for your answer. Can you recommend some additional information (links or literature) about Spark's memory management? – lary

http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf – zero323
