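I am running a custom JAR Hadoop job on Amazon's AWS EMR, and I would like to collect data on the time spent running all Map tasks versus the time spent running Reduce tasks. Is there a way in the framework to dig out this data that I have not found? If not, does anyone have suggestions on the best way to produce it? What is the best way to determine how much time an EMR job spends on Map vs Reduce tasks?
Thanks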
You can find this information in the Job Counters section of the client log. For example:
Job Counters
Killed reduce tasks=1
Launched map tasks=1
Launched reduce tasks=7
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1071855
Total time spent by all reduces in occupied slots (ms)=
**Total time spent by all map tasks (ms)=23819**
**Total time spent by all reduce tasks (ms)=45369**
Total vcore-milliseconds taken by all map tasks=23819
Total vcore-milliseconds taken by all reduce tasks=45369
Total megabyte-milliseconds taken by all map tasks=34299360
Total megabyte-milliseconds taken by all reduce tasks=130662720
Map-Reduce Framework
Map input records=3929235
Map output records=15716940
Map output bytes=132989251
Map output materialized bytes=633590
Input split bytes=86
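
If you want to pull these two numbers out automatically, a small script over the saved client log may be enough. The following is only a sketch: it assumes the log contains the counter lines with exactly the wording shown above, and the function name `map_reduce_millis` and the `sample` text are made up for illustration. Note that these counters are summed over all tasks, so they describe total task time rather than wall-clock time.

```python
import re

# Matches counter lines such as "Total time spent by all map tasks (ms)=23819".
# The wording is assumed to match the client log output shown above.
COUNTER_RE = re.compile(r"Total time spent by all (map|reduce) tasks \(ms\)=(\d+)")


def map_reduce_millis(log_text):
    """Return a dict like {'map': ms, 'reduce': ms} for counters found in log_text."""
    times = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            times[match.group(1)] = int(match.group(2))
    return times


# Illustrative input: a few lines copied from the counters above.
sample = """\
Job Counters
Launched map tasks=1
Launched reduce tasks=7
Total time spent by all map tasks (ms)=23819
Total time spent by all reduce tasks (ms)=45369
"""

times = map_reduce_millis(sample)
total = sum(times.values())
for phase, ms in times.items():
    # Share of the summed task time spent in each phase.
    print(f"{phase}: {ms} ms ({ms / total:.1%} of task time)")
```

In practice you would read the saved log file into a string and pass it to `map_reduce_millis` instead of using `sample`.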