2013-07-04

I ran a canopy clustering job (using Mahout) on Cloudera CDH4. The input to be clustered has about 1M records (each record smaller than 1 KB). The whole Hadoop environment (including all nodes) runs in 4 GB of memory, and the CDH4 installation is the default one. When I run the job, I get the exception below: a "GC overhead limit exceeded" error in a Cloudera Hadoop MapReduce job.

Judging from the exception, the job client apparently needs a larger JVM heap. However, Cloudera Manager offers many JVM heap size options. I changed "Client Java Heap Size in Bytes" from 256 MiB to 512 MiB, but it did not help.

Any tips/hints on setting these heap size options?

13/07/03 17:12:45 INFO input.FileInputFormat: Total input paths to process : 1 
13/07/03 17:12:46 INFO mapred.JobClient: Running job: job_201307031710_0001 
13/07/03 17:12:47 INFO mapred.JobClient: map 0% reduce 0% 
13/07/03 17:13:06 INFO mapred.JobClient: map 1% reduce 0% 
13/07/03 17:13:27 INFO mapred.JobClient: map 2% reduce 0% 
13/07/03 17:14:01 INFO mapred.JobClient: map 3% reduce 0% 
13/07/03 17:14:50 INFO mapred.JobClient: map 4% reduce 0% 
13/07/03 17:15:50 INFO mapred.JobClient: map 5% reduce 0% 
13/07/03 17:17:06 INFO mapred.JobClient: map 6% reduce 0% 
13/07/03 17:18:44 INFO mapred.JobClient: map 7% reduce 0% 
13/07/03 17:20:24 INFO mapred.JobClient: map 8% reduce 0% 
13/07/03 17:22:20 INFO mapred.JobClient: map 9% reduce 0% 
13/07/03 17:25:00 INFO mapred.JobClient: map 10% reduce 0% 
13/07/03 17:28:08 INFO mapred.JobClient: map 11% reduce 0% 
13/07/03 17:31:46 INFO mapred.JobClient: map 12% reduce 0% 
13/07/03 17:35:57 INFO mapred.JobClient: map 13% reduce 0% 
13/07/03 17:40:52 INFO mapred.JobClient: map 14% reduce 0% 
13/07/03 17:46:55 INFO mapred.JobClient: map 15% reduce 0% 
13/07/03 17:55:02 INFO mapred.JobClient: map 16% reduce 0% 
13/07/03 18:08:42 INFO mapred.JobClient: map 17% reduce 0% 
13/07/03 18:59:11 INFO mapred.JobClient: map 8% reduce 0% 
13/07/03 18:59:13 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_0, Status : FAILED 
Error: GC overhead limit exceeded 
13/07/03 18:59:23 INFO mapred.JobClient: map 9% reduce 0% 
13/07/03 19:00:09 INFO mapred.JobClient: map 10% reduce 0% 
13/07/03 19:01:49 INFO mapred.JobClient: map 11% reduce 0% 
13/07/03 19:04:25 INFO mapred.JobClient: map 12% reduce 0% 
13/07/03 19:07:48 INFO mapred.JobClient: map 13% reduce 0% 
13/07/03 19:12:48 INFO mapred.JobClient: map 14% reduce 0% 
13/07/03 19:19:46 INFO mapred.JobClient: map 15% reduce 0% 
13/07/03 19:29:05 INFO mapred.JobClient: map 16% reduce 0% 
13/07/03 19:43:43 INFO mapred.JobClient: map 17% reduce 0% 
13/07/03 20:49:36 INFO mapred.JobClient: map 8% reduce 0% 
13/07/03 20:49:38 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_1, Status : FAILED 
Error: GC overhead limit exceeded 
13/07/03 20:49:48 INFO mapred.JobClient: map 9% reduce 0% 
13/07/03 20:50:31 INFO mapred.JobClient: map 10% reduce 0% 
13/07/03 20:52:08 INFO mapred.JobClient: map 11% reduce 0% 
13/07/03 20:54:38 INFO mapred.JobClient: map 12% reduce 0% 
13/07/03 20:58:01 INFO mapred.JobClient: map 13% reduce 0% 
13/07/03 21:03:01 INFO mapred.JobClient: map 14% reduce 0% 
13/07/03 21:10:10 INFO mapred.JobClient: map 15% reduce 0% 
13/07/03 21:19:54 INFO mapred.JobClient: map 16% reduce 0% 
13/07/03 21:31:35 INFO mapred.JobClient: map 8% reduce 0% 
13/07/03 21:31:37 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_0, Status : FAILED 
java.lang.Throwable: Child Error 
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250) 
Caused by: java.io.IOException: Task process exit with nonzero status of 65. 
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237) 

13/07/03 21:32:09 INFO mapred.JobClient: map 9% reduce 0% 
13/07/03 21:33:31 INFO mapred.JobClient: map 10% reduce 0% 
13/07/03 21:35:42 INFO mapred.JobClient: map 11% reduce 0% 
13/07/03 21:38:41 INFO mapred.JobClient: map 12% reduce 0% 
13/07/03 21:42:27 INFO mapred.JobClient: map 13% reduce 0% 
13/07/03 21:48:20 INFO mapred.JobClient: map 14% reduce 0% 
13/07/03 21:56:12 INFO mapred.JobClient: map 15% reduce 0% 
13/07/03 22:07:20 INFO mapred.JobClient: map 16% reduce 0% 
13/07/03 22:26:36 INFO mapred.JobClient: map 17% reduce 0% 
13/07/03 23:35:30 INFO mapred.JobClient: map 8% reduce 0% 
13/07/03 23:35:32 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_1, Status : FAILED 
Error: GC overhead limit exceeded 
13/07/03 23:35:42 INFO mapred.JobClient: map 9% reduce 0% 
13/07/03 23:36:16 INFO mapred.JobClient: map 10% reduce 0% 
13/07/03 23:38:01 INFO mapred.JobClient: map 11% reduce 0% 
13/07/03 23:40:47 INFO mapred.JobClient: map 12% reduce 0% 
13/07/03 23:44:44 INFO mapred.JobClient: map 13% reduce 0% 
13/07/03 23:50:42 INFO mapred.JobClient: map 14% reduce 0% 
13/07/03 23:58:58 INFO mapred.JobClient: map 15% reduce 0% 
13/07/04 00:10:22 INFO mapred.JobClient: map 16% reduce 0% 
13/07/04 00:21:38 INFO mapred.JobClient: map 7% reduce 0% 
13/07/04 00:21:40 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_2, Status : FAILED 
java.lang.Throwable: Child Error 
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250) 
Caused by: java.io.IOException: Task process exit with nonzero status of 65. 
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237) 

13/07/04 00:21:50 INFO mapred.JobClient: map 8% reduce 0% 
13/07/04 00:22:27 INFO mapred.JobClient: map 9% reduce 0% 
13/07/04 00:23:52 INFO mapred.JobClient: map 10% reduce 0% 
13/07/04 00:26:00 INFO mapred.JobClient: map 11% reduce 0% 
13/07/04 00:28:47 INFO mapred.JobClient: map 12% reduce 0% 
13/07/04 00:32:17 INFO mapred.JobClient: map 13% reduce 0% 
13/07/04 00:37:34 INFO mapred.JobClient: map 14% reduce 0% 
13/07/04 00:44:30 INFO mapred.JobClient: map 15% reduce 0% 
13/07/04 00:54:28 INFO mapred.JobClient: map 16% reduce 0% 
13/07/04 01:16:30 INFO mapred.JobClient: map 17% reduce 0% 
13/07/04 01:32:05 INFO mapred.JobClient: map 8% reduce 0% 
13/07/04 01:32:08 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_2, Status : FAILED 
Error: GC overhead limit exceeded 
13/07/04 01:32:21 INFO mapred.JobClient: map 9% reduce 0% 
13/07/04 01:33:26 INFO mapred.JobClient: map 10% reduce 0% 
13/07/04 01:35:37 INFO mapred.JobClient: map 11% reduce 0% 
13/07/04 01:38:48 INFO mapred.JobClient: map 12% reduce 0% 
13/07/04 01:43:06 INFO mapred.JobClient: map 13% reduce 0% 
13/07/04 01:49:58 INFO mapred.JobClient: map 14% reduce 0% 
13/07/04 01:59:07 INFO mapred.JobClient: map 15% reduce 0% 
13/07/04 02:12:00 INFO mapred.JobClient: map 16% reduce 0% 
13/07/04 02:37:56 INFO mapred.JobClient: map 17% reduce 0% 
13/07/04 03:31:55 INFO mapred.JobClient: map 8% reduce 0% 
13/07/04 03:32:00 INFO mapred.JobClient: Job complete: job_201307031710_0001 
13/07/04 03:32:00 INFO mapred.JobClient: Counters: 7 
13/07/04 03:32:00 INFO mapred.JobClient: Job Counters 
13/07/04 03:32:00 INFO mapred.JobClient:  Failed map tasks=1 
13/07/04 03:32:00 INFO mapred.JobClient:  Launched map tasks=8 
13/07/04 03:32:00 INFO mapred.JobClient:  Data-local map tasks=8 
13/07/04 03:32:00 INFO mapred.JobClient:  Total time spent by all maps in occupied slots (ms)=11443502 
13/07/04 03:32:00 INFO mapred.JobClient:  Total time spent by all reduces in occupied slots (ms)=0 
13/07/04 03:32:00 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
13/07/04 03:32:00 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
Exception in thread "main" java.lang.RuntimeException: java.lang.InterruptedException: Canopy Job failed processing vector 

Does your application need a lot of memory? If not, the application may have a bug somewhere that is eating up all the memory. – zsxwing

It is running Mahout canopy clustering, so it should not be an application bug. I can see that each child client is allocated about 200 MB, which may not be enough in my case. – Robin

@zsxwing You should write it as "-Xmx1024M", for exactly this reason: you put one zero too many in there. That's 10.24G. –

Answers

Answer (score: 0)

You need to change Hadoop's memory settings: the memory allocated to Hadoop is not enough to fit the requirements of the running job. Try increasing the heap memory and verify; under memory pressure the operating system may kill the process, which is why the job fails.
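As a sketch of what "increasing the heap memory" looks like for the child tasks (the property name is the MRv1 one; the value here is illustrative, not taken from the question), the per-task heap can be raised in mapred-site.xml:

```xml
<!-- Illustrative mapred-site.xml fragment: give each map/reduce child JVM
     a 1 GiB heap instead of the ~200 MB default. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value>
</property>
```

Note this is a different setting from the client heap size changed in the question: the client heap only affects the job-submitting JVM, while the child-task heap is what the mappers and reducers actually run in.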

Answer (score: 2)

Mahout jobs are very memory-intensive. I don't know whether the mappers or the reducers are the culprit, but either way you will have to tell Hadoop to give them more memory. "GC overhead limit exceeded" is just a way of saying "out of memory": it means the JVM gave up trying to reclaim the last 0.01% of available RAM.

How to set this is indeed a bit complicated, because there are several properties involved, and they changed in Hadoop 2. CDH4 can run either Hadoop 1 or 2; which one are you using?

If I had to guess: set mapred.child.java.opts to -Xmx1g. But the right answer really depends on your version and your data.
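Since Mahout's job drivers go through Hadoop's ToolRunner, that property can also be passed per job on the command line rather than edited into cluster-wide config. A sketch (the input/output paths, distance measure, and t1/t2 thresholds are placeholders, and the property name assumes MRv1):

```shell
# Raise the child-task heap for this one job only (MRv1 property name;
# under MRv2/YARN use mapreduce.map.java.opts / mapreduce.reduce.java.opts).
mahout canopy \
  -Dmapred.child.java.opts=-Xmx1g \
  -i /path/to/input/vectors \
  -o /path/to/output \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -t1 500 -t2 250
```

This keeps the larger heap scoped to the memory-hungry clustering job instead of every task on the cluster.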
