2017-03-31

SGE job gets killed

I am trying to run a job on SGE, but it keeps getting killed. I am not sure which parameter in my script I should change.

My submit.sh script:

===========

#$ -l mem_free=32G 
#$ -l h_rt=48:00:00 

## softx will require 8 processors 
softx myprogram.sh 

==========

I submit it to SGE with:

qsub -q long.q submit.sh 

What should I change?

The accounting details of the killed job and the queue defaults are below:

qacct -j 740 

=========================== ===================================

qname  long.q 
hostname  node02.local 
department defaultdepartment 
jobname  submit.sh 
jobnumber 740 
taskid  undefined 
account  sge 
priority  0 

granted_pe NONE 
slots  1 
failed  37 : qmaster enforced h_rt, h_cpu, or h_vmem limit 
exit_status 137     (Killed) 
ru_wallclock 1588s 
ru_utime  0.110s 
ru_stime  0.190s 
ru_maxrss 5.520KB 
ru_ixrss  0.000B 
ru_ismrss 0.000B 
ru_idrss  0.000B 
ru_isrss  0.000B 
ru_minflt 25267 
ru_majflt 0 
ru_nswap  0 
ru_inblock 0 
ru_oublock 176 
ru_msgsnd 0 
ru_msgrcv 0 
ru_nsignals 0 
ru_nvcsw  351 
ru_nivcsw 95 
cpu   10096.930s 
mem   429.730GBs 
io   76.911GB 
iow   0.000s 
maxvmem  8.635GB 
arid   undefined 
ar_sub_time undefined 
ar_sub_time undefined 

category  -q long.q -l h_rt=172800,mem_free=32G 

=====

qconf -sq long.q 

qname     long.q 
s_rt     864000 
h_rt     864000 
s_cpu     INFINITY 
h_cpu     INFINITY 
s_fsize    INFINITY 
h_fsize    INFINITY 
s_data    INFINITY 
h_data    INFINITY 
s_rss     INFINITY 
h_rss     INFINITY 
s_vmem    INFINITY 
h_vmem    8g 

Have you tried submitting the job without specifying h_rt? – crcrewso

Answer


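The queue has an h_vmem limit of 8g, which is enforced on jobs regardless of what they request. The job died after about half an hour (ru_wallclock 1588s), far short of the 48 hours it requested, so the h_rt limit should not be the cause. The job reports a maxvmem of 8.635GB, which exceeds the queue limit. You will need to talk to your cluster administrator about how to submit this kind of job, or change your problem so that it uses less virtual memory.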
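
As a rough check of that reasoning, the numbers from the qacct and qconf output in the question can be compared directly. This is only a sketch: the values are copied by hand from the output above, the helper name `exceeded` is made up for illustration, and the exact difference between SGE's `g` suffix and the `GB` that qacct prints is ignored, on the assumption that it is too small to change the result here.

```shell
#!/bin/sh
# Values copied by hand from the qacct / qconf output in the question.
wallclock_s=1588          # ru_wallclock
requested_h_rt_s=172800   # h_rt=48:00:00, as recorded in the "category" line
maxvmem_gb=8.635          # maxvmem reported by qacct
queue_h_vmem_gb=8         # h_vmem on long.q (assumption: g and GB treated as the same unit)

# exceeded A B: print "yes" if A > B, otherwise "no".
# (Hypothetical helper; awk is used because sh cannot compare decimals.)
exceeded() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (a + 0 > b + 0) print "yes"; else print "no" }'
}

rt_exceeded=$(exceeded "$wallclock_s" "$requested_h_rt_s")
vmem_exceeded=$(exceeded "$maxvmem_gb" "$queue_h_vmem_gb")

echo "h_rt exceeded:   $rt_exceeded"
echo "h_vmem exceeded: $vmem_exceeded"
```

With the values above, only the h_vmem comparison comes out as exceeded, which matches the "qmaster enforced h_rt, h_cpu, or h_vmem limit" failure being a memory kill rather than a runtime kill.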