2016-08-05

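SGE failed to submit job, attribute is not a memory value

I am unable to submit a job that requests the mem attribute. I am new to SGE and, after two days of googling, I am asking for help here. Any suggestions would be appreciated!

Here is what I did:

1. Submitted my script: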

qsub -S /bin/bash -A assembly -pe threads 16 -l mem=2GB -cwd -N "pBcR_correct_asm" -j y -o /dev/null runCorrection.sh 

Unable to run job: unknown resource "mem". 
Exiting. 

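2. I had previously solved a similar problem by using "h" in place of "host", following the question SGE unknown resource "nodes", so I tried using "m" in place of "mem". It did not work.

3. After googling, I learned that "h" is a shortcut defined in "/opt/gridengine/util/resources/centry/hostname", which can be confirmed with "qconf -sc":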

qconf -sc 

#name    shortcut type  relop requestable consumable default urgency 
#---------------------------------------------------------------------------------------- 
arch    a   RESTRING == YES   NO   NONE  0 
calendar   c   RESTRING == YES   NO   NONE  0 
cpu     cpu  DOUBLE  >= YES   NO   0  0 
display_win_gui  dwg  BOOL  == YES   NO   0  0 
h_core    h_core  MEMORY  <= YES   NO   0  0 
h_cpu    h_cpu  TIME  <= YES   NO   0:0:0 0 
h_data    h_data  MEMORY  <= YES   NO   0  0 
h_fsize    h_fsize MEMORY  <= YES   NO   0  0 
h_rss    h_rss  MEMORY  <= YES   NO   0  0 
h_rt    h_rt  TIME  <= YES   NO   0:0:0 0 
h_stack    h_stack MEMORY  <= YES   NO   0  0 
h_vmem    h_vmem  MEMORY  <= YES   NO   0  0 
hostname   h   HOST  == YES   NO   NONE  0 
load_avg   la   DOUBLE  >= NO   NO   0  0 
load_long   ll   DOUBLE  >= NO   NO   0  0 
load_medium   lm   DOUBLE  >= NO   NO   0  0 
load_short   ls   DOUBLE  >= NO   NO   0  0 
m_core    core  INT   <= YES   NO   0  0 
m_socket   socket  INT   <= YES   NO   0  0 
m_topology   topo  RESTRING == YES   NO   NONE  0 
m_topology_inuse utopo  RESTRING == YES   NO   NONE  0 
mem_free   mf   MEMORY  <= YES   NO   0  0 
mem_total   mt   MEMORY  <= YES   NO   0  0 
mem_used   mu   MEMORY  >= YES   NO   0  0 

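4. So I used "mt" in place of "mem", but now it complains about the attribute. According to the output above, mem_total is defined in almost the same way as "hostname", which worked before. After going through the SGE guide I suspected that a JSV might be the problem, but I could not find any script under "/opt/gridengine/util/resources/jsv" that contains the message "Unable to run job: attribute ...". I guess I have to configure some files, but which files, and how?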

qsub -S /bin/bash -A assembly -pe threads 16 -l mt=2GB -cwd -N "pBcR_correct_asm" -j y -o test.out runCorrection.sh 

Unable to run job: attribute "mem_total" is not a memory value. 
Exiting. 
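
The shortcut lookup described in step 3 can be scripted. The following is a minimal sketch, not part of the original post: `complex_info` is a hypothetical helper name, and it only reads `qconf -sc`-style text from standard input, so it can be tried on saved output without a running cluster. The sample rows are copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE command): look up a complex by its
# full name or its shortcut in `qconf -sc`-style output read from stdin,
# and print "name shortcut type requestable consumable".
complex_info() {
    awk -v key="$1" '
        /^#/ { next }    # skip the header and comment lines
        NF >= 8 && ($1 == key || $2 == key) { print $1, $2, $3, $5, $6 }
    '
}

# Sample rows copied from the qconf -sc listing in the question.
sample='hostname   h   HOST  == YES   NO   NONE  0
mem_total   mt   MEMORY  <= YES   NO   0  0
h_vmem    h_vmem  MEMORY  <= YES   NO   0  0'

printf '%s\n' "$sample" | complex_info mt     # prints: mem_total mt MEMORY YES NO
printf '%s\n' "$sample" | complex_info mem    # prints nothing: no complex has that name
```

On a live system the same function would presumably be fed with `qconf -sc | complex_info mt`. An empty result for "mem" matches the `unknown resource "mem"` error from step 1.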

Answers


What you probably want is h_vmem. At least, that is the attribute I always use to specify the memory I want my job to request.

See:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html?pathrev=V62u5_TAG

Specifically,

    The resource limit parameters s_vmem and h_vmem are implemented
    by Sun Grid Engine as a job limit. They impose a limit on the
    amount of combined virtual memory consumed by all the processes
    in the job. If h_vmem is exceeded by a job running in the queue,
    it is aborted via a SIGKILL signal (see kill(1)). If s_vmem is
    exceeded, the job is sent a SIGXCPU signal which can be caught
    by the job. If you wish to allow a job to be "warned" so it can
    exit gracefully before it is killed then you should set the
    s_vmem limit to a lower value than h_vmem. For parallel
    processes, the limit is applied per slot which means that the
    limit is multiplied by the number of slots being used by the
    job before being applied.
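
To make the per-slot rule in the quoted passage concrete, here is a small arithmetic sketch. The helper names `total_vmem_mb` and `per_slot_vmem_mb` are hypothetical, the 16 slots mirror the `-pe threads 16` request in the question, and the 2048 MB per slot is only an example value, not a recommendation.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the per-slot rule for h_vmem.
# All sizes are whole megabytes so the arithmetic stays in integers.

# Total limit for the job = per-slot request * number of slots.
total_vmem_mb() {
    per_slot_mb=$1
    slots=$2
    echo $(( per_slot_mb * slots ))
}

# Per-slot request needed to stay within a total budget (rounded down).
per_slot_vmem_mb() {
    total_mb=$1
    slots=$2
    echo $(( total_mb / slots ))
}

total_vmem_mb 2048 16        # prints 32768
per_slot_vmem_mb 32768 16    # prints 2048
```

The second number is the one that would go after `-l h_vmem=`, so a job that needs about 32 GB in total across 16 slots would request roughly 2 GB per slot rather than 32 GB.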

Also, you may need to set it as consumable using qconf.
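
As a hedged sketch of what setting a consumable with qconf can involve: the consumable column of the complex list is edited with `qconf -mc`, and a consumable usually also needs a capacity, which is commonly attached to an execution host through its `complex_values`. The script below only prints the commands (a dry run) so it can be read and adapted safely. The host name smp03.local appears elsewhere in this thread; the 64G capacity is a made-up placeholder, and whether this setup suits a given cluster is an assumption.

```shell
#!/bin/sh
# Dry run: print the qconf commands instead of executing them.
# HOST is taken from the thread; CAPACITY is an assumed placeholder.
HOST="smp03.local"
CAPACITY="64G"

consumable_setup_commands() {
    # 1. Open the complex list in an editor; set consumable to YES for h_vmem.
    echo "qconf -mc"
    # 2. Give the execution host a capacity for the consumable.
    echo "qconf -mattr exechost complex_values h_vmem=${CAPACITY} ${HOST}"
    # 3. Show the host configuration to check the result.
    echo "qconf -se ${HOST}"
}

consumable_setup_commands
```

These commands change cluster configuration, so they normally have to be run by a Grid Engine manager account.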


@Vince!

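Thank you very much for your reply.

I finally solved my problem by using "h_vmem=2g" ("2GB" gives an error), but I do not know where to find the rules for writing the value of a complex of type MEMORY.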
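
On the format question: the sge_types man page describes a memory value as an integer optionally followed by a single multiplier letter (k, K, m, M, g, G, where lowercase letters are powers of 1000 and uppercase letters are powers of 1024). That would explain why "2g" is accepted and "2GB" is rejected. The check below is a simplified sketch of that rule; `is_sge_mem_value` is a hypothetical helper, it handles decimal integers only, and other Grid Engine versions may accept additional forms.

```shell
#!/bin/sh
# Hypothetical helper: simplified check of a Grid Engine memory value.
# Accepts decimal digits followed by at most one multiplier letter.
is_sge_mem_value() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[kKmMgG]?$'
}

for value in 2g 2G 2048M 2GB 2gb; do
    if is_sge_mem_value "$value"; then
        echo "$value: looks valid"
    else
        echo "$value: not a memory value"
    fi
done
```

Running a check like this before calling qsub gives a local error message instead of a rejected submission.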

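The following information is no longer necessary.

I have read the page you linked and set the consumable attribute of the h_vmem and s_vmem complexes to "YES", but it does not work. I think I have to configure the queue's "complex_values", which is currently "NONE". However, I cannot open the page that might tell me how to configure it: http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_types.html?pathrev=V62u5_TAG. Am I right that I need to configure the queue? Do I have to configure the host as well?

Any suggestions would be appreciated!

The following is what I did:

1. Changed the consumable attribute of h_vmem and s_vmem to "YES":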

qconf -sc 

#name    shortcut type  relop requestable consumable default urgency 
#---------------------------------------------------------------------------------------- 
arch    a   RESTRING == YES   NO   NONE  0 
calendar   c   RESTRING == YES   NO   NONE  0 
cpu     cpu  DOUBLE  >= YES   NO   0  0 
display_win_gui  dwg  BOOL  == YES   NO   0  0 
h_core    h_core  MEMORY  <= YES   NO   0  0 
h_cpu    h_cpu  TIME  <= YES   NO   0:0:0 0 
h_data    h_data  MEMORY  <= YES   NO   0  0 
h_fsize    h_fsize MEMORY  <= YES   NO   0  0 
h_rss    h_rss  MEMORY  <= YES   NO   0  0 
h_rt    h_rt  TIME  <= YES   NO   0:0:0 0 
h_stack    h_stack MEMORY  <= YES   NO   0  0 
h_vmem    h_vmem  MEMORY  <= YES   YES  0  0 
hostname   h   HOST  == YES   NO   NONE  0 
load_avg   la   DOUBLE  >= NO   NO   0  0 
load_long   ll   DOUBLE  >= NO   NO   0  0 
load_medium   lm   DOUBLE  >= NO   NO   0  0 
load_short   ls   DOUBLE  >= NO   NO   0  0 
m_core    core  INT   <= YES   NO   0  0 
m_socket   socket  INT   <= YES   NO   0  0 
m_topology   topo  RESTRING == YES   NO   NONE  0 
m_topology_inuse utopo  RESTRING == YES   NO   NONE  0 
mem_free   mf   MEMORY  <= YES   NO   0  0 
mem_total   mt   MEMORY  <= YES   NO   0  0 
mem_used   mu   MEMORY  >= YES   NO   0  0 
min_cpu_interval mci  TIME  <= NO   NO   0:0:0 0 
np_load_avg   nla  DOUBLE  >= NO   NO   0  0 
np_load_long  nll  DOUBLE  >= NO   NO   0  0 
np_load_medium  nlm  DOUBLE  >= NO   NO   0  0 
np_load_short  nls  DOUBLE  >= NO   NO   0  0 
num_proc   p   INT   == YES   NO   0  0 
qname    q   RESTRING == YES   NO   NONE  0 
rerun    re   BOOL  == NO   NO   0  0 
s_core    s_core  MEMORY  <= YES   NO   0  0 
s_cpu    s_cpu  TIME  <= YES   NO   0:0:0 0 
s_data    s_data  MEMORY  <= YES   NO   0  0 
s_fsize    s_fsize MEMORY  <= YES   NO   0  0 
s_rss    s_rss  MEMORY  <= YES   NO   0  0 
s_rt    s_rt  TIME  <= YES   NO   0:0:0 0 
s_stack    s_stack MEMORY  <= YES   NO   0  0 
s_vmem    s_vmem  MEMORY  <= YES   YES  0  0 
seq_no    seq  INT   == NO   NO   0  0 
slots    s   INT   <= YES   YES  1  1000 
swap_free   sf   MEMORY  <= YES   NO   0  0 
swap_rate   sr   MEMORY  >= YES   NO   0  0 
swap_rsvd   srsv  MEMORY  >= YES   NO   0  0 
swap_total   st   MEMORY  <= YES   NO   0  0 
swap_used   su   MEMORY  >= YES   NO   0  0 
tmpdir    tmp  RESTRING == NO   NO   NONE  0 
virtual_free  vf   MEMORY  <= YES   NO   0  0 
virtual_total  vt   MEMORY  <= YES   NO   0  0 
virtual_used  vu   MEMORY  >= YES   NO   0  0 
# >#< starts a comment but comments are not saved across edits -------- 

2. Submitted my job to the queue smp.q, and it complains about the same problem:

qsub -S /bin/bash -A assembly -q smp.q -pe newPe 16 -l h_vmem=2GB -cwd -N "pBcR_correct_asm" -j y -o runCorrection.sh 

Unable to run job: attribute "h_vmem" is not a memory value. 
Exiting. 

3. Information about smp.q. I think "complex_values" should be changed, while "h_vmem" can stay unchanged:

qconf -sq smp.q 

qname     smp.q 
hostlist    @smp.q 
seq_no    0 
load_thresholds  np_load_avg=1.75 
suspend_thresholds NONE 
nsuspend    1 
suspend_interval  00:05:00 
priority    0 
min_cpu_interval  00:05:00 
processors   UNDEFINED 
qtype     BATCH INTERACTIVE 
ckpt_list    NONE 
pe_list    make newPe 
rerun     FALSE 
slots     160 
tmpdir    /tmp 
shell     /bin/csh 
prolog    NONE 
epilog    NONE 
shell_start_mode  posix_compliant 
starter_method  NONE 
suspend_method  NONE 
resume_method   NONE 
terminate_method  NONE 
notify    00:00:60 
owner_list   NONE 
user_lists   NONE 
xuser_lists   NONE 
subordinate_list  NONE 
complex_values  NONE 
projects    NONE 
xprojects    NONE 
calendar    NONE 
initial_state   default 
s_rt     INFINITY 
h_rt     INFINITY 
s_cpu     INFINITY 
h_cpu     INFINITY 
s_fsize    INFINITY 
h_fsize    INFINITY 
s_data    INFINITY 
h_data    INFINITY 
s_stack    INFINITY 
h_stack    INFINITY 
s_core    INFINITY 
h_core    INFINITY 
s_rss     INFINITY 
h_rss     INFINITY 
s_vmem    INFINITY 
h_vmem    INFINITY 

4. Information about a host in @smp.q:

qconf -sconf smp03.local 

#smp03.local: 
mailer      /bin/mail 
xterm      /usr/bin/X11/xterm 
execd_spool_dir    /opt/gridengine/default/spool 

5. Global configuration. Should I add h_vmem and s_vmem here?

qconf -sconf 

#global: 
execd_spool_dir    /opt/gridengine/default/spool 
mailer      /bin/mail 
xterm      /usr/bin/X11/xterm 
load_sensor     none 
prolog      none 
epilog      none 
shell_start_mode    posix_compliant 
login_shells     sh,ksh,csh,tcsh 
min_uid      0 
min_gid      0 
user_lists     none 
xuser_lists     none 
projects      none 
xprojects     none 
enforce_project    false 
enforce_user     auto 
load_report_time    00:00:40 
max_unheard     00:05:00 
reschedule_unknown   00:00:00 
loglevel      log_warning 
administrator_mail   none 
set_token_cmd    none 
pag_cmd      none 
token_extend_time   none 
shepherd_cmd     none 
qmaster_params    none 
execd_params     ENABLE_ADDGRP_KILL=TRUE H_MEMORYLOCKED=infinity 
reporting_params    accounting=true reporting=true \ 
          flush_time=00:00:15 joblog=true sharelog=00:00:00 
finished_jobs    100 
gid_range     20000-20100 
qlogin_command    builtin 
qlogin_daemon    builtin 
rlogin_command    builtin 
rlogin_daemon    builtin 
rsh_command     builtin 
rsh_daemon     builtin 
max_aj_instances    2000 
max_aj_tasks     75000 
max_u_jobs     0 
max_jobs      0 
max_advance_reservations  0 
auto_user_oticket   0 
auto_user_fshare    0 
auto_user_default_project none 
auto_user_delete_time  86400 
delegated_file_staging  false 
reprioritize     0 
jsv_url      none 
jsv_allowed_mod    ac,h,i,e,o,j,M,N,p,w 

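I think I know why I failed. It seems that h_vmem is not configured globally, that is, I have to run "qconf -mconf global" and add "h_vmem 1024M". But I cannot test it while the administrator is away. If it works, I will post the solution here. – lam138138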