2017-06-14
1

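grep qstat output and copy files once the job is finished

I use the PBS job scheduler on my cluster. In bash, I want to monitor the status of a job and, once the job has finished, copy the results to a specific location (/data/MyFolder/).

My qstat output looks like this: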

    JobID Username Queue Jobname SessID NDS TSK Memory Time Status
    ---------------------------------------------------------------- 
    717.XXXXXX user XXXX  SS 2323283 1 24 122gb --  E 

Thanks in advance.

Answers

1

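There is a script here that does this (for SGE). I started to excerpt only the relevant parts for you, but it is probably easier to start from the full script: insert your qsub command in the submit_job function, then put your copy command after the wait_job_finish call in the script. You can remove the log printing at the end if you don't need it.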

#!/bin/bash 

# this script will submit a qsub job and check on host information for the cluster 
# node which it ends up running on 
# ~~~~~ CUSTOM FUNCTIONS ~~~~~ # 
submit_job() { 
    local job_name="$1" 
    qsub -j y -N "$job_name" -o ":${PWD}/" -e ":${PWD}/" <<E0F
set -x
hostname
cat /etc/hosts
python -c "import socket; print(socket.gethostbyname(socket.gethostname()))"
# sleep 5000
E0F
} 

wait_job_start() { 
    local job_id="$1" 
    printf "waiting for job to start"
    # SGE reports a running job with state 'r'; PBS uses 'R'
    while ! qstat | grep "$job_id" | grep -Eq '[[:space:]]r[[:space:]]'
    do
        printf "."
        sleep 1
    done
    printf "\n\n"

    local node_name="$(get_node_name "$job_id")"
    printf "Job is running on node %s\n\n" "$node_name"
} 

wait_job_finish() { 
    local job_id="$1" 
    printf "waiting for job to finish" 
    while qstat | grep -q "$job_id"
    do
        printf "."
        sleep 1
    done
    printf "\n\n" 
} 

check_for_job_submission() {
    local job_id="$1"
    # grep -q succeeds when the job ID appears in the qstat listing
    if qstat | grep -q "$job_id" ; then
        echo "it's there"
    else
        echo "not there"
    fi
}

get_node_name() { 
    local job_id="$1" 
    qstat | grep "$job_id" | sed -e 's|^.*[[:space:]]\([a-zA-Z0-9.]*@[^ ]*\).*$|\1|g' 
} 
# ~~~~~ RUN ~~~~~ # 
printf "Submitting cluster job to get node hostname and IP\n\n" 

job_name="get_node_hostnames" 
job_id="$(submit_job "$job_name")" # Your job 832606 ("get_node_hostnames") has been submitted 
job_id="$(echo "$job_id" | sed -e 's|.*[[:space:]]\([[:digit:]]*\)[[:space:]].*|\1|g')" 
job_stdout_log="${job_name}.o${job_id}" 

printf "Job ID:\t%s\nJob Name:\t%s\n\n" "$job_id" "$job_name" 

wait_job_start "$job_id" 
wait_job_finish "$job_id" 

printf "\n\nReading log file %s\n\n" "$job_stdout_log"
[ -f "$job_stdout_log" ] && cat "$job_stdout_log"
printf "\n\nRemoving log file %s\n\n" "$job_stdout_log"
[ -f "$job_stdout_log" ] && rm -f "$job_stdout_log"

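Side note: if you prefer Python, there is a slightly more robust equivalent here.

You will probably have to make a few small tweaks to adapt it to your PBS system, since it was written for SGE.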
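
As a rough idea of what those PBS tweaks might look like, here is a minimal sketch, not a tested drop-in. It assumes the qstat layout from the question (job ID in the first column, state letter in the last column) and treats a job as finished once it shows state C or no longer appears in qstat. The names job_state and wait_job_finish_pbs, the 30-second default and the paths in the usage comment are made up for illustration.

```shell
#!/bin/bash
# Sketch of a PBS-flavoured wait loop (assumed qstat layout: ID first, state last).

# Print the state letter (e.g. Q, R, E, C) of job $1, reading qstat output on stdin.
# Only the numeric part of the job ID is compared, so "717" and "717.XXXXXX"
# both match. Prints nothing if the job is not listed.
job_state() {
    awk -v id="$1" '
        BEGIN { split(id, want, ".") }
        { split($1, got, ".") }
        got[1] == want[1] { print $NF; exit }
    '
}

# Block until job $1 is reported as completed (C) or is gone from qstat.
# $2 is the polling interval in seconds (default 30).
# Caveat: a failing qstat also yields an empty state and ends the loop.
wait_job_finish_pbs() {
    local job_id="$1" interval="${2:-30}" state
    while :; do
        state="$(qstat 2>/dev/null | job_state "$job_id")"
        if [ -z "$state" ] || [ "$state" = "C" ]; then
            break
        fi
        sleep "$interval"
    done
}

# Hypothetical usage (my_job.sh and the source path are placeholders):
#   job_id="$(qsub my_job.sh)"
#   wait_job_finish_pbs "$job_id" 30
#   cp -r /path/to/results/. /data/MyFolder/
```

Parsing the state with awk rather than a bare grep avoids false matches when the job number happens to appear in another column, such as the session ID.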

1

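You could simply look for " C " (the completed state) using grep, but you could also just use -o [hostname:]path to stream the output to its final destination, as long as you have SSH keys set up from the nodes for your POSIX account.

If you do end up using grep, be a good citizen and limit the checks to once or twice a minute, so that you don't spam the server and hurt its performance.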
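
A small sketch of the grep approach with a polite polling interval might look like the following. The function names job_done and copy_when_done and the POLL_INTERVAL variable are invented for this example, and it assumes you pass the full job ID exactly as qstat prints it. It also treats a job that has disappeared from qstat as done, since completed jobs may be purged from the listing.

```shell
#!/bin/bash
# Succeeds when the qstat output on stdin lists job $1 with state C,
# or does not list the job at all.
job_done() {
    local lines
    lines="$(grep -F -- "$1")"
    # A space is appended to every line so that ' C ' also matches
    # when the state is the last column.
    [ -z "$lines" ] || printf '%s\n' "$lines" | sed 's/$/ /' | grep -q ' C '
}

# Wait for job $1, checking every POLL_INTERVAL seconds (default 30),
# then copy $2 to $3.
copy_when_done() {
    local job_id="$1" src="$2" dest="$3"
    until qstat | job_done "$job_id"; do
        sleep "${POLL_INTERVAL:-30}"
    done
    cp -r -- "$src" "$dest"
}

# Hypothetical usage (the source path is a placeholder):
#   copy_when_done 717.XXXXXX /path/to/results /data/MyFolder/
```

With the 30-second default this queries the scheduler twice a minute, which stays within the frequency suggested above.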