2011-07-09 68 views
5

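OpenMPI 1.4.3 mpirun hostfile error

I want to run a simple MPI program on 4 nodes. I am using OpenMPI 1.4.3 on CentOS 5.5. When I submit the mpirun command with a hostfile/machinefile, I get no output, only a blank screen, so I have to kill the job.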

I use the following command to run it: mpirun --hostfile hostfile -np 4 new46

OUTPUT ON KILLING JOB: 
    mpirun: killing job...
    --------------------------------------------------------------------------
    mpirun noticed that the job aborted, but has no info as to the process that caused 
    that situation. 
    -------------------------------------------------------------------------- 
    mpirun was unable to cleanly terminate the daemons on the nodes shown 
    below. Additional manual cleanup may be required - please refer to 
    the "orte-clean" tool for assistance. 
    -------------------------------------------------------------------------- 
    myocyte46 - daemon did not report back when launched 
    myocyte47 - daemon did not report back when launched 
    myocyte49 - daemon did not report back when launched 

Here is the MPI program I am trying to run on the 4 nodes:

************************** 

    /* Every rank except 0 sends a greeting to rank 0. */
    if (my_rank != 0)
    {
        sprintf(message, "Greetings from the process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    else
    {
        /* Rank 0 receives one message from each other rank, in rank order, and prints it. */
        for (source = 1; source < p; source++)
        {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    **************************** 

My hostfile looks like this:

[[email protected] ~]$ cat hostfile 
    myocyte46 
    myocyte47 
    myocyte48 
    myocyte49 
    ******************************* 

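I ran the MPI program on each node individually, and it ran fine. When I use the hostfile, I get the "daemon did not report back when launched" problem. I am trying to figure out what the issue could be.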

Thanks!

Answer

1

I think these lines

myocyte46 - daemon did not report back when launched 

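are fairly clear: you are having trouble either launching the MPI daemons or communicating with them afterwards. So you need to start looking at the network. Can you ssh into these nodes without a password? Can you ssh back? Leaving the MPI program aside, can you run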

mpirun -np 4 hostname 

and get anything?
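The checks suggested above can be scripted. The following is a minimal sketch and not part of the original answer: check_hosts is a hypothetical helper name, and SSH_CMD is a hypothetical override variable that exists only so the loop can be dry-run without a cluster. It assumes bash and one host name at the start of each hostfile line, as in the hostfile from the question.

```shell
#!/bin/bash
# Sketch: verify that every host in a hostfile accepts a passwordless,
# non-interactive SSH login. check_hosts and SSH_CMD are illustrative
# names, not part of Open MPI.

check_hosts() {
    local hostfile="$1" host failed=0
    # Read the first word of each line; the "|| [ -n ... ]" part also
    # handles a last line that has no trailing newline.
    while read -r host _ || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case "$host" in ''|'#'*) continue ;; esac
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # stdin is redirected so ssh cannot swallow the rest of the hostfile.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

If every node reports ok, something like check_hosts hostfile && mpirun --hostfile hostfile -np 4 hostname would then show whether mpirun can launch a non-MPI command on the nodes. A FAILED line points at SSH keys or the network rather than at the MPI program.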

+0

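Thanks. Yes, I can ssh back and forth between the nodes. It looks like the administrator had left the firewall running, and it seems to work now. Also, I consulted some Linux forums where they suggested adding . /etc/bashrc as the first entry in the bashrc configuration file. – Ashmohan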
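The bashrc suggestion in the comment above might look like the following at the top of ~/.bashrc. This is a sketch assuming a bash account on a CentOS-style system where the system-wide settings live in /etc/bashrc; the existence test is an addition so that a missing file is not an error. The likely rationale is that shells started over ssh on behalf of mpirun then pick up the same environment as an interactive login.

```shell
# ~/.bashrc (top of file)
# Source the system-wide definitions first, if they exist.
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
```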