Simple MPI program fails with a large number of processes
Here is my code:
#include "mpi.h"
#include <stdio.h>
int main (int argc, char** argv) {
    int numtasks, rank;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    // the code fails with or without the printf
    printf ("Number of tasks= %d My rank= %d\n", numtasks,rank);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
This is how I run it, along with the output:
$ mpirun -n 160 ./mpi_example1
[proxy:0:[email protected]] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
[proxy:0:[email protected]] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response
[proxy:0:[email protected]] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[[email protected]] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[[email protected]] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[[email protected]] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[[email protected]] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
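When I run the code with -n 128 or lower, it works fine. I also tried running it on a cluster of 8 nodes with 32 cores each; there it ran with up to -n 192, but failed when I tried -n 224...
Any suggestions? Thanks.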
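I believe your root process (rank 0) exits before all the others have started properly. If that is the case, adding 'MPI_Barrier(MPI_COMM_WORLD);' before MPI_Finalize(); should fix it. Can you try that? – Nominal Animal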
Thanks, I will try it and reply as soon as possible! – Phuocdh90
@Nominal Animal Sadly, I tried what you suggested, but the same error occurs... :( – Phuocdh90