mpirun hangs when MPI send and receive are placed in a loop

I am trying to run the program below on a 4-node cluster using mpirun.

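Node 0 distributes the data to nodes 1, 2, and 3. The computation has to be done for different values of the variable dir, ranging from -90 to 90.

So node 0 distributes the data and collects the results in a loop (once for each value of dir). With the do {*******} while(dir<=90); loop in place, mpirun hangs and there is no output. When I comment out the do {*******} while(dir<=90); loop, I get output for the initial value of dir (dir=-90), and that output is correct. The problem only appears with the loop.

Can anyone please help me solve this problem?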

#include "mpi.h"
/* Other includes and the declarations of dest, offset, count, s_psi, P,
   start, end, total, FROM_MASTER and FROM_WORKER are not shown in the post. */

int main(int argc, char *argv[])
{
    float dir = -90;
    int rank, numprocs;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    if (rank == 0)
    {
        do {
            /* initializing data */
            for (dest = 1; dest < numprocs; dest++)
            {
                MPI_Send(&offset, 1, MPI_INT, dest, FROM_MASTER, MPI_COMM_WORLD);
                MPI_Send(&s_psi[offset], count, MPI_FLOAT, dest, FROM_MASTER, MPI_COMM_WORLD);
            }
            gettimeofday(&start, NULL);
            for (dest = 1; dest < numprocs; dest++)
            {
                MPI_Recv(&offset, 1, MPI_INT, dest, FROM_WORKER, MPI_COMM_WORLD, &status);
                MPI_Recv(&P[offset], count, MPI_FLOAT, dest, FROM_WORKER, MPI_COMM_WORLD, &status);
            }
            gettimeofday(&end, NULL);
            timersub(&end, &start, &total);
            printf("time consumed=%lds %ldus\n", (long)total.tv_sec, (long)total.tv_usec);
            dir++;
        } while (dir <= 90);
    }

    if (rank > 0)
    {
        MPI_Recv(&offset, 1, MPI_INT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&s_psi[offset], count, MPI_FLOAT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);

        // Does the computation

        MPI_Send(&offset, 1, MPI_INT, 0, FROM_WORKER, MPI_COMM_WORLD);
        MPI_Send(&P[offset], count, MPI_FLOAT, 0, FROM_WORKER, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Source

2014-02-17 user3115828

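The rank > 0 part should also be wrapped in a loop. Every MPI_Send needs a matching MPI_Recv: node 0 sends and receives once per value of dir, but each worker receives and replies only once, so from the second round on node 0 waits for replies that are never sent.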

if (rank > 0) {
    do {
        MPI_Recv(&offset, 1, MPI_INT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&s_psi[offset], count, MPI_FLOAT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        // Computation
        MPI_Send(&offset, 1, MPI_INT, 0, FROM_WORKER, MPI_COMM_WORLD);
        MPI_Send(&P[offset], count, MPI_FLOAT, 0, FROM_WORKER, MPI_COMM_WORLD);
        dir++;
    } while (dir <= 90);
}
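
To make the mismatch concrete, here is a small plain-C sketch (no MPI needed) that counts how many rounds the do/while loop runs and how many of node 0's rounds have no partner on the worker side. The function names count_rounds and unmatched_rounds are made up for this illustration and do not appear in the original program.

```c
/* Counts how many times the body of
 *     do { ...; dir++; } while (dir <= last);
 * executes when dir starts at first. */
int count_rounds(float first, float last)
{
    int rounds = 0;
    float dir = first;
    do {
        rounds++;
        dir++;
    } while (dir <= last);
    return rounds;
}

/* Rounds of sends from node 0 that never meet a matching receive
 * when each worker only handles worker_rounds rounds. */
int unmatched_rounds(int master_rounds, int worker_rounds)
{
    return master_rounds - worker_rounds;
}
```

For dir running from -90 to 90 the loop body runs 181 times. A worker that handles a single round leaves 180 rounds unmatched, while a worker that runs the same loop leaves none.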

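That goes on your worker nodes.

However, the workers may not know dir. The usual approach is to have node 0 send a special "stop" message that tells the workers to finish.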

At the end, on node 0:

/* STOP is a tag distinct from FROM_MASTER and FROM_WORKER */
for (r = 1; r < numprocs; r++)
    MPI_Send(&dummy, 1, MPI_INT, r, STOP, MPI_COMM_WORLD);

For the worker nodes:

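if (rank > 0) {
    while (1) {
        /* Wait for the next message from node 0 and look at its tag first.
           A non-blocking MPI_Iprobe after the sends could run before STOP
           has arrived and leave the worker stuck in the next MPI_Recv. */
        MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == STOP) {
            MPI_Recv(&dummy, 1, MPI_INT, 0, STOP, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            break;
        }
        MPI_Recv(&offset, 1, MPI_INT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&s_psi[offset], count, MPI_FLOAT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        // Computation
        MPI_Send(&offset, 1, MPI_INT, 0, FROM_WORKER, MPI_COMM_WORLD);
        MPI_Send(&P[offset], count, MPI_FLOAT, 0, FROM_WORKER, MPI_COMM_WORLD);
    }
}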

After that you can finally call MPI_Finalize.
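
The stop-message idea boils down to dispatching on the tag of the next incoming message. The sketch below simulates that control flow in plain C, without MPI. The struct inbox, the function run_worker and the tag values are all hypothetical stand-ins. In real MPI code the "peek at the next tag" step would be a blocking MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &status) followed by reading status.MPI_TAG.

```c
#include <stddef.h>

enum { FROM_MASTER = 1, STOP = 2 };  /* hypothetical tag values */

/* Hypothetical stand-in for the messages node 0 sends to one worker. */
struct inbox {
    const int *tags;  /* tag of each pending message, in arrival order */
    size_t     len;   /* number of messages */
    size_t     next;  /* index of the next message to handle */
};

/* Worker loop: inspect the tag of the next message before handling it.
 * Returns the number of work rounds completed. A real MPI_Probe would
 * block until a message arrives; here the loop just ends when the
 * simulated inbox is empty. */
int run_worker(struct inbox *in)
{
    int rounds = 0;
    while (in->next < in->len) {
        int tag = in->tags[in->next++];
        if (tag == STOP)
            break;      /* node 0 says there is no more work */
        rounds++;       /* receive the data, compute, send the result back */
    }
    return rounds;
}
```

Because the worker decides what to do only after it has seen the tag, it does not need to know dir or the number of rounds in advance.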

By the way, you may want to read up on blocking and non-blocking Send/Recv.

Source

2014-02-17 08:11:49 NoWiS

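Thanks. Actually, I have to run this application on a 4-node, single-core PowerPC cluster. When the application runs on a PC, adding the do .. while loop to the worker nodes solved the problem. But when it runs on the PowerPC-based cluster, mpirun still hangs. The code is executed on the cluster with the command "mpirun --prefix /usr/local --hostfile /usr/local/etc/openmpi-default-hostfile -np 4 pgm". What could be the reason? – user3115828

I edited my answer; there was a small mistake (I forgot to increment dir). Anyway, since it works on your PC, I assume you had already corrected that. As for the problem on the PowerPC cluster, I don't know. You should probably add printf("[%d] %g\n", rank, dir) in each loop to see where it hangs and which Send has no matching Recv. – NoWiS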
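
A possible shape for that trace output is sketched below. The helper names format_trace and trace are invented for this example. Since dir is a float in the program above, the sketch prints it with %g, and it flushes stderr after every line so the last message is visible even if the process hangs right afterwards.

```c
#include <stdio.h>

/* Formats "[rank] dir=<value>" into buf. Returns what snprintf returns:
 * the number of characters written, excluding the terminating NUL. */
int format_trace(char *buf, size_t size, int rank, float dir)
{
    return snprintf(buf, size, "[%d] dir=%g", rank, dir);
}

/* Prints one trace line to stderr and flushes it immediately. */
void trace(int rank, float dir)
{
    char line[64];
    format_trace(line, sizeof line, rank, dir);
    fprintf(stderr, "%s\n", line);
    fflush(stderr);
}
```

Calling trace(rank, dir) right before each MPI_Send and MPI_Recv on every rank shows which call each process reached last.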
