
MPI: What to do when the expected number of MPI_Recv is unknown

I have many slave nodes which may or may not send messages to the master node, so at the moment the master node has no way of knowing how many MPI_Recv calls to expect. For efficiency reasons, the slave nodes also have to send as few messages as possible to the master node.

I did manage to find a cool trick, where the sender sends an extra "done" message once it is no longer expecting to send any messages. Unfortunately, that doesn't seem to work in my case, where the number of senders is variable. Any ideas on how to get around this? Thanks!

if(rank == 0){ //MASTER NODE 

    while (1) { 

     MPI_Recv(&buffer, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status); 

     if (status.MPI_TAG == DONE) break; 


     /* Do stuff */ 
    } 

}else{ //MANY SLAVE NODES 

    if(some conditions){ 
     MPI_Send(&buffer, 64, MPI_INT, root, 1, MPI_COMM_WORLD); 
    } 

} 


MPI_Barrier(MPI_COMM_WORLD); 
MPI_Send(NULL, 1, MPI_INT, root, DONE, MPI_COMM_WORLD); 

This doesn't work; the program still seems to be stuck waiting in MPI_Recv.


Just tried it. I think the MPI_Barrier gets executed, but the "after barrier" message is never printed because the program is stuck at MPI_Recv. – kornesh


Rank 0 never gets to call the barrier, so obviously it hangs. Remove the barrier and it will run. – Jeff


It doesn't actually hang, but it then executes MPI_Send(NULL, 1, MPI_INT, root, DONE, MPI_COMM_WORLD); as soon as the first slave node finishes its computation... without waiting for the other slave nodes. – kornesh

Answers


1 - You are calling MPI_Barrier in the wrong place; it should be called after MPI_Send.
2 - The root exits the loop once it has received DONE from all the other ranks (size - 1 of them).

The code after some modifications:

#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    MPI_Init(NULL, NULL);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Status status;
    int DONE = 888;
    int buffer = 77;
    int root = 0;
    printf("here is rank %d with size=%d\n", rank, size); fflush(stdout);
    int num_of_DONE = 0;

    if (rank == 0) { // MASTER NODE
        while (1) {
            MPI_Recv(&buffer, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            printf("root recev %d from %d with tag = %d\n", buffer, status.MPI_SOURCE, status.MPI_TAG); fflush(stdout);

            if (status.MPI_TAG == DONE)
                num_of_DONE++;
            printf("num_of_DONE=%d\n", num_of_DONE); fflush(stdout);
            if (num_of_DONE == size - 1)
                break;

            /* Do stuff */
        }
    } else { // MANY SLAVE NODES
        if (1) {
            buffer = 66;
            MPI_Send(&buffer, 1, MPI_INT, root, 1, MPI_COMM_WORLD);
            printf("rank %d sent data.\n", rank); fflush(stdout);
        }
    }

    if (rank != 0) {
        buffer = 55;
        MPI_Send(&buffer, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d done.\n", rank); fflush(stdout);
    MPI_Finalize();
    return 0;
}

Output:

$ mpicc -o aa aa.c
$ mpirun -n 3 ./aa
here is rank 2 with size=3 
here is rank 0 with size=3 
rank 2 sent data. 
here is rank 1 with size=3 
rank 1 sent data. 
root recev 66 from 1 with tag = 1 
num_of_DONE=0 
root recev 66 from 2 with tag = 1 
num_of_DONE=0 
root recev 55 from 2 with tag = 888 
num_of_DONE=1 
root recev 55 from 1 with tag = 888 
num_of_DONE=2 
rank 0 done. 
rank 1 done. 
rank 2 done. 

I just tried your approach, and unfortunately it terminates earlier than it should. Each slave node takes its own time to finish its computation, and being the last node doesn't mean it should be the last one to terminate. I think sleep(1) would only work if the last process's computation takes less than one second. – kornesh


@kornesh: I have modified the answer. – houssam


This is an elegant way to do it, but _slave nodes have to send a minimum number of messages to the master node for efficiency reasons_. – kornesh


A much simpler and more elegant option would be to use MPI_IBARRIER. Have each worker call all of the sends it needs, then call MPI_IBARRIER when it's done. On the master, you can loop on both an MPI_IRECV on MPI_ANY_SOURCE and an MPI_IBARRIER. When the MPI_IBARRIER is done, you know that everyone has finished, so you can cancel the MPI_IRECV and move on. The pseudocode would look something like this:

if (master) { 
    /* Start the barrier. Each process will join when it's done. */ 
    MPI_Ibarrier(MPI_COMM_WORLD, &requests[0]); 

    do { 
    /* Do the work */ 
    MPI_Irecv(..., MPI_ANY_SOURCE, &requests[1]); 

    /* If the index that finished is 1, we received a message. 
    * Otherwise, we finished the barrier and we're done. */ 
    MPI_Waitany(2, requests, &index, MPI_STATUSES_IGNORE); 
    } while (index == 1); 

    /* If we're done, we should cancel the receive request and move on. */ 
    MPI_Cancel(&requests[1]); 
} else { 
    /* Keep sending work back to the master until we're done. */ 
    while(...work is to be done...) { 
    MPI_Send(...); 
    } 

    /* When we finish, join the Ibarrier. Note that 
    * you can't use an MPI_Barrier here because it 
    * has to match with the MPI_Ibarrier above. */ 
    MPI_Ibarrier(MPI_COMM_WORLD, &request); 
    MPI_Wait(&request, MPI_STATUS_IGNORE); 
} 
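
For reference, here is a minimal runnable sketch of this pattern (it needs an MPI-3 implementation for MPI_Ibarrier). The workload is made up for illustration and is not part of the answer above: each worker sends rank messages with tag 1, and MPI_Ssend is used instead of MPI_Send so that every message has been matched by a receive on the master before its sender joins the barrier (otherwise a buffered send could still be undelivered when the barrier completes). The MPI_Test_cancelled check handles the case where the final receive matched a last message instead of being cancelled.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) { /* master */
        MPI_Request requests[2];
        MPI_Status status;
        int buffer, index, cancelled;

        /* Join the non-blocking barrier right away; it only completes
           once every worker has joined it as well. */
        MPI_Ibarrier(MPI_COMM_WORLD, &requests[0]);

        while (1) {
            /* Post a receive for the next message from any worker. */
            MPI_Irecv(&buffer, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &requests[1]);

            /* index 1: a message arrived; index 0: the barrier finished. */
            MPI_Waitany(2, requests, &index, MPI_STATUS_IGNORE);
            if (index == 0)
                break;

            printf("master received %d\n", buffer); /* do stuff */
        }

        /* Complete the last posted receive.  If it matched a final
           message instead of being cancelled, process that one too. */
        MPI_Cancel(&requests[1]);
        MPI_Wait(&requests[1], &status);
        MPI_Test_cancelled(&status, &cancelled);
        if (!cancelled)
            printf("master received %d\n", buffer);
    } else { /* workers */
        MPI_Request request;
        int i;

        /* Made-up workload: worker i sends i messages. */
        for (i = 0; i < rank; i++) {
            int buffer = 100 * rank + i;
            MPI_Ssend(&buffer, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        }

        /* Signal "done" by joining the barrier, then wait on it. */
        MPI_Ibarrier(MPI_COMM_WORLD, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

With, for example, mpirun -n 4, workers 1 to 3 send 1, 2, and 3 messages respectively, and the master leaves its receive loop only after all of them have joined the barrier, without ever needing to know the message count in advance.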