2013-03-19

MPI Fox's algorithm with non-blocking send and receive

I am new to MPI and I am trying to write an implementation of Fox's algorithm (AxB = C, where A and B are matrices of dimension n x n). My program works correctly, but I would like to see whether I can speed it up by overlapping communication with computation during the shift of the blocks of matrix B (the blocks of B are rotated between processes as the algorithm proceeds). According to the algorithm, each process in the 2D Cartesian grid holds one block of each of the matrices A, B and C. What I currently have is this stage of Fox's algorithm:

if (stage > 0) {

    // shift the b blocks through all processes

    MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
    MPI_Isend(b, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
    MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
    MPI_Wait(&my_request1, &status);
    MPI_Wait(&my_request2, &status);
    multiplyMatrix(a_temp, b, c, n_local);
}

The submatrices a_temp, b and b_temp are pointers to double pointing at blocks of size n/numprocesses * n/numprocesses (the block size; for example, b = (double *) calloc(n/numprocesses * n/numprocesses, sizeof(double))).

I would like the multiplyMatrix call to happen before the MPI_Wait calls (which would give the overlap of communication and computation), but I don't know how to do that. Do I need two separate buffers and to alternate between them in different stages?

(I know I could use MPI_Sendrecv_replace, but that doesn't help with overlapping, since it uses a blocking send and receive. The same applies to MPI_Sendrecv.)

Answer


I actually figured out how to do this. The question should probably be deleted, but since I am new to MPI I will post the solutions here, and I would be glad if anyone with suggestions for improvement would share them. Method 1:

// Fox's algorithm with two alternating buffers for b
double *b_buffers[2];
b_buffers[0] = (double *) malloc(n_local*n_local*sizeof(double));
b_buffers[1] = b;
for (stage = 0; stage < q; stage++) {
    // copy a into a_temp and broadcast a_temp of each process to all
    // other processes in its row
    for (i = 0; i < n_local*n_local; i++)
        a_temp[i] = a[i];
    if (stage == 0) {
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        // stage the outgoing block so send and receive use distinct buffers
        memcpy(b_buffers[0], b, n_local*n_local*sizeof(double));
        MPI_Isend(b_buffers[0], n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        multiplyMatrix(a_temp, b_buffers[0], c, n_local);  // overlaps with the shift
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }

    if (stage > 0) {
        // shift the b blocks while multiplying the current buffer
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        MPI_Isend(b_buffers[stage % 2], n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b_buffers[(stage + 1) % 2], n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        multiplyMatrix(a_temp, b_buffers[stage % 2], c, n_local);  // overlaps with the shift
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }
}

Method 2:

// Fox's algorithm with a scratch copy of b in b_temp

for (stage = 0; stage < q; stage++) {
    // copy a into a_temp and broadcast a_temp of each process to all
    // other processes in its row
    for (i = 0; i < n_local*n_local; i++)
        a_temp[i] = a[i];
    if (stage == 0) {
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        memcpy(b_temp, b, n_local*n_local*sizeof(double));
        MPI_Isend(b_temp, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        multiplyMatrix(a_temp, b_temp, c, n_local);  // overlaps with the shift
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }

    if (stage > 0) {
        // shift the b blocks while multiplying the scratch copy
        memcpy(b_temp, b, n_local*n_local*sizeof(double));
        MPI_Bcast(a_temp, n_local*n_local, MPI_DOUBLE, (rowID + stage) % q, row_comm);
        MPI_Isend(b_temp, n_local*n_local, MPI_DOUBLE, nbrs[UP], 111, grid_comm, &my_request1);
        MPI_Irecv(b, n_local*n_local, MPI_DOUBLE, nbrs[DOWN], 111, grid_comm, &my_request2);
        multiplyMatrix(a_temp, b_temp, c, n_local);  // overlaps with the shift
        MPI_Wait(&my_request2, &status);
        MPI_Wait(&my_request1, &status);
    }
}

Both of these seem to work, but as I said, I am new to MPI, so if you have any comments or suggestions, please share them.


If you don't use 'status', you can use 'MPI_STATUS_IGNORE' there. – 2017-08-06 10:22:52


Instead of two 'MPI_Wait()' calls, you can use an array of requests with 'MPI_Waitall()', and 'MPI_STATUSES_IGNORE' if you don't care about the statuses. – 2017-08-06 10:24:09