
MPI_Bcast matrix multiplication setup

I am multiplying two matrices (2D arrays) in parallel with MPI, splitting the rows evenly and scattering them to the child processes. The master also works on a chunk of the rows. I understand how to do this and did it successfully with MPI_Send/MPI_Recv, but now I am trying to do it with MPI_Bcast and cannot figure out when to Bcast and exactly what to send. When I print the finished matrix (C) at various points, it seems that not all of the rows are being computed/updated, and I know this is probably because I am not specifying the buffers correctly.

Code:

#include <iostream> 
#include <stdlib.h> 
#include <mpi.h> 
#include <stdio.h> 
#include <time.h> 

using namespace std; 


int main(int argc, char *argv[]) 
{ 
    int myid, nproc; 
    int Ibuffer[200];   // Integer buffer, use proper size and type 
    double Dbuffer[2000];  // Double buffer, use proper size and type 
    char Sbuffer[200];   // String Buffer 
    int msg_len; 
    int i, j, k; 

    // initialize the MPI Environment and get the needed Data 
    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &nproc); 
    MPI_Comm_rank(MPI_COMM_WORLD, &myid); 

    // Get the name of processor 
    MPI_Get_processor_name(Sbuffer, &msg_len); 

    int RowA = 5, 
    ColA = 2, 
    RowB = ColA, 
    ColB = 3, 
    RowC = RowA, 
    ColC = ColB; 

    // Start clock 
    double start_time = MPI_Wtime(); 

    // Initialize matrices 
    double **matA = new double*[RowA]; 
    for (int i = 0; i < RowA; ++i) 
     matA[i] = new double[ColA]; 

    double **matB = new double*[RowB]; 
    for (int i = 0; i < RowB; ++i) 
     matB[i] = new double[ColB]; 

    double **matC = new double*[RowC]; 
    for (int i = 0; i < RowC; ++i) 
     matC[i] = new double[ColC]; 



    for (int i = 0; i < RowA; i++) // MatA 
    { 
     for (int j = 0; j < ColA; j++) 
     { 
      matA[i][j] = 2; 
     } 
    } 

    for (int i = 0; i < RowB; i++) // MatB 
    { 
     for (int j = 0; j < ColB; j++) 
     { 
      matB[i][j] = 2; 
     } 
    } 

    for (int i = 0; i < RowC; i++) // MatC 
    { 
     for (int j = 0; j < ColC; j++) 
     { 
      matC[i][j] = 0; 
     } 
    } 



    // All procs compute the chunk size, no need to send separate 
    int chunk = RowA/nproc; 
    int rest = RowA % nproc; 
    int my_start_row = myid * chunk;  // find my start row 
    int my_end_row = (myid + 1) * chunk;  // find my end row 

    // assign rest to the last worker 
    if (myid == nproc-1) my_end_row += rest; 

    int Dcount = ColA * chunk; // Data count for A to send to worker 
    MPI_Status status;  // Status variable needed for the receive 

    if (myid == 0) 
    {  
     // Send the rows needed for workers (Don't know if I need this or not) 
      //MPI_Bcast(matA, Dcount, MPI_DOUBLE, 0, MPI_COMM_WORLD); 

     // Then work on your own part 
     for (int i= my_start_row; i < my_end_row; i++) 
     { 
      for(int j=0; j < ColB; j++) 
      { 
       for(int k=0; k < RowB; k++) 
       { 
        matC[i][j] = matC[i][j] + (matA[i][k] * matB[k][j]); 
       } 
      } 
     } 

     for (int n=1; n<nproc; n++) 
     { 
      MPI_Bcast(matC, Dcount, MPI_DOUBLE, n, MPI_COMM_WORLD); 
      printf("\n ==++ Master Receive Result by Worker[%d], \n", n); 
     } 
    } 
    else 
    { 
     // This is worker, receive the needed info and start working 
     //MPI_Bcast(matA, Dcount, MPI_DOUBLE, 0, MPI_COMM_WORLD); 

     //printf("\n +++ Worker[%d], received %d rows from Master \n", myid, myid*chunk); 
     cout << "\n === Master sent rows " << myid * chunk << " through " << (myid+1) * chunk << " to process #" << myid << endl; 

     // Do the work first 
     for (int i= my_start_row; i < my_end_row; i++) 
     { 
      for(int j=0; j < ColB; j++) 
      { 
       for(int k=0; k < RowB; k++) 
       { 
        matC[i][j] = matC[i][j] + (matA[i][k] * matB[k][j]); 
       } 
      } 
     } 

     // Send the result to the Master 
     MPI_Bcast(matC, Dcount, MPI_DOUBLE, myid, MPI_COMM_WORLD); 
     printf("\n --- Worker[%d], Sent Result to Master \n", myid); 

    } 

    // End clock 
    double end_time = MPI_Wtime(); 

    if (myid == 0) { 
     cout << "\nParallel Exec time: " << end_time - start_time << endl; 
    } 


    MPI_Finalize(); 



    // Clean up and release the storage 
    for (int i=0; i< RowA; i++) 
    { 
     delete [] matA[i]; 
     matA[i] = NULL; 
    } 
    delete [] matA; 
    matA = NULL; 
    for (int i=0; i< RowB; i++) 
    { 
     delete [] matB[i]; 
     matB[i] = NULL; 
    } 
    delete [] matB; 
    matB = NULL; 
    for (int i=0; i< RowA; i++) 
    { 
     delete [] matC[i]; 
     matC[i] = NULL; 
    } 
    delete [] matC; 
    matC = NULL; 


} 

If this question is too vague or too much of a hassle, I understand; I just want to know whether I am misunderstanding how and when to use Bcast.

Answer


If I am not misreading it, this code generates three identical matrices A, B, and C on every processor at the start, and then computes A times B, but only over a certain range of row indices. This way, the result of that multiplication on the processor rank is

C(rank) = A(begin;end) * B 

for the rows it considers, and

C(rank) = 0 

elsewhere.

So the problem comes from the fact that MPI_Bcast neither adds matrices nor concatenates them: it is a broadcast function, which sends a buffer (here, the matrix C) from the root processor to all the other processors. So each processor, when it does its Bcast, overwrites the previous Bcast.
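
To illustrate the broadcast semantics, here is a minimal, self-contained sketch (the buffer A and size N are hypothetical, and a flat std::vector is used so the buffer is contiguous): every rank makes the same MPI_Bcast call with the same root, and after the call every rank holds the root's copy.

#include <mpi.h> 
#include <vector> 
#include <cstdio> 

int main(int argc, char *argv[]) 
{ 
    MPI_Init(&argc, &argv); 
    int rank; 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    const int N = 10;              // hypothetical buffer size 
    std::vector<double> A(N, 0.0); // every rank allocates the buffer 
    if (rank == 0)                 // only the root fills it 
     for (int i = 0; i < N; ++i) A[i] = 2.0; 

    // Every rank calls MPI_Bcast with the SAME root; afterwards all ranks 
    // hold the root's copy. MPI_Bcast never sums or concatenates data. 
    MPI_Bcast(A.data(), N, MPI_DOUBLE, 0, MPI_COMM_WORLD); 

    printf("rank %d sees A[0] = %f\n", rank, A[0]); 

    MPI_Finalize(); 
    return 0; 
} 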

To concatenate buffers, the function to use is MPI_Gather. But here, since the matrices are already allocated at full size at the start, concatenation is not really a good idea either.

Two options:

  • Use a function that performs the addition and gathers the data: look at MPI_Reduce and MPI_Allreduce (but the operation that would be performed is x+(nbprocs-1)*0, so it is not really useful to call such a function). 
  • Split A and C into sub-matrices of the appropriate size and then use MPI_Gather to reassemble the result, as in the sketch after this list. 
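
For the second option, a minimal sketch, assuming RowA is divisible by the number of processes and using flat, row-major buffers so they are contiguous (names such as localC are hypothetical): each rank multiplies only its own block of rows of A, and MPI_Gather concatenates the blocks into C on the root, in rank order.

#include <mpi.h> 
#include <vector> 
#include <cstdio> 

int main(int argc, char *argv[]) 
{ 
    MPI_Init(&argc, &argv); 
    int rank, nproc; 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 
    MPI_Comm_size(MPI_COMM_WORLD, &nproc); 

    const int RowA = 4, ColA = 2, ColB = 3;  // assumes RowA % nproc == 0 
    const int rows = RowA / nproc;           // rows owned by each rank 

    // Flat, contiguous, row-major matrices; every rank holds all of A and B. 
    std::vector<double> A(RowA * ColA, 2.0), B(ColA * ColB, 2.0); 
    std::vector<double> localC(rows * ColB, 0.0);  // my block of C 
    std::vector<double> C;                         // full C, only on the root 
    if (rank == 0) C.resize(RowA * ColB); 

    // Each rank multiplies only its own block of rows of A. 
    for (int i = 0; i < rows; ++i) 
     for (int j = 0; j < ColB; ++j) 
      for (int k = 0; k < ColA; ++k) 
       localC[i * ColB + j] += A[(rank * rows + i) * ColA + k] * B[k * ColB + j]; 

    // MPI_Gather concatenates the blocks, in rank order, into C on rank 0. 
    MPI_Gather(localC.data(), rows * ColB, MPI_DOUBLE, 
      C.data(), rows * ColB, MPI_DOUBLE, 0, MPI_COMM_WORLD); 

    if (rank == 0) 
     printf("C[0][0] = %f\n", C[0]); 

    MPI_Finalize(); 
    return 0; 
} 

Run with a process count that divides RowA (for example mpirun -np 2); handling a remainder would need MPI_Gatherv instead.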

Hope it helps! Good luck.