2017-01-30 100 views

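Matrix computation errors when the dimensions get larger

I am running code that simply creates two matrices: one of dimensions arows x nsame and the other of dimensions nsame x bcols. The result is an array of dimensions arows x bcols. This is fairly simple to implement with BLAS, and the code below appears to run as expected when using the following master-slave pattern with Open MPI: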

#include <iostream> 
#include <stdio.h> 
#include <cmath> 
#include <mpi.h> 
#include <gsl/gsl_blas.h> 
using namespace std; 

int main(int argc, char** argv){ 
    int noprocs, nid; 
    MPI_Status status; 
    MPI_Init(&argc, &argv); 
    MPI_Comm_rank(MPI_COMM_WORLD, &nid); 
    MPI_Comm_size(MPI_COMM_WORLD, &noprocs); 
    int master = 0; 

    const int nsame = 500; //must be same if matrices multiplied together = acols = brows 
    const int arows = 500; 
    const int bcols = 527; //works for 500 x 500 x 527 and 6000 x 100 x 36 
    int rowsent; 
    double buff[nsame]; 
    double b[nsame*bcols]; 
    double c[arows][bcols]; 
    double CC[1*bcols]; //here ncols corresponds to numbers of rows for matrix b 
    for (int i = 0; i < bcols; i++){ 
       CC[i] = 0.; 
    }; 
    // Master part 
    if (nid == master) { 

     double a [arows][nsame]; //creating identity matrix of dimensions arows x nsame (it is I if arows = nsame) 
     for (int i = 0; i < arows; i++){ 
      for (int j = 0; j < nsame; j++){ 
       if (i == j) 
        a[i][j] = 1.; 
       else 
        a[i][j] = 0.; 
      } 
     } 
     double b[nsame*bcols];//here ncols corresponds to numbers of rows for matrix b 
      for (int i = 0; i < (nsame*bcols); i++){ 
       b[i] = (10.*i + 3.)/(3.*i - 2.) ; 
      }; 
     MPI_Bcast(b,nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD); 
     rowsent=0; 
     for (int i=1; i < (noprocs); i++) { 
      // Note A is a 2D array so A[rowsent]=&A[rowsent][0] 
      MPI_Send(a[rowsent], nsame, MPI_DOUBLE_PRECISION,i,rowsent+1,MPI_COMM_WORLD); 
      rowsent++; 
     } 

     for (int i=0; i<arows; i++) { 
      MPI_Recv(CC, bcols, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, MPI_ANY_TAG, 
        MPI_COMM_WORLD, &status); 
      int sender = status.MPI_SOURCE; 
      int anstype = status.MPI_TAG;   //row number+1 
      int IND_I = 0; 
      while (IND_I < bcols){ 
       c[anstype - 1][IND_I] = CC[IND_I]; 
       IND_I++; 
      } 
      if (rowsent < arows) { 
       MPI_Send(a[rowsent], nsame,MPI_DOUBLE_PRECISION,sender,rowsent+1,MPI_COMM_WORLD); 
       rowsent++; 
      } 
      else {  // tell sender no more work to do via a 0 TAG 
       MPI_Send(MPI_BOTTOM,0,MPI_DOUBLE_PRECISION,sender,0,MPI_COMM_WORLD); 
      } 
     } 
    } 

    // Slave part 
    else { 
     MPI_Bcast(b,nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD); 
     MPI_Recv(buff,nsame,MPI_DOUBLE_PRECISION,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status); 
     while(status.MPI_TAG != 0) { 
      int crow = status.MPI_TAG; 
      gsl_matrix_view AAAA = gsl_matrix_view_array(buff, 1, nsame); 
      gsl_matrix_view BBBB = gsl_matrix_view_array(b, nsame, bcols); 
      gsl_matrix_view CCCC = gsl_matrix_view_array(CC, 1, bcols); 

      /* Compute C = A B */ 
      gsl_blas_dgemm (CblasNoTrans, CblasNoTrans, 1.0, &AAAA.matrix, &BBBB.matrix, 
          0.0, &CCCC.matrix); 

      MPI_Send(CC,bcols,MPI_DOUBLE_PRECISION, master, crow, MPI_COMM_WORLD); 
      MPI_Recv(buff,nsame,MPI_DOUBLE_PRECISION,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status); 
     } 
    } 

    // output c here on master node //uncomment the below lines if I wish to see the output 
    // if (nid == master){ 
//  if (rowsent == arows){ 
//   //   cout << rowsent; 
//   int IND_F = 0; 
//   while (IND_F < arows){ 
//    int IND_K = 0; 
//    while (IND_K < bcols){ 
//     cout << "[" << IND_F << "]" << "[" << IND_K << "] = " << c[IND_F][IND_K] << " "; 
//     IND_K++; 
//    } 
//    cout << "\n"; 
//    IND_F++; 
//   } 
//  } 
// } 
    MPI_Finalize(); 
    //free any allocated space here 
    return 0; 
}; 

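What seems strange is that when I increase the size of the matrices (for example, from nsame = 500 to nsame = 501), the code no longer works. I get the following error: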

mpirun noticed that process rank 0 with PID 0 on node Users-MacBook-Air exited on signal 11 (Segmentation fault: 11). 

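I have tried this with other combinations of matrix dimensions, and there always seems to be an upper limit on the size of the matrices (a limit that seems to vary depending on how I change the individual dimensions). I have also tried modifying the values in the matrices, although this does not seem to change anything. I know there are alternative ways to initialize the matrices in my example (for example, using vector), but I would just like to understand why my current approach to multiplying arbitrarily sized matrices only seems to work up to a certain size.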

Answer


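You are declaring too many large local variables, which is causing stack-space problems. In particular, a is 500x500 doubles (250000 8-byte elements, or 2 million bytes). b is even larger.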
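To make the numbers concrete, here is a rough sketch that adds up the automatic storage requested by the arrays the master rank declares in main(). The helper name stack_bytes and the 8 MiB limit are assumptions for illustration only: the real limit depends on the system and shell settings, and the compiler is free to lay out the frame differently.

```cpp
#include <cstddef>

// Bytes of automatic (stack) storage requested by the arrays declared in
// main() on the master rank of the question's code.
// stack_bytes is a hypothetical helper, not part of the original program.
constexpr std::size_t stack_bytes(std::size_t arows, std::size_t nsame,
                                  std::size_t bcols) {
    return sizeof(double) * (nsame              // buff
                             + nsame * bcols    // b (outer declaration)
                             + arows * bcols    // c
                             + bcols            // CC
                             + arows * nsame    // a (master only)
                             + nsame * bcols);  // b (inner declaration, shadows the outer one)
}

// ASSUMPTION: an 8 MiB main-thread stack, a common default; check your own
// system (for example with `ulimit -s`) before relying on this figure.
constexpr std::size_t assumed_stack_limit = 8u * 1024u * 1024u;

// With 8-byte doubles, the 500 x 500 x 527 case already requests 8,332,216
// of the assumed 8,388,608 bytes.
static_assert(stack_bytes(500, 500, 527) == 8332216, "nsame = 500");
static_assert(stack_bytes(500, 501, 527) == 8344656, "nsame = 501");
static_assert(stack_bytes(500, 500, 527) < assumed_stack_limit,
              "fits, but only just");
```

Under that assumption, the arrays alone leave well under 100 KB of headroom for everything else main() calls, so it is plausible that a small increase in any dimension is enough to tip the program over.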

You need to dynamically allocate space for some or all of these arrays.
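One way to do that, shown here only as a minimal sketch, is to keep each matrix in a flat, row-major std::vector<double>, which places the elements on the heap. The Matrix struct and the make_identity_like and make_b helpers are hypothetical names introduced for this example; the fill formulas are copied from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat, row-major matrix whose elements live on the heap:
// element (i, j) is stored at data[i * cols + j].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    // Pointer to the first element of row i. Each row is contiguous, so this
    // pointer can be handed to any API that expects `cols` consecutive doubles.
    double* row(std::size_t i) {
        assert(i < rows);
        return data.data() + i * cols;
    }
};

// Same contents as `a` in the question: ones on the diagonal, zeros elsewhere.
Matrix make_identity_like(std::size_t rows, std::size_t cols) {
    Matrix m(rows, cols);
    for (std::size_t i = 0; i < rows && i < cols; ++i) {
        m.row(i)[i] = 1.0;
    }
    return m;
}

// Same contents as `b` in the question.
Matrix make_b(std::size_t rows, std::size_t cols) {
    Matrix m(rows, cols);
    for (std::size_t i = 0; i < rows * cols; ++i) {
        m.data[i] = (10.0 * i + 3.0) / (3.0 * i - 2.0);
    }
    return m;
}
```

In the question's program, a.row(rowsent) would take the place of a[rowsent] in the MPI_Send calls, and b.data.data() would take the place of b in MPI_Bcast and gsl_matrix_view_array. The same treatment applies to c; only the small per-row buffers buff and CC are reasonable candidates to stay on the stack.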

There may be compiler options to increase the initial stack space, but that is not a good long-term solution.