MPI_Gather() the central elements into a global matrix

This is a follow-up question to MPI_Gather 2D array. Here is the situation:

id = 0 has this submatrix 

|16.000000| |11.000000| |12.000000| |15.000000| 
|6.000000| |1.000000| |2.000000| |5.000000| 
|8.000000| |3.000000| |4.000000| |7.000000| 
|14.000000| |9.000000| |10.000000| |13.000000| 
----------------------- 

id = 1 has this submatrix 

|12.000000| |15.000000| |16.000000| |11.000000| 
|2.000000| |5.000000| |6.000000| |1.000000| 
|4.000000| |7.000000| |8.000000| |3.000000| 
|10.000000| |13.000000| |14.000000| |9.000000| 
----------------------- 

id = 2 has this submatrix 

|8.000000| |3.000000| |4.000000| |7.000000| 
|14.000000| |9.000000| |10.000000| |13.000000| 
|16.000000| |11.000000| |12.000000| |15.000000| 
|6.000000| |1.000000| |2.000000| |5.000000| 
----------------------- 

id = 3 has this submatrix 

|4.000000| |7.000000| |8.000000| |3.000000| 
|10.000000| |13.000000| |14.000000| |9.000000| 
|12.000000| |15.000000| |16.000000| |11.000000| 
|2.000000| |5.000000| |6.000000| |1.000000| 
----------------------- 

The global matrix: 

|1.000000| |2.000000| |5.000000| |6.000000| 
|3.000000| |4.000000| |7.000000| |8.000000| 
|11.000000| |12.000000| |15.000000| |16.000000| 
|-3.000000| |-3.000000| |-3.000000| |-3.000000|

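What I am trying to do is gather only the central elements (those not on the boundaries) into the global grid, so the global grid should look like this: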

|1.000000| |2.000000| |5.000000| |6.000000| 
|3.000000| |4.000000| |7.000000| |8.000000| 
|9.000000| |10.000000| |13.000000| |14.000000| 
|11.000000| |12.000000| |15.000000| |16.000000|

and not like the one I am getting. Here is my code:

float **gridPtr; 
float **global_grid; 
lengthSubN = N/pSqrt; // N is the dimension of the global grid and pSqrt the square root of the number of processes 
MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType); 
MPI_Type_commit(&rowType); 
if(id == 0) { 
    MPI_Gather(&gridPtr[1][1], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD); 
    MPI_Gather(&gridPtr[2][1], 1, rowType, global_grid[1], 1, rowType, 0, MPI_COMM_WORLD); 
} else { 
    MPI_Gather(&gridPtr[1][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD); 
    MPI_Gather(&gridPtr[2][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD); 
} 
... 
float** allocate2D(float** A, const int N, const int M) { 
    int i; 
    float *t0; 

    A = malloc(M * sizeof (float*)); /* Allocating pointers */ 
    if(A == NULL) 
     printf("MALLOC FAILED in A\n"); 
    t0 = malloc(N * M * sizeof (float)); /* Allocating data */ 
    if(t0 == NULL) 
     printf("MALLOC FAILED in t0\n"); 
    for (i = 0; i < M; i++) 
     A[i] = t0 + i * (N); 

    return A; 
}

EDIT:

Here is my attempt without MPI_Gather(), using a subarray instead:

MPI_Datatype mysubarray; 

    int starts[2] = {1, 1}; 
    int subsizes[2] = {lengthSubN, lengthSubN}; 
    int bigsizes[2] = {N_glob, M_glob}; 
    MPI_Type_create_subarray(2, bigsizes, subsizes, starts, 
          MPI_ORDER_C, MPI_FLOAT, &mysubarray); 
    MPI_Type_commit(&mysubarray); 
    MPI_Isend(&(gridPtr[0][0]), 1, mysubarray, 0, 3, MPI_COMM_WORLD, &req[0]); 
    MPI_Type_free(&mysubarray); 
    MPI_Barrier(MPI_COMM_WORLD); 
    if(id == 0) { 
     for(i = 0; i < p; ++i) { 
     MPI_Irecv(&(global_grid[i][0]), lengthSubN * lengthSubN, MPI_FLOAT, i, 3, MPI_COMM_WORLD, &req[0]); 
     } 
    } 
    if(id == 0) 
      print(global_grid, N_glob, N_glob);

But the result is:

|1.000000| |2.000000| |3.000000| |4.000000| 
|5.000000| |6.000000| |7.000000| |8.000000| 
|9.000000| |10.000000| |11.000000| |12.000000| 
|13.000000| |14.000000| |15.000000| |16.000000|

This is not exactly what I want. I have to find a way to tell the receive that it should place the data differently. So if I do:

MPI_Irecv(&(global_grid[0][0]), 1, mysubarray, 0, 3, MPI_COMM_WORLD, &req[0]);

then I get:

|-3.000000| |-3.000000| |-3.000000| |-3.000000| 
|-3.000000| |1.000000| |2.000000| |-3.000000| 
|-3.000000| |3.000000| |4.000000| |-3.000000| 
|-3.000000| |-3.000000| |-3.000000| |-3.000000|
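
In the last output the block lands at offset (1, 1) because the receive reuses the sender's `starts`. A receive-side type would instead need a start that depends on the sending rank. The following is only a sketch of how those offsets could be computed. `block_starts` is a hypothetical helper that is not part of the code above, and it assumes the ranks are laid out row-major on a pSqrt x pSqrt process grid, as the desired global matrix suggests.

```c
#include <assert.h>

/* Hypothetical helper (not from the original post): compute where the
 * core block of `rank` begins inside the global grid.
 * Assumption: ranks are numbered row-major on a pSqrt x pSqrt grid and
 * every block is lengthSubN x lengthSubN. */
void block_starts(int rank, int pSqrt, int lengthSubN, int starts[2])
{
    assert(pSqrt > 0 && lengthSubN > 0);
    assert(rank >= 0 && rank < pSqrt * pSqrt);

    starts[0] = (rank / pSqrt) * lengthSubN; /* first row of the block    */
    starts[1] = (rank % pSqrt) * lengthSubN; /* first column of the block */
}
```

With N = 4 and lengthSubN = 2 this gives {0, 0}, {0, 2}, {2, 0} and {2, 2} for ranks 0 to 3. On the root, each pair could be passed as the `starts` argument of MPI_Type_create_subarray, with the global grid dimensions as the sizes argument, to build one receive type per sender.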

Source

2015-12-31 gsamaras

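This would be a trivial exercise if you did not insist on using arrays of pointers to represent 2D arrays. Why not use pitched linear memory instead? – talonmies

@talonmies I am extending code that was written back in 2012, so it is not really my choice. However, I could flatten the 2D array before the gather operation, which I imagine would not cost much. So if you have a suggestion along those lines, feel free to post an answer. – gsamaras

@talonmies For example, I could have every process create a 1D array that contains only its central elements, e.g. the 4th process would have '{13,14,15,16}'. However, it is still not clear to me how to continue. – gsamaras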
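
The flattening idea from the comment above could be continued roughly as follows. This is a minimal sketch, not a tested MPI solution. `pack_core` and `unpack_blocks` are hypothetical helpers, and the sketch assumes a one-cell halo around each local grid and a row-major rank layout. Between the two steps, each rank would send its flat array with a single call such as `MPI_Gather(flat, n*n, MPI_FLOAT, gathered, n*n, MPI_FLOAT, 0, MPI_COMM_WORLD)`, so the root receives the blocks one after another in rank order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the n x n core of a local grid that has a
 * one-cell halo on every side into the flat array `flat` (n*n floats). */
void pack_core(float **grid, int n, float *flat)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i)
        memcpy(flat + i * n, &grid[i + 1][1], n * sizeof(float));
}

/* Hypothetical helper, meant to run on the root after the gather:
 * `gathered` holds pSqrt*pSqrt blocks in rank order, n*n floats each.
 * Each block is copied row by row to its place in the global grid.
 * Assumption: ranks are numbered row-major on a pSqrt x pSqrt grid. */
void unpack_blocks(const float *gathered, int pSqrt, int n, float **global)
{
    assert(pSqrt > 0 && n > 0);
    for (int rank = 0; rank < pSqrt * pSqrt; ++rank) {
        int row0 = (rank / pSqrt) * n;   /* top row of this block     */
        int col0 = (rank % pSqrt) * n;   /* left column of this block */
        const float *block = gathered + (size_t)rank * n * n;
        for (int i = 0; i < n; ++i)
            memcpy(&global[row0 + i][col0], block + i * n,
                   n * sizeof(float));
    }
}
```

The extra copy on the root costs one pass over the global grid, but it keeps the communication to one plain gather of contiguous floats and leaves the existing pointer-array layout untouched.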

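I cannot give a complete solution, but I will explain why your original example using MPI_Gather does not work as expected.

With lengthSubN = 2, this line defines a new datatype made of 2 floats stored adjacently in memory: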

MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType);

Now let's look at the first MPI_Gather call, which is:

if(id == 0) { 
    MPI_Gather(&gridPtr[1][1], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD); 
} else { 
    MPI_Gather(&gridPtr[1][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD); 
}

It takes 1 rowType from each rank, that is, 2 adjacent floats starting at element gridPtr[1][1]. These are the values:

id 0: 1.0 2.0 
id 1: 5.0 6.0 
id 2: 9.0 10.0 
id 3: 13.0 14.0

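and places them next to each other in the receive buffer that global_grid[0] points to. That pointer actually points to the beginning of the first row, so the memory is filled with: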

1.0 2.0 5.0 6.0 9.0 10.0 13.0 14.0

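However, global_grid has only 4 columns per row, so the last 4 values wrap into the second row, the one pointed to by global_grid[1] (*). This might even be undefined behavior. So after this MPI_Gather the content of global_grid is: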

1.0 2.0 5.0 6.0 
9.0 10.0 13.0 14.0 
-3.0 -3.0 -3.0 -3.0 
-3.0 -3.0 -3.0 -3.0

The second MPI_Gather works the same way and starts writing at the second row of global_grid:

3.0 4.0 7.0 8.0 11.0 12.0 15.0 16.0

So it overwrites some of the values above, and the result is what you observed:

1.0 2.0 5.0 6.0 
3.0 4.0 7.0 8.0 
11.0 12.0 15.0 16.0 
-3.0 -3.0 -3.0 -3.0

(*) allocate2D actually allocates contiguous memory for the whole 2D data buffer.
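
The overwrite described above can be replayed without MPI. The sketch below is only an illustration under the same assumptions (4 ranks, lengthSubN = 2, a contiguous 4x4 receive buffer pre-filled with -3). `fake_gather` and `replay` are hypothetical names. `fake_gather` merely imitates where a gather of one rowType per rank places the data on the root.

```c
#include <assert.h>
#include <string.h>

#define P 4        /* number of ranks in the example       */
#define ROWLEN 2   /* lengthSubN: floats in one rowType    */

/* Imitate what the root ends up with after the gather: the chunk from
 * rank r lands at recv + r * ROWLEN. The buffer is treated as flat
 * memory, with no notion of rows or columns. */
void fake_gather(float *recv, const float chunks[P][ROWLEN])
{
    assert(recv != NULL);
    for (int r = 0; r < P; ++r)
        memcpy(recv + r * ROWLEN, chunks[r], ROWLEN * sizeof(float));
}

/* Replay both gathers from the question on a contiguous 4x4 buffer. */
void replay(float grid[16])
{
    const float first[P][ROWLEN]  = {{1, 2}, {5, 6}, {9, 10}, {13, 14}};
    const float second[P][ROWLEN] = {{3, 4}, {7, 8}, {11, 12}, {15, 16}};

    for (int i = 0; i < 16; ++i)
        grid[i] = -3.0f;             /* same fill value as in the output */

    fake_gather(grid, first);        /* receive buffer: global_grid[0] */
    fake_gather(grid + 4, second);   /* receive buffer: global_grid[1] */
}
```

After `replay`, the buffer holds 1 2 5 6 / 3 4 7 8 / 11 12 15 16 / -3 -3 -3 -3. The values 9, 10, 13 and 14 from the first gather were written into the second row and then overwritten by the second gather.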

Source

2015-12-31 14:57:09
