I want to parallelize a for loop that is nested between two other for loops, using MPI_Gather for a 3D array in C++.
After the data (a 3D array) is computed on each processor, I want to gather every processor's data back to the root node for further processing. I tried using the MPI_Gather function to bring the data back to the root node. With this function, the data is collected from the root processor itself, but it is not gathered from the other processors.
#include <mpi.h>
#include <iostream>
using namespace std;

void Allocate_3D_R(long double***& m, int d1, int d2, int d3);

int main(int argc, char * argv[]) {
    int i, k, l, j;
    int Np = 7, Nz = 7, Nr = 4;
    int mynode, totalnodes;
    MPI_Status status;
    long double ***k_p, ***k_p1;
    int startvalp, endvalp;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
    // Allocation of memory
    Allocate_3D_R(k_p, (Nz+1), (Np+1), (Nr+1));
    Allocate_3D_R(k_p1, (Nz+1), (Np+1), (Nr+1));
    // startvalp represents the local starting value for each processor
    // endvalp represents the local ending value for each processor
    startvalp = (Np+1)*mynode/totalnodes;
    endvalp = startvalp + (((Np+1)/totalnodes) - 1);
    for (l = 0; l <= 1; l++) {
        for (k = startvalp; k <= endvalp; k++) {
            // for loop parallelized between the processors
            // original loop: for(k=0; k<=Np; k++)
            for (i = 0; i <= 1; i++) {
                k_p[i][k][l] = l + k + i;
            }
        }
    }
    // For Np = 7 and two processors:
    // k = 0 - 3 is calculated in processor 0;
    // k = 4 - 7 is calculated in processor 1.
    // Now I need to collect the value of k_p from processor 1
    // back to the root processor.
    // MPI_Gather function is used.
    for (l = 0; l <= 1; l++) {
        for (k = startvalp; k <= endvalp; k++) {
            for (i = 0; i <= 1; i++) {
                MPI_Gather(&(k_p[i][k][l]), 1, MPI_LONG_DOUBLE,
                           &(k_p1[i][k][l]), 1, MPI_LONG_DOUBLE,
                           0, MPI_COMM_WORLD);
            }
        }
    }
    // Using this, k_p is collected from the root processor and stored
    // in the k_p1 variable, but from the slave processor it is not
    // collected back to the root processor.
    if (mynode == 0) {
        for (l = 0; l <= 1; l++) {
            for (k = 0; k <= Np; k++) {
                for (i = 0; i <= 1; i++) {
                    cout << "Processor " << mynode;
                    cout << ": k_p[" << i << "][" << k << "][" << l << "] = "
                         << k_p1[i][k][l] << endl;
                }
            }
        }
    }
    MPI_Finalize();
} // end of main
void Allocate_3D_R(long double***& m, int d1, int d2, int d3) {
    m = new long double** [d1];
    for (int i = 0; i < d1; ++i) {
        m[i] = new long double* [d2];
        for (int j = 0; j < d2; ++j) {
            m[i][j] = new long double [d3];
            for (int k = 0; k < d3; ++k) {
                m[i][j][k] = 0.0;
            }
        }
    }
}
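Note that Allocate_3D_R builds the array out of many separately allocated rows, so the long double values are not contiguous in memory, while MPI_Gather works with contiguous send and receive buffers. For comparison, here is a minimal sketch of a contiguous allocation (Allocate_3D_contig is a hypothetical helper, not part of my code):

void Allocate_3D_contig(long double***& m, int d1, int d2, int d3) {
    // Hypothetical sketch: one contiguous block of d1*d2*d3 values,
    // with each m[i][j] pointing into it, so a whole slab could be
    // handed to MPI_Gather as a single buffer.
    long double* block = new long double[d1 * d2 * d3]();  // zero-initialised
    m = new long double** [d1];
    for (int i = 0; i < d1; ++i) {
        m[i] = new long double* [d2];
        for (int j = 0; j < d2; ++j) {
            m[i][j] = block + (i * d2 + j) * d3;
        }
    }
}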
Here is the output:
Processor 0: k_p[0][0][0] = 0
Processor 0: k_p[1][0][0] = 1
Processor 0: k_p[0][1][0] = 1
Processor 0: k_p[1][1][0] = 2
Processor 0: k_p[0][2][0] = 2
Processor 0: k_p[1][2][0] = 3
Processor 0: k_p[0][3][0] = 3
Processor 0: k_p[1][3][0] = 4
Processor 0: k_p[0][4][0] = 0
Processor 0: k_p[1][4][0] = 0
Processor 0: k_p[0][5][0] = 0
Processor 0: k_p[1][5][0] = 0
Processor 0: k_p[0][6][0] = 0
Processor 0: k_p[1][6][0] = 0
Processor 0: k_p[0][7][0] = 0
Processor 0: k_p[1][7][0] = 0
Processor 0: k_p[0][0][1] = 1
Processor 0: k_p[1][0][1] = 2
Processor 0: k_p[0][1][1] = 2
Processor 0: k_p[1][1][1] = 3
Processor 0: k_p[0][2][1] = 3
Processor 0: k_p[1][2][1] = 4
Processor 0: k_p[0][3][1] = 4
Processor 0: k_p[1][3][1] = 5
Processor 0: k_p[0][4][1] = 0
Processor 0: k_p[1][4][1] = 0
Processor 0: k_p[0][5][1] = 0
Processor 0: k_p[1][5][1] = 0
Processor 0: k_p[0][6][1] = 0
Processor 0: k_p[1][6][1] = 0
Processor 0: k_p[0][7][1] = 0
Processor 0: k_p[1][7][1] = 0
The data comes through from the root processor, but not from the other processor. I have also tried using the MPI_Send and MPI_Recv functions instead; that version does not have the above problem, but it takes much more time when the for loops are large.
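Roughly, the point-to-point version I mean looks like the following sketch (simplified, not my exact code): each non-root rank sends its own k-range element by element, and the root receives every element into the matching position of k_p1, which is why it becomes slow for large loops.

if (mynode != 0) {
    // Non-root ranks send their own k-range, one element per message.
    for (l = 0; l <= 1; l++)
        for (k = startvalp; k <= endvalp; k++)
            for (i = 0; i <= 1; i++)
                MPI_Send(&k_p[i][k][l], 1, MPI_LONG_DOUBLE, 0, 0, MPI_COMM_WORLD);
} else {
    // Root receives each element into the matching position of k_p1;
    // its own k-range would be copied into k_p1 locally.
    for (int node = 1; node < totalnodes; node++) {
        int s = (Np+1)*node/totalnodes;
        int e = s + ((Np+1)/totalnodes) - 1;
        for (l = 0; l <= 1; l++)
            for (k = s; k <= e; k++)
                for (i = 0; i <= 1; i++)
                    MPI_Recv(&k_p1[i][k][l], 1, MPI_LONG_DOUBLE, node, 0,
                             MPI_COMM_WORLD, &status);
    }
}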
So can anyone suggest a solution to the above problem?
Perhaps the root itself is sending zeros to itself. Try receiving only on the root. – 2011-04-09 17:22:19