我在MPI程序中遇到了一個奇怪的問題。部分代碼應該僅由根(進程0)執行,但進程0似乎執行兩次。例如,MPI:進程0兩次執行其代碼
root = 0;
if (rank == root) {
cout << "Hello from process " << rank << endl;
}
給
你好從進程0
你好從進程0
這似乎當我使用16個或多個進程纔會發生。我一直試圖調試這幾天,但不能。
由於我不知道爲什麼會發生這種情況,我想我必須在這裏複製我的整個代碼。我說得很好,很清楚。目標是乘以兩個矩陣(簡化假設)。問題發生在最後的if
塊中。
#include <iostream>
#include <cstdlib>
#include <cmath>
#include "mpi.h"
using namespace std;
int main(int argc, char *argv[]) {
if (argc != 2) {
cout << "Use one argument to specify the N of the matrices." << endl;
return -1;
}
int N = atoi(argv[1]);
int A[N][N], B[N][N], res[N][N];
int i, j, k, start, end, P, p, rank;
int root=0;
MPI::Status status;
MPI::Init(argc, argv);
rank = MPI::COMM_WORLD.Get_rank();
P = MPI::COMM_WORLD.Get_size();
p = sqrt(P);
/* Designate the start and end position for each process. */
start = rank * N/p;
end = (rank+1) * N/p;
if (rank == root) { // No problem here
/* Initialize matrices. */
for (i=0; i<N; i++)
for (j=0; j<N; j++) {
A[i][j] = N*i + j;
B[i][j] = N*i + j;
}
cout << endl << "Matrix A: " << endl;
for(i=0; i<N; ++i)
for(j=0; j<N; ++j) {
cout << " " << A[i][j];
if(j==N-1)
cout << endl;
}
cout << endl << "Matrix B: " << endl;
for(i=0; i<N; ++i)
for(j=0; j<N; ++j) {
cout << " " << B[i][j];
if(j==N-1)
cout << endl;
}
}
/* Broadcast B to all processes. */
MPI::COMM_WORLD.Bcast(B, N*N, MPI::INT, 0);
/* Scatter A to all processes. */
MPI::COMM_WORLD.Scatter(A, N*N/p, MPI::INT, A[start], N*N/p, MPI::INT, 0);
/* Compute your portion of the final result. */
for(i=start; i<end; i++)
for(j=0; j<N; j++) {
res[i][j] = 0;
for(k=0; k<N; k++)
res[i][j] += A[i][k]*B[k][j];
}
MPI::COMM_WORLD.Barrier();
/* Gather results form all processes. */
MPI::COMM_WORLD.Gather(res[start], N*N/p, MPI::INT, res, N*N/p, MPI::INT, 0);
if (rank == root) { // HERE is the problem!
// This chunk executes twice in process 0
cout << endl << "Result of A x B: " << endl;
for(i=0; i<N; ++i)
for(j=0; j<N; ++j) {
cout << " " << res[i][j];
if(j == N-1)
cout << endl;
}
}
MPI::Finalize();
return 0;
}
當運行具有P = 16和兩個4×4矩陣中的程序:
>$ mpirun -np 16 ./myprog 4
Matrix A:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Matrix B:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Result of A x B:
6366632 0 0 0
-12032 32767 0 0
0 0 -1431597088 10922
1 10922 0 0
Result of A x B:
56 62 68 74
152 174 196 218
248 286 324 362
344 398 452 506
爲什麼打印出該第一結果? 如果有人願意幫助我,我將不勝感激。
有了這麼小的n,'N * N/p'將評估爲0.這似乎是一個問題。你試過N> 16,P = 16嗎? – NoseKnowsAll
這似乎給我一個分段錯誤。我不認爲'N * N/p'正在評估爲零;添加打印語句顯示它是4,P = 16和N = 4。注意'p = sqrt(P)'。 – Novice