2012-09-22 65 views
2

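Open MPI matrix multiplication

Hi, I am trying to learn Open MPI in C. I am having some trouble doing matrix multiplication with this program: the result is wrong. The program compiles, but I think my matrix multiplication algorithm is wrong somewhere.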

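My approach is to use MPI_Scatter to scatter matrix A, then transpose matrix B, and then MPI_Scatter matrix B. Once they are scattered, I do the matrix multiplication and gather the results back to the root process. I am not sure whether I am missing something, but I have not fully understood Scatter and Gather yet. I know that with Send you can send to individual processes and Recv from different processes, but how does that work with Scatter and Gather? If I have made a mistake somewhere in this code, please let me know. Thanks.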

My source code:

#define N 512 
#include <stdio.h> 
#include <math.h> 
#include <mpi.h> 
#include <sys/time.h> 
print_results(char *prompt, float a[N][N]); 
int main(int argc, char *argv[]) { 
    int size, rank, blksz, i, j, k; 
    float a[N][N], b[N][N], c[N][N]; 
    char *usage = "Usage: %s file\n"; 
    float row[N][N], col[N][N]; 
    FILE *fd; 
    int portion, lowerbound, upperbound; 
    double elapsed_time, start_time, end_time; 
    struct timeval tv1, tv2; 

    MPI_Init(&argc, &argv); 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 
    MPI_Comm_size(MPI_COMM_WORLD, &size); 
    blksz = (int) ceil((double) N/size); 
    /* 
    if (argc < 2) { 
     fprintf (stderr, usage, argv[0]); 
     return -1; 
    } 
    if ((fd = fopen(argv[1], "r")) == NULL) { 
     fprintf(stderr, "%s: Cannot open file %s for reading.\n", argv[0],argv[1]); 
     fprintf(stderr, usage, argv[0]); 
     return -1; 
} 
*/ 

//Read input from file for matrices a and b. 
//The I/O is not timed because this I/O needs 
//to be done regardless of whether this program 
//is run sequentially on one processor or in 
//parallel on many processors. Therefore, it is 
//irrelevant when considering speedup. 
if (rank == 0) { 
    for (i = 0; i < N; i++) 
     for (j = 0; j < N; j++) 
      a[i][j] = i + j; 
    for (i = 0; i < N; i++) 
     for (j = 0; j < N; j++) 
      b[i][j] = i + j; 
    /* 
    for (i = 0; i < N; i++) { 
     for (j = i + 1; j < N; j++) { 
      int temp = b[i][j]; 
      b[i][j] = b[j][i]; 
      b[j][i] = temp; 
     } 
    } 
    */ 
} 

//TODO: Add a barrier prior to the time stamp. 
MPI_Barrier(MPI_COMM_WORLD); 
// Take a time stamp 
gettimeofday(&tv1, NULL); 
//TODO: Scatter the input matrices a and b. 
    MPI_Scatter(a, blksz * N, MPI_FLOAT, row, blksz * N, MPI_FLOAT, 0, 
     MPI_COMM_WORLD); 
    MPI_Scatter(b, blksz * N, MPI_FLOAT, col, blksz * N, MPI_FLOAT, 0, 
     MPI_COMM_WORLD); 
//TODO: Add code to implement matrix multiplication (C=AxB) in parallel. 
for (i = 0; i < blksz && rank * blksz + i < N; i++) { 
    for (j = 0; j < N; j++) { 
     c[i][j] = 0.0; 
     for (k = 0; k < N; k++) { 
      c[i][j] += row[i][j] * col[j][k]; 
     } 
    } 
} 
//TODO: Gather partial result back to the master process. 
MPI_Gather(c, blksz * N, MPI_FLOAT, c, blksz * N, MPI_FLOAT, 0, 
     MPI_COMM_WORLD); 
// Take a time stamp. This won't happen until after the master 
// process has gathered all the input from the other processes. 
gettimeofday(&tv2, NULL); 
elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec) 
     /1000000.0); 
printf("elapsed_time=\t%lf (seconds)\n", elapsed_time); 
// print results 
MPI_Barrier(MPI_COMM_WORLD); 
print_results("C = ", c); 
MPI_Finalize(); 

} 

print_results(char *prompt, float a[N][N]) { 
int i, j; 
printf("\n\n%s\n", prompt); 
for (i = 0; i < N; i++) { 
    for (j = 0; j < N; j++) { 
     printf(" %.2f", a[i][j]); 
    } 
    printf("\n"); 
} 
printf("\n\n"); 
} 

Answers

2

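Your computational kernel is wrong. Since b is supposedly transposed, c[i][j] is simply the dot product of row i of a and row j of the transposed b (that is, column j of the original b), so the innermost loop should be: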

for (k = 0; k < N; k++) { 
    c[i][j] += row[i][k] * col[j][k]; // row[i][k] and not row[i][j] 
} 

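Also, your matrices are float, but the temp variable in the (commented-out) transposition code is an int. It may work in this particular case, because you initialise the elements of a and b with integers, but it will not work in the general case.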
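The two points above can be sketched in plain C without any MPI calls. This is only an illustration under assumptions: the function names transpose_in_place and multiply_with_transposed are hypothetical, and the matrices are passed as flat row-major arrays instead of the float[N][N] arrays used in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose an n x n row-major float matrix in place.
   The temporary is a float, so fractional values survive the swap. */
void transpose_in_place(size_t n, float *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            float temp = m[i * n + j];
            m[i * n + j] = m[j * n + i];
            m[j * n + i] = temp;
        }
    }
}

/* Compute c = a * b, where bt holds b already transposed
   (bt[j][k] == b[k][j]). Each c[i][j] is the dot product of
   row i of a and row j of bt. */
void multiply_with_transposed(size_t n, const float *a,
                              const float *bt, float *c)
{
    assert(a != NULL && bt != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = sum;
        }
    }
}
```

With an int temporary, a value such as 0.5 would be truncated to 0 during the swap, which is why the float temporary matters once the matrices hold non-integer data.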

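Otherwise, the scatter/gather part looks fine. Note that your code will not work if N is not divisible by the number of MPI processes. To handle such cases, you may want to look into MPI_Scatterv and MPI_Gatherv.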
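For the non-divisible case, MPI_Scatterv and MPI_Gatherv take per-rank element counts and displacements instead of a single count. A minimal sketch of how those arrays might be filled follows; the helper name compute_row_partition and the choice to give the first ranks one extra row are assumptions, not something taken from the question.

```c
#include <assert.h>

/* Fill counts[] and displs[] (both measured in floats) so that n rows
   of width n are split over nprocs ranks as evenly as possible.
   The first (n % nprocs) ranks receive one extra row. */
void compute_row_partition(int n, int nprocs, int *counts, int *displs)
{
    assert(n >= 0 && nprocs > 0);
    int base = n / nprocs;   /* rows every rank gets */
    int extra = n % nprocs;  /* leftover rows to hand out */
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        int rows = base + (r < extra ? 1 : 0);
        counts[r] = rows * n;
        displs[r] = offset;
        offset += counts[r];
    }
}
```

The arrays would then be passed along the lines of MPI_Scatterv(a, counts, displs, MPI_FLOAT, row, counts[rank], MPI_FLOAT, 0, MPI_COMM_WORLD), with a matching MPI_Gatherv(local_c, counts[rank], MPI_FLOAT, c, counts, displs, MPI_FLOAT, 0, MPI_COMM_WORLD), where local_c is a hypothetical name for each rank's result buffer.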

0

I assume you are trying to do a matrix multiplication. There is no need to transpose the matrix.

You cannot scatter matrix b, because every row of matrix a needs the whole of matrix b. Broadcasting b is the correct approach.

MPI_Scatter(a, blksz * N, MPI_FLOAT, row, blksz * N, MPI_FLOAT, 0,MPI_COMM_WORLD); 
MPI_Bcast(b, N * N, MPI_FLOAT, 0,MPI_COMM_WORLD); 

Also, as @Hristo Iliev mentioned, your multiplication code needs to change.

for (i = 0; i < blksz && rank * blksz + i < N; i++) { 
    for (j = 0; j < N; j++) { 
     product[i][j] = 0.0; 
     for (k = 0; k < N; k++) { 
      product[i][j] = product[i][j]+ row[i][k] * b[k][j]; 
     } 
    } 
} 

The correct array declarations for this implementation are:

float row[blksz][N], product[blksz][N];

Use MPI_Gather to combine the product arrays from all nodes on the root node.

MPI_Gather(product, blksz * N, MPI_FLOAT, c, blksz * N, MPI_FLOAT, 0,MPI_COMM_WORLD); 
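To see why scattering the rows of a while giving every process the full b yields the right result, here is a serial sketch that stands in for the scatter, compute and gather steps. It contains no MPI calls; the function names multiply_block and multiply_by_row_blocks are hypothetical, the matrices are flat row-major arrays, and n is assumed to be divisible by the number of simulated ranks.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply a block of `rows` rows of a (each of width n) by the full
   n x n matrix b. This is the work one rank does after the scatter
   of a and the broadcast of b. */
void multiply_block(size_t rows, size_t n, const float *a_block,
                    const float *b, float *c_block)
{
    assert(a_block != NULL && b != NULL && c_block != NULL);
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                sum += a_block[i * n + k] * b[k * n + j];
            }
            c_block[i * n + j] = sum;
        }
    }
}

/* Serial stand-in for scatter + compute + gather: each simulated rank r
   handles blksz consecutive rows of a and writes the matching rows of c.
   Assumes n is divisible by nprocs. */
void multiply_by_row_blocks(size_t n, size_t nprocs, const float *a,
                            const float *b, float *c)
{
    assert(nprocs > 0 && n % nprocs == 0);
    size_t blksz = n / nprocs;
    for (size_t r = 0; r < nprocs; r++) {
        size_t offset = r * blksz * n;  /* start of rank r's rows */
        multiply_block(blksz, n, a + offset, b, c + offset);
    }
}
```

Every block reads all of b but only its own rows of a, so the blocks are independent and their outputs simply concatenate into c, which is the pattern MPI_Scatter, MPI_Bcast and MPI_Gather implement across processes.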

If N is not divisible by the number of processes, you need to use MPI_Scatterv and MPI_Gatherv.