Wrong rank ID when using MPI_Reduce

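Well, I'm doing some homework using MPI + C. In fact, I've just written the code for programming assignment 3.2 from Peter Pacheco's book "An Introduction to Parallel Programming". The code seems to work for 3 or 5 processes, but when I try more than 6 processes, the program breaks.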

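I'm using a very "bad" debugging approach, which is to put in some printfs to trace where the problem occurs. Using this "method", I found that some strange behaviour shows up after MPI_Reduce and my program gets confused about the rank IDs: specifically, rank 0 disappears and a very large (wrong) rank appears instead.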

My code is below; after it, I've posted the output for 3 and 9 processes. I run it with

mpiexec -n X ./name_of_program

where X is the number of processes.

My code:

#include <stdio.h> 
#include <stdlib.h> 
#include <mpi.h> 

int main(void) 
{ 
MPI_Init(NULL,NULL); 

long long int local_toss=0, local_num_tosses=-1, local_tosses_in_circle=0, global_tosses_in_circle=0; 

double local_x=0.0,local_y=0.0,pi_estimate=0.0; 

int comm_sz, my_rank; 

MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); 
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   

if (my_rank == 0) { 
    printf("\nEnter the number of dart tosses: "); 
    fflush(stdout); 
    scanf("%lld",&local_num_tosses); 
    fflush(stdout); 
} 

// 
MPI_Barrier(MPI_COMM_WORLD); 

    MPI_Bcast(&local_num_tosses, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD); 

    MPI_Barrier(MPI_COMM_WORLD); 

    srand(rand()); //tried to improve randomness here! 

for (local_toss=0;local_toss<local_num_tosses;local_toss++) { 
    local_x = (-1) + (double)rand()/(RAND_MAX/2); 
    local_y = (-1) + (double)rand()/(RAND_MAX/2); 

    if ((local_x*local_x + local_y*local_y) <= 1) {local_tosses_in_circle++;} 
} 


MPI_Barrier(MPI_COMM_WORLD); 

MPI_Reduce 
(
    &local_tosses_in_circle, 
    &global_tosses_in_circle, 
    comm_sz, 
    MPI_LONG_LONG_INT, 
    MPI_SUM, 
    0, 
    MPI_COMM_WORLD 
); 

printf("\n\nDEBUG: myrank = %d, comm_size = %d",my_rank,comm_sz); 
fflush(stdout); 

    MPI_Barrier(MPI_COMM_WORLD); 

if (my_rank == 0) { 
    pi_estimate = ((double)(4*global_tosses_in_circle))/((double) comm_sz*local_num_tosses); 
    printf("\nPi estimate = %1.5lf \n",pi_estimate); 
    fflush(stdout); 
} 

MPI_Finalize(); 
    return 0; 
}

Now, the 2 outputs:

(i) For 3 processes:

Enter the number of dart tosses: 1000000 

DEBUG: myrank = 0, comm_size = 3 

DEBUG: myrank = 1, comm_size = 3 

DEBUG: myrank = 2, comm_size = 3 
Pi estimate = 3.14296

(ii) For 9 processes: (note that the \n output is odd; sometimes it doesn't work)

 Enter the number of dart tosses: 10000000 


     DEBUG: myrank = 1, comm_size = 9 
     DEBUG: myrank = 7, comm_size = 9 


     DEBUG: myrank = 3, comm_size = 9 
     DEBUG: myrank = 2, comm_size = 9DEBUG: myrank = 5, comm_size = 9 
     DEBUG: myrank = 8, comm_size = 9 



     DEBUG: myrank = 6, comm_size = 9 

     DEBUG: myrank = 4, comm_size = 9DEBUG: myrank = -3532887, comm_size = 141598939[PC:06511] *** Process received signal *** 
     [PC:06511] Signal: Segmentation fault (11) 
     [PC:06511] Signal code: (128) 
     [PC:06511] Failing at address: (nil) 
     -------------------------------------------------------------------------- 
     mpiexec noticed that process rank 0 with PID 6511 on node PC exited on signal 11 (Segmentation fault). 
     --------------------------------------------------------------------------


2013-04-08 guipy

It works for me when the third argument of MPI_Reduce is 1 rather than comm_sz (because each buffer holds exactly one element):

MPI_Reduce 
(
    &local_tosses_in_circle, 
    &global_tosses_in_circle, 
    1, //instead of comm_size 
    MPI_LONG_LONG_INT, 
    MPI_SUM, 
    0, 
    MPI_COMM_WORLD 
);

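With count set to comm_sz, MPI_Reduce treats both buffers as arrays of comm_sz elements. As you increase the number of processes, it reads and writes further past the two scalar variables, overwrites other things on the function's stack, such as my_rank and comm_sz, and corrupts the data.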
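To make the role of the count argument concrete, here is a plain-C model of an element-wise sum reduction. It is only a sketch: `model_reduce_sum` and its parameters are hypothetical names made up for illustration, and this is not how any MPI library actually implements MPI_Reduce. It only shows that `count` elements are read from every send buffer and `count` elements are written to the receive buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical plain-C model of a sum reduction at the root; this is NOT
 * MPI's implementation. sendbufs[r] stands for the send buffer of rank r.
 * Every send buffer and recvbuf must hold at least `count` elements,
 * because `count` elements are read from each sender and `count`
 * elements are written to recvbuf.
 */
void model_reduce_sum(const long long *const sendbufs[], int nprocs,
                      long long *recvbuf, int count)
{
    assert(sendbufs != NULL && recvbuf != NULL);
    assert(nprocs > 0 && count > 0);

    for (int i = 0; i < count; i++) {
        long long sum = 0;
        for (int r = 0; r < nprocs; r++) {
            sum += sendbufs[r][i];   /* reads element i of rank r's buffer */
        }
        recvbuf[i] = sum;            /* writes element i of the result */
    }
}
```

With scalars such as local_tosses_in_circle and global_tosses_in_circle, each buffer holds one element, so the only valid count is 1. A count larger than 1 is appropriate only when every rank really passes an array of that many elements.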

Also, I don't think you need any of the MPI_Barrier statements. MPI_Reduce and MPI_Bcast are both blocking calls anyway.

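I wouldn't worry about the newlines. They aren't lost; they just end up somewhere else in the output, probably because many processes are writing to stdout at the same time.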
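One thing that may make the debug output easier to read: the printf in the question starts with "\n\n" and has no trailing newline, so text from another rank can land on the same line. A minimal sketch, assuming you are free to change the debug print, is to build the whole line first, put the newline at the end, and write it with a single call. The helper names `format_debug_line` and `print_debug_line` are made up for this example. It only reduces interleaving; MPI gives no guarantee about the order in which output from different ranks appears.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one complete debug line, newline last,
 * so it can be handed to stdout in a single call.
 * Returns the number of characters written (excluding the terminating
 * '\0'), or -1 if buf is too small or formatting failed.
 */
int format_debug_line(char *buf, size_t size, int my_rank, int comm_sz)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, "DEBUG: myrank = %d, comm_size = %d\n",
                     my_rank, comm_sz);
    if (n < 0 || (size_t)n >= size) {
        return -1;   /* encoding error or truncated output */
    }
    return n;
}

/* Emit the line with one fputs call and flush immediately. */
void print_debug_line(int my_rank, int comm_sz)
{
    char line[64];

    if (format_debug_line(line, sizeof line, my_rank, comm_sz) > 0) {
        fputs(line, stdout);
        fflush(stdout);
    }
}
```

In the program from the question, `print_debug_line(my_rank, comm_sz);` would take the place of the printf/fflush pair after MPI_Reduce.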

By the way: debugging with printf is very common.


2013-04-08 17:18:38

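Very nice, Raphael, thank you very much! It solved my problem! Indeed, the third parameter of MPI_Reduce is the size of the data: if it isn't a vector, the correct value is 1. Really! I feel so stupid :-( lol. About the barriers, I thought the same as you, but take a look at [this](http://stackoverflow.com/questions/9284419/is-mpi-reduce-blocking-or-a-natural-barrier) Stack Overflow post. It's really confusing! – guipy 2013-04-08 17:58:15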

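I saw where Beer posted; I might go in and answer your question there. If you aren't executing MPI_Reduce in a loop, as Beer mentioned, dropping the MPI_Barrier calls should be fine. – 2013-04-08 21:39:42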

Thank you very much, Raphael, for helping me with this! Much appreciated! – guipy 2013-04-08 22:49:06
