2012-11-24

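MPI_Isend segmentation fault

I'm running into a problem with MPI non-blocking sends that causes a segmentation fault. Every receiving machine gets its data correctly, but the machine with id (rank) 0 crashes inside the MPI_Waitall() call. Can anyone spot what is causing this? Thanks!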

Below is the program's source code, followed by the error report I get when I run it:

#include <stdio.h> 
#include <stdlib.h> 
#include <mpi.h> 

#define BLOCK_LOW(id,p,n) ((id)*(n)/(p)) 
#define BLOCK_HIGH(id,p,n) (BLOCK_LOW((id)+1,p,n)-1) 
#define BLOCK_SIZE(id,p,n) (BLOCK_HIGH(id,p,n)-BLOCK_LOW(id,p,n)+1) 
#define BLOCK_OWNER(id,p,n) (((p)*((id)+1)-1)/(n)) 

#define LENGTH 100 

int main(int argc, char *argv[]) { 
    int id, p, i; 
    MPI_Request* sendRequests; 
    MPI_Status* sendStatuses; 
    MPI_Request receiveRequest; 
    MPI_Status receiveStatus; 

    int array[LENGTH]; 
    int array2[LENGTH]; 

    MPI_Init(&argc, &argv); 
    MPI_Barrier(MPI_COMM_WORLD); 

    for (i = 0; i < LENGTH; i++) {
        array[i] = i * 5;
        array2[i] = 0;
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (id == 0) {
        sendRequests = malloc((p-1) * sizeof(MPI_Request));

        for (i = 1; i < p; i++) {
            MPI_Isend(array + BLOCK_LOW(i-1, p-1, LENGTH), BLOCK_SIZE(i-1, p-1, LENGTH), MPI_INT, i, 0, MPI_COMM_WORLD, &sendRequests[i-1]);
        }

        MPI_Waitall(p-1, sendRequests, sendStatuses);
    } else {
        MPI_Recv(array2, BLOCK_SIZE(id-1, p-1, LENGTH), MPI_INT, 0, 0, MPI_COMM_WORLD, &receiveStatus);

        for (i = 0; i < BLOCK_SIZE(id-1, p-1, LENGTH); i++) {
            printf("Element %d (%d): %d\n", i, i + BLOCK_LOW(id-1, p-1, LENGTH), array2[i]);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD); 
    MPI_Finalize(); 
    return 0; 
} 

This is the error I get when I run the code:

[lin12p5:13467] *** Process received signal *** 
[lin12p5:13467] Signal: Segmentation fault (11) 
[lin12p5:13467] Signal code: Invalid permissions (2) 
[lin12p5:13467] Failing at address: 0x400f30 
[lin12p5:13467] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fa96ab4eff0] 
[lin12p5:13467] [ 1] /usr/lib/libmpi.so.0(+0x37f01) [0x7fa96bad5f01] 
[lin12p5:13467] [ 2] /usr/lib/libmpi.so.0(PMPI_Waitall+0xb3) [0x7fa96bb06b73] 
[lin12p5:13467] [ 3] mpi-test(main+0x232) [0x400da6] 
[lin12p5:13467] [ 4] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fa96a7fcc8d] 
[lin12p5:13467] [ 5] mpi-test() [0x400ab9] 
[lin12p5:13467] *** End of error message *** 
-------------------------------------------------------------------------- 
mpirun noticed that process rank 0 with PID 13467 on node lab12p5 exited on signal 11  (Segmentation fault). 
-------------------------------------------------------------------------- 

[lin13p5][[33088,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104) 

Answer


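You haven't allocated any space for sendStatuses. MPI_Waitall() writes one MPI_Status into that array for each request, so with an uninitialized pointer it writes to whatever address the pointer happens to hold, which is why rank 0 crashes there. You need to malloc() space for it (p-1 elements), just as you did for sendRequests. You should also free() both arrays when you're done to avoid a memory leak.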


Thanks! I completely missed that; I was going crazy trying to figure out what was wrong. It's always something ridiculous.