Hi everyone, I get the following errors (a segmentation fault in MPI_Comm_size) when running a parallel program that uses MPI and OpenMP on Linux:
[node65:03788] *** Process received signal ***
[node65:03788] Signal: Segmentation fault (11)
[node65:03788] Signal code: Address not mapped (1)
[node65:03788] Failing at address: 0x44000098
[node65:03788] [ 0] /lib64/libpthread.so.0 [0x2b663e446c00]
[node65:03788] [ 1] /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0(MPI_Comm_size+0x60) [0x2b663d694360]
[node65:03788] [ 2] fdtd_3D_xyzPML_MPI_OpenMP(main+0xaa) [0x42479a]
[node65:03788] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b663e56f184]
[node65:03788] [ 4] fdtd_3D_xyzPML_MPI_OpenMP(_ZNSt8ios_base4InitD1Ev+0x39) [0x405d79]
[node65:03788] *** End of error message ***
-----------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 3787 on node node65 exited on signal 11 (Segmentation fault).
-----------------------------------------------------------------------------
Afterwards, I analyzed the core file and got the following information:
[Thread debugging using libthread_db enabled]
[New Thread 47310344057648 (LWP 26962)]
[New Thread 1075841344 (LWP 26966)]
[New Thread 1077942592 (LWP 26967)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47310344057648 (LWP 26962)]
0x00002b074afb3360 in PMPI_Comm_size() from /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0
What could be causing this? Thanks for your help.
The code (TEST.CPP) is below; you can try it yourself:
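    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>
    #include "mpi.h"

    int main(int argc, char* argv[])
    {
        int nprocs = 1;  // the number of MPI processes
        int myrank = 0;  // rank of this process
        int provide;     // thread-support level actually provided by MPI

        // Only the master thread makes MPI calls, so FUNNELED is requested.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provide);
        // Thread levels are ordered, so any level >= FUNNELED is acceptable.
        if (provide < MPI_THREAD_FUNNELED)
        {
            printf("provided %d < required %d\n", provide, MPI_THREAD_FUNNELED);
            MPI_Finalize();  // MPI was initialized, so shut it down before exiting
            return 1;
        }

        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        int num_threads = 16;  // OpenMP threads per MPI process
        omp_set_dynamic(1);    // allow the runtime to adjust the thread count
        omp_set_num_threads(num_threads);

        #pragma omp parallel
        {
            printf("%d omp thread from %d mpi process\n", omp_get_thread_num(), myrank);
        }

        MPI_Finalize();
        return 0;
    }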
Can you show us the code that causes the segfault? You may want to compile your program with debugging enabled and run it in a debugger, e.g. 'mpirun -np 2 xterm -e gdb -ex run parallel_program'. –
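Thanks for your help. I don't know which part of the code causes this; I think it is related to MPI_Comm_size. I'm sorry, but the source code is too long to show you. Also, the code runs fine on Windows. After switching the run environment to Linux, I compiled my code with a makefile without any errors, but running 'mpirun -np 8 parallel_program' produces the errors above. – kenan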
Show us the relevant part of the 'main' function, including how you initialize MPI and how you call 'MPI_Comm_size'. –