So I have a raytracer I'm writing that compiles just fine, but when I reach the MPI_Gather() call I get the set of errors below. If I write to files instead, the whole thing runs fine, but then I can't run it on a distributed computing system.
Fatal error in PMPI_Gather: Internal MPI error!, error stack:
PMPI_Gather(856)......:
MPI_Gather(sbuf=0x8e05468, scount=882000, MPI_BYTE, rbuf=0x8df7628, rcount=882000, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Gather_impl(681).:
MPIR_Gather(641)......:
MPIR_Gather_intra(152):
MPIR_Localcopy(378)...:
memcpy arguments alias each other, dst=0x8df7628 src=0x8e05468 len=882000
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
I'm not entirely sure what this error means, so it's hard to work around it.
Here is the source of the main function:
int main(int argc, char **argv) {
    clock_t total_time = clock(), otime;
    init_MPI(argc, argv);   // Initialize OpenMPI
    glutInit(&argc, argv);
    get_params(argc, argv); // Get parameters from command line
    if (buildScene(scene, cam) == -1) { MPI_Abort(MPI_COMM_WORLD, rc); exit(1); }
    samples = samples > 0 ? whitted ? 1 : samples : 1;
    if (numprocs == 1) {
        scn = new RGBApixmap(h, w);
        raytrace(h, scn);
        if (smult > 1) *scn = scaleImage(scn, smult);
    } else {
        int rows = h / numprocs;
        subscn = new RGBApixmap(rows, w);
        raytrace(rows, subscn);
        if (smult > 1) *subscn = scaleImage(subscn, smult);
        if (pid == MASTER) scn = new RGBApixmap(h / smult, w / smult);
        MPI_Gather(subscn, rows / smult * w, MPI_BYTE,
                   scn,    rows / smult * w, MPI_BYTE, MASTER, MPI_COMM_WORLD);
    }
    if (pid == MASTER) {
        initGlut(argc, argv);
        glutMainLoop();
    }
    MPI_Finalize();
    return 0;
}
EDIT:
I've solved the problem; the updated code is below:
int main(int argc, char **argv) {
    clock_t total_time = clock(), otime;
    init_MPI(argc, argv);
    glutInit(&argc, argv);
    bool OK = get_params(argc, argv);
    if (buildScene(scene, cam) == -1) { MPI_Abort(MPI_COMM_WORLD, rc); exit(1); }
    samples = samples > 0 ? whitted ? 1 : samples : 1;
    int rows = h / numprocs;
    subscn = new RGBApixmap(rows, w);
    raytrace(rows, subscn);
    MPI_Barrier(MPI_COMM_WORLD); /* Synchronize all processes */
    if (smult > 1) *subscn = scaleImage(subscn, smult);
    MPI_Barrier(MPI_COMM_WORLD); /* Synchronize all processes */
    int nElts = subscn->getWidth() * subscn->getHeight();
    RGBA *subscnpix, *scnpix;
    subscnpix = subscn->getPixs();
    scnpix = (RGBA*)malloc(sizeof(RGBA) * ((w / smult) * (h / smult)));
    MPI_Datatype pixel;
    MPI_Type_contiguous(4, MPI_UNSIGNED_CHAR, &pixel);
    MPI_Type_commit(&pixel);
    MPI_Gather(subscnpix, nElts, pixel, scnpix, nElts, pixel, MASTER, MPI_COMM_WORLD);
    scn = new RGBApixmap(h / smult, w / smult, scnpix);
    MPI_Type_free(&pixel);
    MPI_Barrier(MPI_COMM_WORLD); /* Synchronize all processes */
    if (pid == MASTER) {
        initGlut(argc, argv);
        glutMainLoop();
    }
    MPI_Finalize();
    return 0;
}
The three calls to 'MPI_Barrier' are unnecessary. Your processes exchange data only in the 'MPI_Gather' call, which is collective, i.e. it will wait for all processes to check in before completing at the root. Also, don't use 'clock()', since it is very non-portable and does not measure wall-clock time (on Unix it measures CPU time, which is not what it measures on Windows). Use 'MPI_Wtime()' instead. –
I know my calls to 'MPI_Barrier' are unnecessary; they date from when I was writing to files as a debugging step because I couldn't get MPI_Gather to work. My calls to 'clock()' are a holdover from before I rewrote the code for OpenMPI, and for now I'm more concerned with getting MPI_Gather working than with changing them to 'MPI_Wtime()', but thanks for pointing that out. – RevanProdigalKnight