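Random segmentation fault in GPU/OpenCL/OpenGL code
I am working on a GPU/OpenCL N-body code. I render the particle positions with OpenGL, using the AMD APP SDK. When I run the code, I get random segmentation faults.
In short, I have a GLWidget in which I do the OpenGL rendering. Once the initial positions are generated, I render them in this GLWidget. Then I run the simulation; at each step I compute the next positions and display them in the GLWidget. My problem is that sometimes, if I click the "Generate initial conditions" button of the parameters GUI while the simulation is running, I get a segmentation fault.
Here is the backtrace: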
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4a46cd7 in memcpy() from /lib/libc.so.6
(gdb) bt
#0 0x00007ffff4a46cd7 in memcpy() from /lib/libc.so.6
#1 0x00007fffeda2da64 in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so
#2 0x00007fffedbba74a in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so
#3 0x00007fffedbba9af in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so
#4 0x00007fffed9c56e4 in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so
#5 0x00007fffed17371d in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so
#6 0x000000000040b185 in GLWidget::createVBO()()
#7 0x000000000040b3c9 in GLWidget::draw()()
#8 0x000000000040c36d in GLWidget::processCurrent()()
...
Here is the createVBO routine:
void GLWidget::createVBO()
{
    GLuint vbo;
    int memSize = sizeof(cl_double4) * 4 * Galaxy->getNumParticles();
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, memSize, Galaxy->pos, GL_DYNAMIC_DRAW);
}
The segfault occurs at glBufferData(GL_ARRAY_BUFFER, memSize, Galaxy->pos, GL_DYNAMIC_DRAW);
I don't understand why this happens. When I press the "Generate IC" button, I delete the allocated Galaxy->pos array and create a new one.
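One thing that may be worth checking at that call is whether memSize matches the allocation behind Galaxy->pos, since glBufferData copies memSize bytes starting at the pointer it is given. The sketch below only illustrates that comparison. It assumes pos holds one 4-component double vector per particle, which the code shown here does not confirm; Double4, bytesRequested, bytesAllocated and copyFitsAllocation are hypothetical names, not part of the actual program.

```cpp
#include <cstddef>

// Hypothetical stand-in for cl_double4: four doubles.
struct Double4 { double s[4]; };

// Bytes that createVBO() asks glBufferData to copy from pos
// (same expression as memSize in the routine above).
std::size_t bytesRequested(std::size_t numParticles) {
    return sizeof(Double4) * 4 * numParticles;
}

// Bytes owned by pos, ASSUMING it was allocated as
// new Double4[numParticles]; the allocation is not shown in the question.
std::size_t bytesAllocated(std::size_t numParticles) {
    return sizeof(Double4) * numParticles;
}

// True when the copy would stay inside the allocation.
bool copyFitsAllocation(std::size_t numParticles) {
    return bytesRequested(numParticles) <= bytesAllocated(numParticles);
}
```

Under that assumption the request is four times the allocation, so the copy would read past the end of the array. Whether such a read crashes depends on what happens to lie after the array in memory, which could fit a fault that appears only on some runs.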
Here is what I do in the "Generate IC" routine:
//Clean Galaxy already existing
if (parent->widget_2->isGalaxyExist)
{
    if (parent->widget_2->animation)
        parent->resetSimu();
    parent->widget_2->Galaxy->cleanup();
}
The cleanup routine (this is where I delete the pos array):
int NBody::cleanup()
{
    if (glEvent)
        clReleaseEvent(glEvent);
    // Releases OpenCL resources (Context, Memory etc.)
    cl_int status;
    if (hasRunKernel)
    {
        status = clFinish(commandQueue);
        CHECK_OPENCL_ERROR(status, "clFinish failed.(commandQueue)");
        status = clReleaseKernel(kernel);
        CHECK_OPENCL_ERROR(status, "clReleaseKernel failed.(kernel)");
        status = clReleaseProgram(program);
        CHECK_OPENCL_ERROR(status, "clReleaseProgram failed.(program)");
        status = clReleaseMemObject(currPos);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(currPos)");
        status = clReleaseMemObject(currVel);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(currVel)");
        status = clReleaseMemObject(newPos);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(newPos)");
        status = clReleaseMemObject(newVel);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(newVel)");
        status = clReleaseCommandQueue(commandQueue);
        CHECK_OPENCL_ERROR(status, "clReleaseCommandQueue failed.(commandQueue)");
        status = clReleaseContext(context);
        CHECK_OPENCL_ERROR(status, "clReleaseContext failed.(context)");
        hasRunKernel = false;
    }
    // Release program resources
    delete [] pos;
    delete [] vel;
    delete [] initPos;
    delete [] initVel;
    delete [] devices;
    // Delete current instance
    delete this;
    return NBODY_SUCCESS;
}
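Because cleanup() ends with delete this, every pointer that still refers to the object (for instance the widget's Galaxy member) dangles afterwards unless it is reset, and a draw that runs before the new galaxy is assigned would then read freed memory. Whether the real code resets that pointer is not shown here. The following is only a sketch of one way to make the ownership explicit, using simplified hypothetical types (Body, Widget, regenerate, clear, drawableBytes) rather than the actual NBody and GLWidget classes.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical simplified stand-in for NBody: it owns its positions.
struct Body {
    std::vector<double> pos;  // 4 doubles per particle
    explicit Body(std::size_t n) : pos(4 * n, 0.0) {}
};

// Hypothetical stand-in for GLWidget holding the simulation object.
struct Widget {
    std::unique_ptr<Body> galaxy;  // empty means "no galaxy exists"

    // Replace the galaxy; the old one is destroyed and the member
    // never points at freed memory.
    void regenerate(std::size_t n) { galaxy = std::make_unique<Body>(n); }

    // Drop the galaxy; later draws see an empty pointer.
    void clear() { galaxy.reset(); }

    // Bytes a draw would upload, or 0 when there is nothing to draw.
    std::size_t drawableBytes() const {
        if (!galaxy) return 0;
        return galaxy->pos.size() * sizeof(double);
    }
};
```

With this layout the "does a galaxy exist" flag and the pointer cannot disagree, and the drawing code can test the pointer before touching pos.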
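At first glance, can you see what is wrong, or give me a clue about this segfault? The most annoying part is that the error occurs randomly, not on every run.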