2013-05-01 91 views

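Random segmentation fault in GPU/OpenCL/OpenGL code

I am working on a GPU/OpenCL N-body code. I use the OpenGL code from the AMD APP SDK to render the particle positions. When I run the code, I get random segmentation faults.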

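To summarize, I have a GLWidget in which I do the OpenGL rendering. Once the initial positions are generated, I render them in this GLWidget. After that, I run the simulation, and at each step I compute the next positions and display them in the GLWidget. My problem is that sometimes, if I click the "Generate Initial Conditions" button of the parameters GUI while the simulation is running, I get a segmentation fault.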

Here is the backtrace:

Program received signal SIGSEGV, Segmentation fault. 
0x00007ffff4a46cd7 in memcpy() from /lib/libc.so.6 
(gdb) bt 
#0 0x00007ffff4a46cd7 in memcpy() from /lib/libc.so.6 
#1 0x00007fffeda2da64 in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so 
#2 0x00007fffedbba74a in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so 
#3 0x00007fffedbba9af in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so 
#4 0x00007fffed9c56e4 in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so 
#5 0x00007fffed17371d in ??() from /usr/lib/x86_64-linux-gnu/dri/fglrx_dri.so 
#6 0x000000000040b185 in GLWidget::createVBO()() 
#7 0x000000000040b3c9 in GLWidget::draw()() 
#8 0x000000000040c36d in GLWidget::processCurrent()() 
... 

Here is the createVBO routine:

void GLWidget::createVBO() 
{ 
    GLuint vbo; 
    int memSize = sizeof(cl_double4) * 4 * Galaxy->getNumParticles(); 
    glGenBuffers(1, &vbo); 
    glBindBuffer(GL_ARRAY_BUFFER, vbo); 
    glBufferData(GL_ARRAY_BUFFER, memSize, Galaxy->pos, GL_DYNAMIC_DRAW); 
} 

The segfault occurs at glBufferData(GL_ARRAY_BUFFER, memSize, Galaxy->pos, GL_DYNAMIC_DRAW);

I don't understand why this happens. When I press the "Generate IC" button, I delete the allocated Galaxy->pos array and create a new one.

Here is what I do in the "Generate IC" routine:

// Clean up the already existing Galaxy
if (parent->widget_2->isGalaxyExist)
{
    if (parent->widget_2->animation)
        parent->resetSimu();
    parent->widget_2->Galaxy->cleanup();
}

The cleanup routine (this is where I delete the pos array):

int NBody::cleanup() 
{ 
    if (glEvent)
        clReleaseEvent(glEvent);

    // Releases OpenCL resources (Context, Memory etc.)
    cl_int status;

    if (hasRunKernel)
    {
        status = clFinish(commandQueue);
        CHECK_OPENCL_ERROR(status, "clFinish failed.(commandQueue)");

        status = clReleaseKernel(kernel);
        CHECK_OPENCL_ERROR(status, "clReleaseKernel failed.(kernel)");

        status = clReleaseProgram(program);
        CHECK_OPENCL_ERROR(status, "clReleaseProgram failed.(program)");

        status = clReleaseMemObject(currPos);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(currPos)");

        status = clReleaseMemObject(currVel);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(currVel)");

        status = clReleaseMemObject(newPos);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(newPos)");

        status = clReleaseMemObject(newVel);
        CHECK_OPENCL_ERROR(status, "clReleaseMemObject failed.(newVel)");

        status = clReleaseCommandQueue(commandQueue);
        CHECK_OPENCL_ERROR(status, "clReleaseCommandQueue failed.(commandQueue)");

        status = clReleaseContext(context);
        CHECK_OPENCL_ERROR(status, "clReleaseContext failed.(context)");

        hasRunKernel = false;
    }

    // Release program resources 
    delete [] pos; 
    delete [] vel; 
    delete [] initPos; 
    delete [] initVel; 
    delete [] devices; 
    // Delete current instance 
    delete this; 

    return NBODY_SUCCESS; 
} 

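At first glance, can you see what is wrong, or give me a clue about this segfault? The most annoying part is that the error occurs randomly, not on every run.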

Answers

1

Is this calculation correct?

int memSize = sizeof(cl_double4) * 4 * Galaxy->getNumParticles(); 

In particular the "* 4": sizeof(cl_double4) already accounts for the four elements of the vector.
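To make the arithmetic concrete, here is a minimal sketch. It assumes that pos holds one cl_double4 per particle, which the question does not show, and it uses a hypothetical stand-in struct Double4 so it compiles without the OpenCL headers. The helper names allocatedBytes and requestedBytes are also made up for illustration.

```cpp
#include <cstddef>

// Stand-in for cl_double4 (assumption: four doubles, 32 bytes on
// typical platforms), so the sketch builds without OpenCL headers.
struct Double4 {
    double s[4];
};

static_assert(sizeof(Double4) == 4 * sizeof(double),
              "Double4 must hold exactly four doubles");

// Bytes owned by an array allocated as `new Double4[numParticles]`
// (assumed layout: one vector per particle).
constexpr std::size_t allocatedBytes(std::size_t numParticles) {
    return sizeof(Double4) * numParticles;
}

// Bytes requested by the expression in the question, with the extra "* 4".
constexpr std::size_t requestedBytes(std::size_t numParticles) {
    return sizeof(Double4) * 4 * numParticles;
}
```

Under that assumption, for 1000 particles the array owns 32000 bytes while the call asks the driver to copy 128000, so the memcpy inside glBufferData would read far past the end of pos. Whether that read faults depends on what happens to lie after the array, which would fit a crash that only shows up on some runs.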

1

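A crash like this indicates an out-of-bounds access in the driver code called through the glBufferData OpenGL API function. Check that the parameters passed to glBufferData are correct, i.e. that the length glBufferData is told to read lies within the memory range passed as the data parameter.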
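One way to enforce that check is to derive the byte count from the container that owns the data, rather than computing it separately. The sketch below is only an illustration: checkedUploadBytes and the stand-in struct Double4 are hypothetical names, and it assumes the positions are kept in a std::vector instead of a raw new[] array.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for cl_double4 (assumption: four doubles per element).
struct Double4 {
    double s[4];
};

// Hypothetical helper: returns the number of bytes to upload for the
// first `count` elements of `host`, and throws instead of allowing a
// read past the end of the host storage.
std::size_t checkedUploadBytes(const std::vector<Double4>& host,
                               std::size_t count) {
    if (count > host.size()) {
        throw std::length_error(
            "upload would read past the end of the host buffer");
    }
    return count * sizeof(Double4);
}
```

The result could then be passed as the size argument, for example glBufferData(GL_ARRAY_BUFFER, checkedUploadBytes(pos, n), pos.data(), GL_DYNAMIC_DRAW), so that the length and the data pointer always come from the same object.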