2012-05-01

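CUDA crashes with large data set

When I run my kernel function in a loop more than 600 times, my computer crashes and I have to reset it manually (it does not crash at around 50 iterations). I'm not sure what is causing it.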

My main is as follows:

int main() 
{ 
    int *seam = new int [image->height]; 
    int width = image->width; 
    int height = image->height; 

    int *fMC = (int*)malloc(width*height*sizeof(int*)); 
    int *fNew = (int*)malloc(width*height*sizeof(int*)); 

    for(int i=0;i<numOfSeams;i++) 
    { 
     seam = cpufindSeamV2(fMC,width,height,1); 

     fMC = kernel_shiftSeam(fMC,fNew,seam,width,height,nWidth,1); 

     for(int k=0;k<height;k++) 
     { 
      fMC[(nWidth-1)+width*k] = INT_MAX; 
     } 
    } 
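One thing worth noting in this loop: kernel_shiftSeam returns newE, so after the first iteration fMC and fNew refer to the same buffer and the original fMC allocation can no longer be freed (the same happens to the array allocated with new for seam once it is overwritten by the return value of cpufindSeamV2). The malloc calls also use sizeof(int*) where sizeof(int) is meant, which over-allocates on 64-bit builds but is not by itself a crash. The sketch below is a host-only ping-pong (double-buffer) loop that swaps two owned buffers instead of reassigning pointers. shiftSeamCPU is a hypothetical stand-in for gpu_shiftSeam, whose body is not shown, and findSeam stands in for cpufindSeamV2; the shifting behaviour is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for gpu_shiftSeam (the kernel body is not shown).
// Assumption: in every row y, pixels right of column seam[y] move one column
// left, and the vacated last column is filled with INT_MAX.
void shiftSeamCPU(const std::vector<int>& src, std::vector<int>& dst,
                  const std::vector<int>& seam, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int from = (x < seam[y]) ? x : x + 1;  // skip the seam pixel
            dst[x + width * y] = (from < width) ? src[from + width * y] : INT_MAX;
        }
    }
}

// Ping-pong loop: 'cur' always holds the latest energy map and 'next' is
// scratch space. std::swap exchanges the buffers in O(1) without copying,
// so both allocations stay owned and neither is leaked.
std::vector<int> removeSeams(std::vector<int> cur, int width, int height, int numOfSeams,
    const std::function<std::vector<int>(const std::vector<int>&, int, int)>& findSeam)
{
    // std::vector<int> allocates width*height ints, sidestepping the
    // sizeof(int*) mix-up in the original malloc calls.
    std::vector<int> next(static_cast<std::size_t>(width) * height);
    for (int i = 0; i < numOfSeams; ++i) {
        std::vector<int> seam = findSeam(cur, width, height);  // one column per row
        shiftSeamCPU(cur, next, seam, width, height);
        std::swap(cur, next);  // the result becomes the input of the next pass
    }
    return cur;
}
```

On the GPU side, the same idea would mean keeping device_MC and device_new allocated across iterations and swapping the two device pointers between launches, rather than allocating, copying and freeing on every call; that is a design suggestion, not something the question's code currently does.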

And my kernel wrapper function is:

int* kernel_shiftSeam(int *MCEnergyMat, int *newE, int *seam, int width, int height, int x, int direction) 
{ 
    //time measurement 
    float elapsed_time_ms = 0; 
    cudaEvent_t start, stop;

    //threads per block
    dim3 threads(16,16);
    //blocks 
    dim3 blocks((width+threads.x-1)/threads.x, (height+threads.y-1)/threads.y); 

    //MCEnergy and Seam arrays on device 
    int *device_MC, *device_new, *device_Seam; 

    //MCEnergy and Seam arrays on host 
    int *host_MC, *host_new, *host_Seam; 


    //total number of bytes in array 
    int size = width*height*sizeof(int); 
    int seamSize; 



    if(direction == 1) 
    { 
     seamSize = height*sizeof(int); 
     host_Seam = (int*)malloc(seamSize); 
     for(int i=0;i<height;i++) 
      host_Seam[i] = seam[i]; 
    } 
    else 
    { 
     seamSize = width*sizeof(int); 
     host_Seam = (int*)malloc(seamSize); 
     for(int i=0;i<width;i++) 
      host_Seam[i] = seam[i]; 
    } 

    cudaMallocHost((void**)&host_MC, size); 
    cudaMallocHost((void**)&host_new, size); 

    host_MC = MCEnergyMat; 
    host_new = newE; 

    //allocate 1D flat array on device 
    cudaMalloc((void**)&device_MC, size); 
    cudaMalloc((void**)&device_new, size); 
    cudaMalloc((void**)&device_Seam, seamSize); 

    //copy host array to device 
    cudaMemcpy(device_MC, host_MC, size, cudaMemcpyHostToDevice); 
    cudaMemcpy(device_new, host_new, size, cudaMemcpyHostToDevice); 
    cudaMemcpy(device_Seam, host_Seam, seamSize, cudaMemcpyHostToDevice); 

    //measure start time for gpu calculations
    cudaEventCreate(&start); 
    cudaEventCreate(&stop); 
    cudaEventRecord(start, 0); 



    //perform gpu calculations 
    if(direction == 1) 
    { 
     gpu_shiftSeam<<< blocks,threads >>>(device_MC, device_new, device_Seam, width, height, x); 
    } 

    //measure end time for gpu calculations
    cudaEventRecord(stop, 0); 
    cudaEventSynchronize(stop); 
    cudaEventElapsedTime(&elapsed_time_ms, start, stop); 

    execTime += elapsed_time_ms; 

    //copy out the results back to host 
    cudaMemcpy(newE, device_new, size, cudaMemcpyDeviceToHost); 

    //free memory 
    free(host_Seam); 
    cudaFree(host_MC); cudaFree(host_new); 
    cudaFree(device_MC); cudaFree(device_new); cudaFree(device_Seam); 

    //destroy event objects 
    cudaEventDestroy(start); cudaEventDestroy(stop); 

    return newE; 
} 

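So my program crashes when I call kernel_shiftSeam many times. I do free the memory I allocate, so I'm not sure whether cudaFree is leaking memory. If someone could point me in the right direction, that would be great.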
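On the memory-leak question: inside kernel_shiftSeam, cudaMallocHost pins two buffers of size bytes each, and the next two lines overwrite host_MC and host_new with the caller's pointers. The pinned blocks then become unreachable, and the later cudaFree(host_MC) and cudaFree(host_new) calls are handed ordinary malloc'd host pointers, which cudaFree cannot release (page-locked host memory is released with cudaFreeHost). So each call leaks 2 × size bytes of pinned memory. The question does not state the image size; the helper below (a hypothetical name) estimates the total for any size, and the usage note assumes a 1024×768 image purely for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Estimates the page-locked memory leaked by repeated kernel_shiftSeam calls:
// each call pins host_MC and host_new (width*height ints each) with
// cudaMallocHost and never releases them.
unsigned long long leakedPinnedBytes(unsigned long long width,
                                     unsigned long long height,
                                     unsigned long long calls)
{
    const unsigned long long perBuffer = width * height * sizeof(int);
    return 2ULL * perBuffer * calls;  // two pinned buffers per call
}
```

With a 4-byte int and a hypothetical 1024×768 image, that is 6 MiB per call: about 300 MiB after 50 calls but about 3.5 GiB after 600. Because page-locked memory cannot be paged out, exhausting it can plausibly destabilize the whole machine, which would be consistent with "fine at 50, crash at 600"; this is an inference from the code, not a confirmed diagnosis.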

What happens if you run the program without calling the kernel?

What are you trying to do with these lines? host_MC = MCEnergyMat; host_new = newE;

The program runs fine if I run it without calling the kernel. – overloading

Answer

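It could be a heap problem. Try reordering the cudaFree calls in kernel_shiftSeam into LIFO order (free the most recent allocation first). Check the release notes of newer CUDA drivers for any heap or leak fixes. On Windows, try installing Process Explorer 15.12 or newer: it shows GPU memory usage, which makes a leaking heap easy to spot.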
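The LIFO ordering suggested above can be made automatic with RAII: C++ destroys local objects in reverse order of construction, so a guard declared right after each allocation releases it last-in, first-out, and also on early returns. A minimal host-only sketch follows; ScopeRelease and simulateWrapper are hypothetical names, and the lambdas only log the release order instead of calling CUDA.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal RAII guard: runs its release action exactly once, when it goes
// out of scope. Non-copyable so the action cannot run twice.
class ScopeRelease {
public:
    explicit ScopeRelease(std::function<void()> release) : release_(std::move(release)) {}
    ~ScopeRelease() { if (release_) release_(); }
    ScopeRelease(const ScopeRelease&) = delete;
    ScopeRelease& operator=(const ScopeRelease&) = delete;
private:
    std::function<void()> release_;
};

// Illustration only: records the release order instead of calling CUDA.
// Guards are declared in allocation order; destructors run in reverse.
std::vector<std::string> simulateWrapper()
{
    std::vector<std::string> log;
    {
        ScopeRelease mc([&log] { log.push_back("device_MC"); });
        ScopeRelease nw([&log] { log.push_back("device_new"); });
        ScopeRelease seam([&log] { log.push_back("device_Seam"); });
        // ... copies, kernel launch, copy back would go here ...
    }  // guards destroyed here: seam, nw, mc
    return log;
}
```

In real code each lambda would call the release function that matches its allocator, cudaFree for cudaMalloc'd memory and cudaFreeHost for cudaMallocHost'd memory; getting that pairing right matters at least as much as the order.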