CUDA - malloc in a kernel (compute_50, sm_50)

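I have a problem when running my program with the CUDA Memory Checker. In other threads on Stack Overflow, the main problem with using malloc inside a kernel was that "compute_50,sm_50" was not set properly. The code here compiles, so that is not the issue.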

The problem is now solved, but I don't understand why the new code fixed it. My question is: why does it work now?

Old code:

__device__ unsigned int draw_active_levels(curandState * localState,const int num_levels_max){ 
    unsigned int return_value = 0; 
    float draw; 
    draw = curand_uniform(localState); 
    int num_active_levels = floorf(draw * (num_levels_max - 1)) + 1; 


    double * arrLevelWeights = (double*) malloc((num_levels_max+1) * sizeof(double)); 
    arrLevelWeights[num_levels_max]=0.0; //<--------Error on this line 
    double level_weights = 1.0/num_levels_max; 
    for(int i=0; i<num_levels_max; i++){ 
     arrLevelWeights[i] = level_weights; 
    } 
    //... 
    //do some operations using arrLevelWeights 
    //.. 

    free(arrLevelWeights); 
    return return_value; 
}

Error with the old code:

Memory Checker detected 2 access violations. 
error = access violation on store (global memory) 
gridid = 198 
blockIdx = {1,0,0} 
threadIdx = {29,0,0} 
address = 0x00000020 
accessSize = 8

New code: I only added a few lines to check whether malloc returns a null pointer.

__device__ unsigned int draw_active_levels(curandState * localState,const int num_levels_max){ 
    unsigned int return_value = 0; 
    float draw; 
    draw = curand_uniform(localState); 
    int num_active_levels = floorf(draw * (num_levels_max - 1)) + 1; 


    double * arrLevelWeights; 
    arrLevelWeights = (double*) malloc((num_levels_max+1) * sizeof(double)); 
    if(arrLevelWeights == NULL){ 
     printf("Error while dynamically allocating memory on device.\n"); //<--- this line is never called (I put a breakpoint on it) 
    } 
    arrLevelWeights[num_levels_max]=0.0; //<-------Error disappeared! 
    double level_weights = 1.0/num_levels_max; 
    for(int i=0; i<num_levels_max; i++){ 
     arrLevelWeights[i] = level_weights; 
    } 
    //... 
    //do some operations using arrLevelWeights 
    //.. 

    free(arrLevelWeights); 
    return return_value; 
}

2014-07-19 RemiDav

You are probably allocating too much memory. The default size of the device heap is 8 MB.

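You're right, it was about memory space: I had missed a free() in a completely unrelated part of the code. Would you like to post this as an answer so I can accept it? – RemiDav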

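Clearly you made other changes to the code. If you add a line of code and it is never called, it obviously isn't the fix. Your question is confusing. I also don't see how the answer actually answers the question, in particular how adding lines of code that are never called can "fix" the problem.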

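If malloc returns NULL, it simply means you have run out of device heap space, which by default is 8 MB in size. However, I don't see how adding lines that are never executed could fix the problem.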
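To make the heap limit concrete, here is a minimal sketch of how the device heap size could be inspected and raised from the host, together with a kernel that records a failed malloc instead of writing through the NULL pointer. The names alloc_kernel, set_device_heap and count_failed_allocs are hypothetical and not from the question; only cudaDeviceSetLimit, cudaDeviceGetLimit and cudaLimitMallocHeapSize are actual CUDA runtime names. As a side note, the faulting address 0x00000020 with accessSize 8 in the report is what one would expect from a NULL pointer indexed at element 4 of a double array (4 x 8 bytes), although the question does not state the value of num_levels_max.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates (n + 1) doubles from the device
// heap and counts a failure instead of dereferencing a NULL pointer.
__global__ void alloc_kernel(int n, int *fail_count) {
    double *buf = static_cast<double *>(malloc((n + 1) * sizeof(double)));
    if (buf == NULL) {
        atomicAdd(fail_count, 1);  // record the failure and stop here
        return;
    }
    buf[n] = 0.0;
    free(buf);  // every device-side malloc needs a matching free
}

// Hypothetical helper: requests a device heap size and returns the size that
// is actually in effect afterwards. The limit has to be set before the first
// kernel that uses device-side malloc/free is launched.
size_t set_device_heap(size_t bytes) {
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize);
    return actual;
}

// Hypothetical helper: launches the kernel and returns how many threads saw
// malloc fail, or -1 if a CUDA call itself failed.
int count_failed_allocs(int n, int blocks, int threads) {
    int *d_fail = NULL;
    int h_fail = -1;
    if (cudaMalloc(&d_fail, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_fail, 0, sizeof(int));
    alloc_kernel<<<blocks, threads>>>(n, d_fail);
    // cudaMemcpy waits for the kernel and reports any launch or execution error.
    if (cudaMemcpy(&h_fail, d_fail, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_fail = -1;
    cudaFree(d_fail);
    return h_fail;
}
```

Calling set_device_heap(64 * 1024 * 1024) once at start-up (the 64 MB figure is an arbitrary illustration) and then count_failed_allocs(4, 2, 32) should report zero failures when every allocation is freed. A larger heap only delays the failure if a free is missing somewhere, so it is a diagnostic aid rather than a fix.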

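As you said in the comments, your heap space ran out because of a missing free elsewhere in the code. That is why I suggest using RAII (with your own smart-pointer class) for memory allocation, to avoid this kind of problem in the future.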
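The RAII suggestion could look roughly like the sketch below: a small owning wrapper whose destructor calls free, so the buffer is released on every return path. DeviceBuffer and sum_level_weights are hypothetical names chosen for illustration, and the fallback macros are only there so the same source also builds as plain C++ for host-side testing. It assumes a compiler with C++11 support for the deleted copy operations.

```cuda
#include <stddef.h>
#include <stdlib.h>

#ifndef __CUDACC__
// Assumption: when not compiled by nvcc, drop the CUDA qualifiers so the
// sketch also builds as ordinary C++.
#define __host__
#define __device__
#endif

// Hypothetical owning wrapper around malloc/free. Inside a kernel this uses
// the device heap; on the host it uses the ordinary C heap.
template <typename T>
class DeviceBuffer {
public:
    __host__ __device__ explicit DeviceBuffer(size_t count)
        : ptr_(static_cast<T *>(malloc(count * sizeof(T)))) {}

    // free(NULL) is a no-op, so a failed allocation is safe to destroy.
    __host__ __device__ ~DeviceBuffer() { free(ptr_); }

    // Non-copyable: exactly one owner, so the buffer is freed exactly once.
    DeviceBuffer(const DeviceBuffer &) = delete;
    DeviceBuffer &operator=(const DeviceBuffer &) = delete;

    __host__ __device__ bool valid() const { return ptr_ != NULL; }
    __host__ __device__ T &operator[](size_t i) { return ptr_[i]; }

private:
    T *ptr_;
};

// Hypothetical stand-in for the body of draw_active_levels: fills the weights
// as in the question and returns their sum, or -1.0 if allocation failed.
__host__ __device__ double sum_level_weights(int num_levels_max) {
    DeviceBuffer<double> weights(static_cast<size_t>(num_levels_max) + 1);
    if (!weights.valid()) return -1.0;  // heap exhausted: bail out early

    weights[num_levels_max] = 0.0;
    double level_weight = 1.0 / num_levels_max;
    double total = 0.0;
    for (int i = 0; i < num_levels_max; i++) {
        weights[i] = level_weight;
        total += weights[i];
    }
    return total;  // destructor frees the buffer here
}
```

Because the destructor runs on the early return as well as the normal one, there is no separate free call to forget. The wrapper does not help with allocations made elsewhere with raw malloc, so those would need the same treatment.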

2014-07-19 18:11:19
