2016-03-23

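CUDA - dynamic shared memory triggers thrust::system::system_error

I just started learning CUDA programming through Udacity. When I tried to use dynamic shared memory, I ran into the following error.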

CUDA error at: main.cpp:55 
invalid argument cudaGetLastError() 
terminate called after throwing an instance of 'thrust::system::system_error'
what(): unload of CUDA runtime failed 

We are unable to execute your code. Did you set the grid and/or block size correctly? 

I have searched a lot but still have no clue what went wrong. Interestingly, if I change the last two lines to

    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*1000>>>(d_inputVals, d_inputPos, d_outputVals, d_outputPos, numElems, 0);
    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*1000>>>(d_inputVals, d_inputPos, &d_outputVals[numElems/2], &d_outputPos[numElems/2], numElems, 1);

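then no error is raised when the code runs. That does not make sense to me, because the size of a dynamic shared memory allocation should not have to be a constant. Maybe the problem is not my code but Udacity's setup? The code I wrote is below. Any help would be greatly appreciated.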

__global__ void compact_kernel(unsigned int* const d_inputVals, 
    unsigned int* const d_inputPos, 
    unsigned int* const d_outputVals, 
    unsigned int* const d_outputPos, 
    const size_t numElems, 
    const size_t refBit) 
{ 
    const size_t tid = blockIdx.x * blockDim.x + threadIdx.x; 

    // predicate 
    const bool predicate = (d_inputVals[tid] & 1) == refBit; 
    extern __shared__ int s[]; 
} 

void your_sort(unsigned int* const d_inputVals, 
    unsigned int* const d_inputPos, 
    unsigned int* const d_outputVals, 
    unsigned int* const d_outputPos, 
    const size_t numElems) 
{ 
    const size_t numBlocks = numElems/512; 
    const size_t numThreadsPerBlock = 256; 
    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*numElems>>>(d_inputVals, d_inputPos, d_outputVals, d_outputPos, numElems, 0); 
    compact_kernel<<<numBlocks, numThreadsPerBlock, sizeof(int)*numElems>>>(d_inputVals, d_inputPos, &d_outputVals[numElems/2], &d_outputPos[numElems/2], numElems, 1); 

}

Edit: The value of numElems is 220480. Is this number too large for a dynamic shared memory allocation?

+1

What is the value of 'numElems'? – talonmies

+0

Shared memory is limited to 48 KB per thread block. Your number exceeds this limit. – havogt
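To make the limit concrete: with numElems = 220480, the launch asks for sizeof(int) * 220480 = 881920 bytes of dynamic shared memory per block, while 48 KB is 49152 bytes. The sizeof(int)*1000 variant asks for only 4000 bytes, which would explain why it runs. A minimal sketch of how the request could be checked against the device before launching is shown here. The helper fitsInSharedMemory is hypothetical (it is not part of the original post), and using device 0 is an assumption; the actual limit depends on the GPU and is read from cudaDeviceProp::sharedMemPerBlock.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): true when a dynamic
// shared memory request fits within the given per-block limit.
bool fitsInSharedMemory(size_t requestedBytes, size_t limitBytes)
{
    return requestedBytes <= limitBytes;
}

int main()
{
    const size_t numElems  = 220480;                  // value from the question's edit
    const size_t requested = sizeof(int) * numElems;  // bytes passed as the 3rd launch parameter

    // Query the per-block shared memory limit of device 0 (assumed device index).
    cudaDeviceProp prop;
    const cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("requested %zu bytes, per-block limit %zu bytes\n",
                requested, prop.sharedMemPerBlock);
    if (!fitsInSharedMemory(requested, prop.sharedMemPerBlock)) {
        std::printf("request exceeds the per-block limit; the launch would be rejected\n");
    }
    return 0;
}
```

A check like this, run before the kernel launch, turns the "invalid argument" error into an explicit message about the size of the request.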

+1

@havogt Thank you very much. That was it. :D Could you post your comment as an answer? – Maverobot
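Since shared memory is allocated per block, one way to stay under the limit is to size the request by the block size rather than by numElems. The sketch below is only an illustration under assumptions. predicate_kernel, blocksFor and sharedBytesFor are hypothetical names, not the asker's code. It also assumes one thread per element and one int of shared memory per thread, which may not match what the full compact step needs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its predicate in dynamic shared
// memory, then copies it to global memory after a block-wide barrier.
__global__ void predicate_kernel(const unsigned int* in, int* out,
                                 size_t n, unsigned int refBit)
{
    extern __shared__ int s[];  // blockDim.x ints, sized by the 3rd launch parameter
    const size_t tid = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;

    // Out-of-range threads store 0 so that every thread reaches the barrier.
    s[threadIdx.x] = (tid < n && (in[tid] & 1u) == refBit) ? 1 : 0;
    __syncthreads();

    if (tid < n) out[tid] = s[threadIdx.x];
}

// Number of blocks needed to cover n elements, rounded up.
size_t blocksFor(size_t n, size_t threads) { return (n + threads - 1) / threads; }

// Dynamic shared memory per block: one int per thread.
size_t sharedBytesFor(size_t threads) { return threads * sizeof(int); }

int main()
{
    const size_t numElems = 220480;  // value from the question's edit
    const size_t threads  = 256;
    const size_t blocks   = blocksFor(numElems, threads);

    unsigned int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, numElems * sizeof(unsigned int));
    cudaMalloc(&d_out, numElems * sizeof(int));
    cudaMemset(d_in, 0, numElems * sizeof(unsigned int));  // placeholder input

    // 256 * sizeof(int) = 1024 bytes per block, far below 48 KB.
    predicate_kernel<<<static_cast<unsigned int>(blocks),
                       static_cast<unsigned int>(threads),
                       sharedBytesFor(threads)>>>(d_in, d_out, numElems, 0u);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    std::printf("launch status: %s\n", cudaGetErrorString(err));

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

The grid here is rounded up so that every element gets a thread, and the kernel guards against indices past numElems. The original launch used numElems/512 blocks of 256 threads, so the grid size would need to be adapted to whatever mapping the real kernel intends.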

Answer