Cuda atomicAdd does not increment
I think I'm stuck on a rather silly problem. This is a test kernel written just to see atomicAdd working:
__global__
void pixelcount_kernel(unsigned int * d_count,
                       const size_t numElems)
{
    int myId = threadIdx.x + blockDim.x * blockIdx.x;
    //avoid out of boundary access
    if(myId > (numElems-1))
    {
        return;
    }
    unsigned int inc=1;
    atomicAdd(d_count, inc);
    //debug code
    printf("d_count: %d \n", *d_count);
}
And this is the memory allocation, initialization, and kernel call:
unsigned int* d_count;
checkCudaErrors(cudaMalloc(&d_count, sizeof(unsigned int)));
checkCudaErrors(cudaMemset(d_count, 0, sizeof(unsigned int)));
pixelcount_kernel<<<gridSize, blockSize>>>(d_count, 10);
In the output I don't see the counter incrementing from 0 up to numElems (10 in this call). Instead I get this:
d_count: 10
d_count: 10
d_count: 10
d_count: 10
d_count: 10
d_count: 10
d_count: 10
d_count: 10
d_count: 10
d_count: 10
What is going wrong? Thanks,
Giuseppe
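
A possible explanation, offered as a sketch rather than a definitive diagnosis: the ten active threads most likely belong to the same warp, so all of them tend to complete atomicAdd before any of them reaches the printf. Each thread then reads *d_count after every increment has already landed, and prints the final value. atomicAdd returns the value the counter held just before that thread's own increment, so printing the return value should show each intermediate step. The names pixelcount_kernel_old and run_pixelcount below are hypothetical, as is the block size of 32. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant of the kernel above (hypothetical name). Instead of re-reading
// *d_count, each thread prints the value returned by atomicAdd, which is
// the counter value immediately before that thread's increment.
__global__ void pixelcount_kernel_old(unsigned int *d_count,
                                      const size_t numElems)
{
    size_t myId = threadIdx.x + (size_t)blockDim.x * blockIdx.x;
    // avoid out of boundary access (also safe when numElems is 0)
    if (myId >= numElems)
    {
        return;
    }
    unsigned int old = atomicAdd(d_count, 1u);
    // debug code: %u matches the unsigned int arguments
    printf("thread %u saw old value: %u\n", (unsigned int)myId, old);
}

// Hypothetical host helper: allocates and zeroes the counter, launches the
// kernel, and returns the final count copied back to the host.
unsigned int run_pixelcount(size_t numElems)
{
    unsigned int *d_count = nullptr;
    unsigned int h_count = 0;
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    const unsigned int blockSize = 32;  // assumed block size
    const unsigned int gridSize =
        (unsigned int)((numElems + blockSize - 1) / blockSize);
    if (gridSize > 0)
    {
        pixelcount_kernel_old<<<gridSize, blockSize>>>(d_count, numElems);
    }

    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(&h_count, d_count, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return h_count;
}
```

Calling run_pixelcount(10) from host code should return 10 if the launch succeeds, and the printed old values should be the numbers 0 through 9, each appearing once, in no guaranteed order.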