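Managing 2D CUDA arrays

I am trying to pass a 2D array to a kernel so that each thread can access index = threadIdx.x + (blockIdx.x * blockDim.x), but I can't work out how to do this or how to copy the data back.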
size_t pitch;
cudaMallocPitch(&d_array, &pitch, block_size * sizeof(int), num_blocks);
cudaMemset2D(d_array, pitch, 0, block_size * sizeof(int), num_blocks * sizeof(int));
kernel<<<grid_size, block_size>>>(d_array, pitch);
cudaMemcpy2D(h_array, pitch, d_array, pitch, block_size, num_blocks, cudaMemcpyDeviceToHost);
// Expected result after the copy:
// for each block in [0, num_blocks)
//     for each thread in [0, block_size)
//         h_array[block][thread] should be 1
__global__ void kernel(int *array, int pitch) {
    int *row = (int*)((char*)array + blockIdx.x * pitch);
    row[threadIdx.x] = 1;
    return;
}
What am I doing wrong here?
Why are you casting the array to (char *)? That will lead to incorrect pointer arithmetic. – LarryPel
That is how it is described in these two questions: http://stackoverflow.com/questions/1047369/allocate-2d-array-on-device-memory-in-cuda http://stackoverflow.com/questions/5029920/how-to-use-2d-arrays-in-cuda – user1743798
@LarryPel: No, it won't. The pitch is in bytes, so the pointer arithmetic has to be done on a pointer to a byte-sized type to come out right. – talonmies
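
talonmies's point about byte-sized pointer arithmetic can be checked on the host without a GPU. The sketch below is plain C++, not CUDA: it emulates a pitched allocation with an ordinary buffer, assumes a 512-byte row alignment (on a real device the pitch is whatever cudaMallocPitch returns), and uses the hypothetical helpers row_ptr, copy_2d and run_demo. copy_2d follows the argument convention of cudaMemcpy2D, where both pitches and the width are given in bytes and the height is a row count. Assuming h_array is a tightly packed num_blocks x block_size host buffer, that convention suggests the copy in the question should pass block_size * sizeof(int) as both the host pitch and the width, and that the cudaMemset2D height should be num_blocks rather than num_blocks * sizeof(int).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Same addressing as the kernel in the question: step through the buffer in
// BYTES (via char*), then view the start of the row as int* again.
// Without the char* cast, `array + row * pitch` would advance by
// row * pitch ints, i.e. sizeof(int) times too far.
int* row_ptr(int* array, std::size_t pitch_bytes, std::size_t row) {
    return reinterpret_cast<int*>(
        reinterpret_cast<char*>(array) + row * pitch_bytes);
}

// Host emulation of a 2D copy using the cudaMemcpy2D argument convention:
// dpitch, spitch and width are in bytes; height is a number of rows.
void copy_2d(void* dst, std::size_t dpitch,
             const void* src, std::size_t spitch,
             std::size_t width_bytes, std::size_t height_rows) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t r = 0; r < height_rows; ++r) {
        std::memcpy(d + r * dpitch, s + r * spitch, width_bytes);
    }
}

// Hypothetical driver: fills a padded "device" buffer the way the kernel
// would, then copies it into a tightly packed "host" buffer.
std::vector<int> run_demo(std::size_t num_blocks, std::size_t block_size) {
    const std::size_t width_bytes = block_size * sizeof(int);
    const std::size_t alignment = 512;  // assumed; the real pitch is device-specific
    const std::size_t pitch =
        (width_bytes + alignment - 1) / alignment * alignment;
    assert(pitch % sizeof(int) == 0);

    // Stand-in for the memory cudaMallocPitch would return, zero-initialised.
    std::vector<int> d_array(pitch / sizeof(int) * num_blocks, 0);

    // Stand-in for the kernel launch: one "block" per row, one "thread" per column.
    for (std::size_t block = 0; block < num_blocks; ++block) {
        int* row = row_ptr(d_array.data(), pitch, block);
        for (std::size_t thread = 0; thread < block_size; ++thread) {
            row[thread] = 1;
        }
    }

    // Host buffer is tightly packed, so its pitch equals the row width in bytes.
    std::vector<int> h_array(num_blocks * block_size, -1);
    copy_2d(h_array.data(), width_bytes, d_array.data(), pitch,
            width_bytes, num_blocks);
    return h_array;
}
```

Calling run_demo(4, 8) returns 32 elements that are all 1, which matches the result the question expects in h_array. The padding at the end of each emulated device row is never copied, because the width passed to copy_2d is the useful row width in bytes rather than the pitch.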