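CUDA: copying an array from device to host with cudaMemcpy2D

cudaMemcpy2D does not copy what I expect. After reading the manual on cudaMallocPitch, I tried writing some code to understand what is going on, but I ran into a problem.
I wrote a simple program like this: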
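    #include <cstdio>
    #include <cstdlib>

    #define SIZE 5  // the value used in the example discussed below

    __global__ void doStuff(double *d_A, size_t d_pitch);  // defined below

    int main()
    {
        double *d_A;
        size_t d_pitch;
        // Allocate a SIZE x SIZE matrix of doubles on the device with padded rows
        cudaMallocPitch((void**)&d_A, &d_pitch, sizeof(double) * SIZE, SIZE);
        dim3 blocks(4, 4);
        dim3 threads(16, 16);
        doStuff<<<blocks, threads>>>(d_A, d_pitch);
        double *A;
        size_t pitch = sizeof(double) * SIZE;  // host rows are tightly packed
        A = (double*)malloc(sizeof(double) * SIZE * SIZE);
        // Copy SIZE rows of sizeof(double) * SIZE bytes each, device to host
        cudaMemcpy2D(A, pitch, d_A, d_pitch, sizeof(double) * SIZE, SIZE, cudaMemcpyDeviceToHost);
        for (int i = 0; i < SIZE; i++) {
            for (int j = 0; j < SIZE; j++) printf("%f ", A[sizeof(double) * i + j]);
            printf("\n");
        }
    }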
and doStuff is:

    __global__ void doStuff(double *d_A, size_t d_pitch)
    {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
        // d_pitch is a row stride in bytes, so offset a char pointer before indexing by j
        double *target = ((double*)(((char*)d_A) + (d_pitch * i))) + j;
        if (i < SIZE && j < SIZE)
            *target = (i + 1) * (j + 1) + 0.0;
    }
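For reference, the pointer arithmetic in doStuff follows the usual row-addressing pattern for pitched allocations: the pitch is a stride in bytes, so it is added to a char pointer before casting back to double*. Below is a host-only C++ sketch of that layout, with no CUDA calls. The helper names row_ptr and fill_pitched, and the 64-byte pitch in the usage note, are made up for illustration; the pitch actually returned by cudaMallocPitch depends on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: address of row `i` in a pitched buffer.
// `pitch` is the row stride in BYTES, so the offset is applied to a char pointer.
inline double* row_ptr(void* base, std::size_t pitch, std::size_t i) {
    return reinterpret_cast<double*>(static_cast<char*>(base) + pitch * i);
}

// Hypothetical helper: fills an n x n matrix stored with `pitch` bytes per row
// so that element (i, j) = (i + 1) * (j + 1), mirroring what doStuff writes.
// Assumes `pitch` is a multiple of sizeof(double) and at least one full row wide.
std::vector<double> fill_pitched(std::size_t n, std::size_t pitch) {
    assert(pitch % sizeof(double) == 0);
    assert(pitch >= n * sizeof(double));
    std::vector<double> buf(pitch / sizeof(double) * n, 0.0);  // padding stays 0
    for (std::size_t i = 0; i < n; ++i) {
        double* row = row_ptr(buf.data(), pitch, i);
        for (std::size_t j = 0; j < n; ++j) {
            row[j] = static_cast<double>((i + 1) * (j + 1));
        }
    }
    return buf;
}
```

For example, with 8-byte doubles, fill_pitched(5, 64) places row 1 at element offset 8 of the buffer, leaving three padding doubles after each 5-element row.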
So doStuff is equivalent to d_A[i][j] = (i+1)*(j+1). If SIZE is 5, I expect:
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
in double precision. However, when I compile and run it, I get:
1 2 3 4 5
8 10 3 6 9
8 12 16 20 5
25 0 0 0 0
0 0 0 0 0
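It looks as if, for each row, cudaMemcpy2D overwrites the previous data. I tried changing the pitch and width to track down the problem, but I could not find it.
So what is wrong with my code?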
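The reported output is consistent with the copy being correct and the print loop indexing the host buffer with the wrong row stride: A is tightly packed with SIZE doubles per row, but A[sizeof(double) * i + j] steps sizeof(double) elements per row. Here is a host-only C++ sketch that reproduces the pattern without CUDA. The names copy2d, packed_copy, posted_index and packed_index are hypothetical; copy2d only imitates the row-by-row behaviour of a 2D copy, the 64-byte source pitch is a made-up value, and 8-byte doubles are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only stand-in for a 2D copy: `height` rows of `width` BYTES each,
// with independent byte strides for destination and source.
void copy2d(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t width, std::size_t height) {
    assert(width <= dpitch && width <= spitch);
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(static_cast<char*>(dst) + r * dpitch,
                    static_cast<const char*>(src) + r * spitch, width);
    }
}

// Builds a padded n x n source holding (i + 1) * (j + 1), then copies it into
// a tightly packed n x n buffer, like the host array A in the post.
// Assumes `spitch` is a multiple of sizeof(double) and at least one row wide.
std::vector<double> packed_copy(std::size_t n, std::size_t spitch) {
    assert(spitch % sizeof(double) == 0);
    const std::size_t stride = spitch / sizeof(double);  // source stride in elements
    assert(stride >= n);
    std::vector<double> src(stride * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            src[i * stride + j] = static_cast<double>((i + 1) * (j + 1));
    std::vector<double> a(n * n, 0.0);
    copy2d(a.data(), n * sizeof(double), src.data(), spitch, n * sizeof(double), n);
    return a;
}

// Index expression from the posted print loop: row stride of sizeof(double) elements.
inline std::size_t posted_index(std::size_t i, std::size_t j) {
    return sizeof(double) * i + j;
}

// Row-major index for a packed matrix with n columns.
inline std::size_t packed_index(std::size_t n, std::size_t i, std::size_t j) {
    return n * i + j;
}
```

With n = 5, posted_index(1, 0) is 8, so the second printed row starts at flat element 8 and reads 8 10 3 6 9, as in the output above. From posted_index(3, 1) = 25 onward the reads fall outside the 25-element buffer, which is undefined behaviour (zeros in the reported run). Printing A[SIZE * i + j] instead should give the expected table.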
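Wow... really embarrassing :( You're right. I actually have another problem, but it is unrelated to this question, so I'd rather ask it separately. Thanks for spotting my mistake :D – kasty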