CUDA double pointer memory copy

I wrote sample code like this:

int ** d_ptr; 
cudaMalloc((void**)&d_ptr, sizeof(int*)*N); 

int* tmp_ptr[N]; 
for(int i=0; i<N; i++) 
    cudaMalloc((void**)&tmp_ptr[i], sizeof(int)*SIZE); 
cudaMemcpy(d_ptr, tmp_ptr, sizeof(tmp_ptr), cudaMemcpyHostToDevice);

This code runs fine, but after the kernel launch I cannot get the results back.

int* Mtx_on_GPU[N]; 
cudaMemcpy(Mtx_on_GPU, d_ptr, sizeof(int)*N*SIZE, cudaMemcpyDeviceToHost);

At this point a segmentation fault occurs, but I don't know what I did wrong.

int* Mtx_on_GPU[N]; 
for(int i=0; i<N; i++) 
    cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This code fails with the same error.

I'm sure there is a mistake somewhere in my code, but I have been looking all day and cannot find it.

Please give me some advice.

Source

2014-05-12 Umbrella

In the last line

cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

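you are trying to copy data from the device to the host (note: I assume that you allocated host memory for the Mtx_on_GPU pointers!)

However, the pointers in d_ptr are stored in device memory, so you cannot access them directly from the host side: the expression d_ptr[i] dereferences a device address on the host. The line should be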

cudaMemcpy(Mtx_on_GPU[i], tmp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This may become clearer when using "overly elaborate" variable names:

int ** devicePointersStoredInDeviceMemory; 
cudaMalloc((void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N); 

int* devicePointersStoredInHostMemory[N]; 
for(int i=0; i<N; i++) 
    cudaMalloc((void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE); 

cudaMemcpy(
    devicePointersStoredInDeviceMemory, 
    devicePointersStoredInHostMemory, 
    sizeof(int*)*N, cudaMemcpyHostToDevice); 

// Invoke kernel here, passing "devicePointersStoredInDeviceMemory" 
// as an argument 
... 

int* hostPointersStoredInHostMemory[N]; 
for(int i=0; i<N; i++) { 
    // Allocate host memory that will receive row i 
    hostPointersStoredInHostMemory[i] = (int*)malloc(sizeof(int)*SIZE); 
    int* hostPointer = hostPointersStoredInHostMemory[i]; 

    // Read the device pointer from the host-side copy of the pointer array 
    int* devicePointer = devicePointersStoredInHostMemory[i]; 

    cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost); 
}
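For completeness, here is a minimal end-to-end sketch of the same pattern, including the kernel launch that is only hinted at above. The kernel fillRows, the values N = 4 and SIZE = 8, the CHECK macro and the function roundTrip are assumptions made purely for illustration; none of them come from the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N    4   // number of rows (assumed value)
#define SIZE 8   // ints per row (assumed value)

// Illustrative error check: print the failing call and return false.
// Early returns skip the cleanup, which is acceptable for a sketch.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            return false;                                                 \
        }                                                                 \
    } while (0)

// Hypothetical kernel: element j of row i receives i * SIZE + j.
__global__ void fillRows(int** rows)
{
    int i = blockIdx.x;
    int j = threadIdx.x;
    if (i < N && j < SIZE)
        rows[i][j] = i * SIZE + j;
}

// Returns true if every value copied back matches what the kernel wrote.
bool roundTrip()
{
    int** devicePointersStoredInDeviceMemory = NULL;
    int*  devicePointersStoredInHostMemory[N];
    int*  hostPointersStoredInHostMemory[N];
    bool  ok = true;

    // Pointer table on the device, rows on the device.
    CHECK(cudaMalloc((void**)&devicePointersStoredInDeviceMemory, sizeof(int*) * N));
    for (int i = 0; i < N; i++)
        CHECK(cudaMalloc((void**)&devicePointersStoredInHostMemory[i], sizeof(int) * SIZE));

    // Upload the row addresses so the kernel can index rows[i][j].
    CHECK(cudaMemcpy(devicePointersStoredInDeviceMemory,
                     devicePointersStoredInHostMemory,
                     sizeof(int*) * N, cudaMemcpyHostToDevice));

    fillRows<<<N, SIZE>>>(devicePointersStoredInDeviceMemory);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Copy each row back using the host-side copy of the device pointers.
    for (int i = 0; i < N; i++) {
        hostPointersStoredInHostMemory[i] = (int*)malloc(sizeof(int) * SIZE);
        if (!hostPointersStoredInHostMemory[i]) return false;
        CHECK(cudaMemcpy(hostPointersStoredInHostMemory[i],
                         devicePointersStoredInHostMemory[i],
                         sizeof(int) * SIZE, cudaMemcpyDeviceToHost));
        for (int j = 0; j < SIZE; j++)
            if (hostPointersStoredInHostMemory[i][j] != i * SIZE + j) ok = false;
        free(hostPointersStoredInHostMemory[i]);
        cudaFree(devicePointersStoredInHostMemory[i]);
    }
    cudaFree(devicePointersStoredInDeviceMemory);
    return ok;
}
```

Calling roundTrip() from main and checking its return value should be enough to try this out. Note that the host never dereferences devicePointersStoredInDeviceMemory; it only passes that pointer to cudaMemcpy and to the kernel.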

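Edit in response to the comment:

The d_ptr is "an array of pointers". But the memory for this array is allocated with cudaMalloc, which means that the array itself is located on the device. In contrast, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory. Instead of specifying the array size, you could also use malloc: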

int** pointersStoredInDeviceMemory; 
cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N); 

int** pointersStoredInHostMemory; 
pointersStoredInHostMemory = (int**)malloc(N * sizeof(int*)); 

// This is not possible, because the array was allocated with cudaMalloc: 
int *pointerA = pointersStoredInDeviceMemory[0]; 

// This is possible because the array was allocated with malloc:  
int *pointerB = pointersStoredInHostMemory[0];

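It can be a bit brain-twisting to keep track of both the type of memory where the pointers are stored and the type of memory that the pointers point to, but fortunately it hardly ever goes beyond 2 levels of indirection.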
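If only d_ptr is at hand on the host (for example because tmp_ptr has gone out of scope), the pointer table itself can be copied back first. The sketch below assumes that situation. It also shows why the first attempt in the question fails: d_ptr holds N pointers, not N*SIZE ints, so the table copy must transfer sizeof(int*)*N bytes, and copying sizeof(int)*N*SIZE bytes into an array of N pointers overruns it. The names copyRowsViaDeviceTable and selfCheck, and the sizes used, are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: d_ptr is a device-resident table of n device pointers,
// each pointing to rowLen ints. The rows are copied into the caller-allocated
// host buffer out (n * rowLen ints, row after row). Returns true on success.
bool copyRowsViaDeviceTable(int** d_ptr, int n, int rowLen, int* out)
{
    // Step 1: bring the table of pointers to the host (n pointers, not n*rowLen ints).
    int** tableOnHost = (int**)malloc(n * sizeof(int*));
    if (!tableOnHost) return false;
    bool ok = cudaMemcpy(tableOnHost, d_ptr, n * sizeof(int*),
                         cudaMemcpyDeviceToHost) == cudaSuccess;

    // Step 2: tableOnHost[i] is readable on the host, but its value is a device
    // address. It may be passed to cudaMemcpy, never dereferenced on the host.
    for (int i = 0; ok && i < n; i++)
        ok = cudaMemcpy(out + (size_t)i * rowLen, tableOnHost[i],
                        rowLen * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;

    free(tableOnHost);
    return ok;
}

// Small self-check with made-up sizes: upload known rows, read them back via the table.
bool selfCheck()
{
    const int n = 3, rowLen = 5;
    int*  rows[n] = { NULL, NULL, NULL };  // device pointers stored on the host
    int   src[rowLen];
    int   out[n * rowLen];
    int** d_ptr = NULL;

    bool ok = cudaMalloc((void**)&d_ptr, n * sizeof(int*)) == cudaSuccess;
    for (int i = 0; ok && i < n; i++) {
        for (int j = 0; j < rowLen; j++) src[j] = 100 * i + j;
        ok = cudaMalloc((void**)&rows[i], sizeof(src)) == cudaSuccess
          && cudaMemcpy(rows[i], src, sizeof(src), cudaMemcpyHostToDevice) == cudaSuccess;
    }
    ok = ok && cudaMemcpy(d_ptr, rows, sizeof(rows), cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && copyRowsViaDeviceTable(d_ptr, n, rowLen, out);

    for (int i = 0; ok && i < n; i++)
        for (int j = 0; ok && j < rowLen; j++)
            ok = out[i * rowLen + j] == 100 * i + j;

    for (int i = 0; i < n; i++) cudaFree(rows[i]);  // cudaFree(NULL) is a no-op
    cudaFree(d_ptr);
    return ok;
}
```

This costs one extra small transfer compared with keeping tmp_ptr around, so reusing the host-side copy of the pointers, as shown earlier in this answer, is usually the simpler choice.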

Source

2014-05-12 13:11:32 Marco13

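Oh my. It works perfectly. Thank you!! But I still have one question. Both d_ptr and tmp_ptr are allocated with cudaMalloc, so why can I access tmp_ptr but not d_ptr? – Umbrella

@Umbrella I added an edit, maybe it is clearer now – Marco13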
