2014-05-12

CUDA double pointer memory copy

I wrote some sample code like this:

int ** d_ptr; 
cudaMalloc((void**)&d_ptr, sizeof(int*)*N); 

int* tmp_ptr[N]; 
for(int i=0; i<N; i++) 
    cudaMalloc((void**)&tmp_ptr[i], sizeof(int)*SIZE); 
cudaMemcpy(d_ptr, tmp_ptr, sizeof(tmp_ptr), cudaMemcpyHostToDevice); 

This code runs fine, but after the kernel launch I cannot retrieve the results.

int* Mtx_on_GPU[N]; 
cudaMemcpy(Mtx_on_GPU, d_ptr, sizeof(int)*N*SIZE, cudaMemcpyDeviceToHost); 

A segmentation fault occurs at this point, but I don't know what I did wrong.

int* Mtx_on_GPU[N]; 
for(int i=0; i<N; i++) 
    cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost); 

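This code produces the same error.

I'm sure there is a mistake somewhere in my code, but I spent the whole day without finding it.

Please give me some advice.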

Answer


In the last line

cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost); 

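you are trying to copy data from the device to the host (note: I assume that you have allocated host memory for the Mtx_on_GPU pointers!)

However, the pointers in d_ptr are stored in device memory, so you cannot read d_ptr[i] directly from the host side. The line should be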

cudaMemcpy(Mtx_on_GPU[i], tmp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost); 

This may become clearer with some "overly elaborate" variable names:

int ** devicePointersStoredInDeviceMemory; 
cudaMalloc((void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N); 

int* devicePointersStoredInHostMemory[N]; 
for(int i=0; i<N; i++) 
    cudaMalloc((void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE); 

cudaMemcpy(
    devicePointersStoredInDeviceMemory, 
    devicePointersStoredInHostMemory, 
    sizeof(int*)*N, cudaMemcpyHostToDevice); 

// Invoke kernel here, passing "devicePointersStoredInDeviceMemory" 
// as an argument 
... 

int* hostPointersStoredInHostMemory[N]; 
for(int i=0; i<N; i++) { 
    // Allocate host memory to receive row i (free it later with free())
    hostPointersStoredInHostMemory[i] = (int*)malloc(sizeof(int)*SIZE); 
    int* hostPointer = hostPointersStoredInHostMemory[i]; 

    int* devicePointer = devicePointersStoredInHostMemory[i]; 

    cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost); 
} 
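
For reference, the steps above can be put together into one small program. This is only a sketch under stated assumptions: the values of N and SIZE, the kernel fillRows, and its launch configuration are made up for illustration and do not come from the question. It keeps the question's names d_ptr, tmp_ptr and Mtx_on_GPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 4      // number of rows (assumed value)
#define SIZE 8   // ints per row (assumed value)

// Hypothetical kernel: one block per row, one thread per element.
__global__ void fillRows(int** rows)
{
    int r = blockIdx.x;
    int c = threadIdx.x;
    rows[r][c] = r * SIZE + c;
}

int main()
{
    // Pointer table stored in device memory.
    int** d_ptr = NULL;
    cudaMalloc((void**)&d_ptr, sizeof(int*) * N);

    // The same device pointers, stored in host memory.
    int* tmp_ptr[N];
    for (int i = 0; i < N; i++)
        cudaMalloc((void**)&tmp_ptr[i], sizeof(int) * SIZE);
    cudaMemcpy(d_ptr, tmp_ptr, sizeof(tmp_ptr), cudaMemcpyHostToDevice);

    // The kernel receives the table that lives on the device.
    fillRows<<<N, SIZE>>>(d_ptr);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy each row back, using the host-side copy of the device pointers.
    int* Mtx_on_GPU[N];
    for (int i = 0; i < N; i++) {
        Mtx_on_GPU[i] = (int*)malloc(sizeof(int) * SIZE);
        cudaMemcpy(Mtx_on_GPU[i], tmp_ptr[i], sizeof(int) * SIZE,
                   cudaMemcpyDeviceToHost);
    }

    printf("last element: %d\n", Mtx_on_GPU[N - 1][SIZE - 1]);

    // Release host and device memory.
    for (int i = 0; i < N; i++) {
        free(Mtx_on_GPU[i]);
        cudaFree(tmp_ptr[i]);
    }
    cudaFree(d_ptr);
    return 0;
}
```

The point to notice is that d_ptr is only ever passed to cudaMemcpy or to the kernel; the host never indexes into it.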

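Edit in response to the comment:

d_ptr is "an array of pointers", but the memory for this array is allocated with cudaMalloc. That means it is located on the device. In contrast, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory. Instead of specifying the array size, you could also have used malloc. It may become clearer when you compare the following allocations: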

int** pointersStoredInDeviceMemory; 
cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N); 

int** pointersStoredInHostMemory; 
pointersStoredInHostMemory = (int**)malloc(N * sizeof(int*)); 

// This is not possible, because the array was allocated with cudaMalloc: 
int *pointerA = pointersStoredInDeviceMemory[0]; 

// This is possible because the array was allocated with malloc:  
int *pointerB = pointersStoredInHostMemory[0]; 

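It can be a bit brain-twisting to keep track of

  • the type of memory in which the pointer is stored
  • the type of memory that the pointer points to

but fortunately, it hardly ever goes beyond 2 levels of indirection.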
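
As a side note on the first attempt in the question: d_ptr holds N pointers, not N*SIZE ints, so a copy of sizeof(int)*N*SIZE bytes asks for more than the d_ptr allocation holds and more than Mtx_on_GPU has room for, which is a likely source of the crash. The sketch below (N and SIZE are assumed values) copies the pointer table back with the matching size and checks that it contains the same device addresses as tmp_ptr.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4      // number of rows (assumed value)
#define SIZE 8   // ints per row (assumed value)

int main()
{
    // Pointer table stored in device memory.
    int** d_ptr = NULL;
    cudaMalloc((void**)&d_ptr, sizeof(int*) * N);

    // Device pointers, stored in host memory.
    int* tmp_ptr[N];
    for (int i = 0; i < N; i++)
        cudaMalloc((void**)&tmp_ptr[i], sizeof(int) * SIZE);
    cudaMemcpy(d_ptr, tmp_ptr, sizeof(int*) * N, cudaMemcpyHostToDevice);

    // Copy the table back: N pointers, so N * sizeof(int*) bytes.
    int* recovered[N];
    cudaMemcpy(recovered, d_ptr, sizeof(int*) * N, cudaMemcpyDeviceToHost);

    // The recovered values are device addresses. They can be compared or
    // passed to cudaMemcpy, but must not be dereferenced on the host.
    int same = 1;
    for (int i = 0; i < N; i++)
        if (recovered[i] != tmp_ptr[i])
            same = 0;
    printf("pointer table round trip %s\n", same ? "matches" : "differs");

    for (int i = 0; i < N; i++)
        cudaFree(tmp_ptr[i]);
    cudaFree(d_ptr);
    return 0;
}
```

Since tmp_ptr already holds these addresses on the host, the extra copy is normally unnecessary; it is shown here only to make clear what d_ptr actually contains.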


Oh my. It works perfectly. Thank you!! But I still have one question. Both d_ptr and tmp_ptr use cudaMalloc, so why can I access tmp_ptr but not d_ptr? – Umbrella


@Umbrella I added an edit, maybe it is clearer now – Marco13