Proper usage of cudaMalloc3D with cudaMemcpy

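I am trying to send a 3D array src, of size `size` in each dimension and flattened into a 1D array of size length = size * size * size, to a kernel, compute a result, and store it in dst. However, at the end, dst incorrectly contains all 0s. Here is my code: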

int size = 256; 
int length = size * size * size; 
int bytes = length * sizeof(float); 

// Allocate source and destination arrays on the host and initialize source array 

float *src, *dst; 
cudaMallocHost(&src, bytes); 
cudaMallocHost(&dst, bytes); 
for (int i = 0; i < length; i++) { 
    src[i] = i; 
} 

// Allocate source and destination arrays on the device 

struct cudaPitchedPtr srcGPU, dstGPU; 
struct cudaExtent extent = make_cudaExtent(size*sizeof(float), size, size); 
cudaMalloc3D(&srcGPU, extent); 
cudaMalloc3D(&dstGPU, extent); 

// Copy to the device, execute kernel, and copy back to the host 

cudaMemcpy(srcGPU.ptr, src, bytes, cudaMemcpyHostToDevice); 
myKernel<<<numBlocks, blockSize>>>((float *)srcGPU.ptr, (float *)dstGPU.ptr); 
cudaMemcpy(dst, dstGPU.ptr, bytes, cudaMemcpyDeviceToHost);

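I have left out my error checking of cudaMallocHost(), cudaMalloc3D() and cudaMemcpy() for clarity. In any case, this code does not trigger any errors.

What is the correct usage of cudaMalloc3D() with cudaMemcpy()?

Please let me know whether I should also post a minimal test case for the kernel, or whether the problem can be found in the code above.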

Source

2013-05-15 1' '

You might be interested in [this question/answer](http://stackoverflow.com/questions/16119943/how-and-when-should-i-use-pitched-pointer-with-the-cuda-api)

Thanks, I had already stumbled upon that one, and it was very helpful.

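An answer is now available at [Copying from cuda 3D memory to linear memory: copied data is not where I expected](http://stackoverflow.com/questions/16107480/copying-from-cuda-3d-memory-to-linear-memory-copied-data-is-not-where-i-expected/23052768#23052768). – JackOLantern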

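Edit: the extent takes the number of elements if you are using a CUDA array, but it effectively takes the number of bytes if you are not using a CUDA array (e.g. memory allocated with some non-array variant of cudaMalloc).

From the CUDA Runtime API documentation: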

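The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array's elements. If no CUDA array is participating in the copy, then the extents are defined in elements of unsigned char.

Also, cudaMalloc3D returns a pitched pointer, which means the allocation has at least the dimensions of the extent you provided, but each row may be padded for alignment reasons. You have to take this pitch into account when accessing device memory and when copying to and from it. See the documentation for the cudaPitchedPtr struct.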
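To make the pitch concrete, here is a minimal sketch of how an element (x, y, z) might be addressed in memory returned by cudaMalloc3D. The names pitchedElem and copyPitched are hypothetical and not part of the question's code; the sketch assumes float elements and that the extent's height is passed in as the number of rows per slice. Indexing the pointer as a flat float array is only equivalent when the pitch happens to equal the row width in bytes.

```cpp
#include <cstddef>

// Let the helper also compile with a plain C++ compiler (no nvcc).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Address of element (x, y, z) in a pitched 3D allocation.
//   base   : cudaPitchedPtr::ptr
//   pitch  : cudaPitchedPtr::pitch, the padded row width in BYTES
//   height : rows per slice (the height given in the cudaExtent)
__host__ __device__ inline float *pitchedElem(void *base, size_t pitch,
                                              size_t height,
                                              size_t x, size_t y, size_t z)
{
    char *slice = static_cast<char *>(base) + z * pitch * height; // skip z slices
    char *row   = slice + y * pitch;                              // skip y rows
    return reinterpret_cast<float *>(row) + x;                    // x-th float in row
}

#ifdef __CUDACC__
// Hypothetical kernel: copies a size^3 volume from src to dst,
// honoring the pitch of each allocation separately.
__global__ void copyPitched(cudaPitchedPtr src, cudaPitchedPtr dst, int size)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < size && y < size && z < size) {
        *pitchedElem(dst.ptr, dst.pitch, size, x, y, z) =
            *pitchedElem(src.ptr, src.pitch, size, x, y, z);
    }
}
#endif
```

Passing the whole cudaPitchedPtr to the kernel, rather than casting its ptr member to float *, keeps the pitch available on the device.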

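As for using cudaMalloc3D with cudaMemcpy, you might want to take a look at cudaMemcpy3D (see its documentation), which may make your life a bit easier by taking the pitch of both the host and the device memory into account. To use cudaMemcpy3D, you have to create a cudaMemcpy3DParms struct filled in with the appropriate information. Its members are: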

cudaArray_t dstArray 
struct cudaPos dstPos 
struct cudaPitchedPtr dstPtr 
struct cudaExtent extent 
enum cudaMemcpyKind kind 
cudaArray_t srcArray 
struct cudaPos srcPos 
struct cudaPitchedPtr srcPtr

You must specify one of srcArray or srcPtr, and one of dstArray or dstPtr. The documentation also recommends initializing the struct to 0 before using it, e.g. cudaMemcpy3DParms myParms = {0};
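As an illustration of filling in these members, the following sketch copies a densely packed host volume to and from memory allocated with cudaMalloc3D. The function names copyHostToPitched and copyPitchedToHost are hypothetical. It assumes float data, a cubic volume of size elements per side, and a device allocation created with make_cudaExtent(size * sizeof(float), size, size) as in the question. The host buffer has no padding, so it is wrapped in a cudaPitchedPtr whose pitch is simply the row width in bytes.

```cpp
#include <cuda_runtime.h>

// Host -> device: src is a contiguous size^3 float array on the host,
// dstGPU is the pitched pointer returned by cudaMalloc3D.
cudaError_t copyHostToPitched(const float *src, cudaPitchedPtr dstGPU, size_t size)
{
    cudaMemcpy3DParms p = {0};
    // Host rows are contiguous, so pitch == row width in bytes.
    p.srcPtr = make_cudaPitchedPtr(const_cast<float *>(src),
                                   size * sizeof(float), size, size);
    p.dstPtr = dstGPU;
    // No CUDA array is involved, so the extent width is given in bytes.
    p.extent = make_cudaExtent(size * sizeof(float), size, size);
    p.kind   = cudaMemcpyHostToDevice;
    return cudaMemcpy3D(&p);
}

// Device -> host: the mirror image of the copy above.
cudaError_t copyPitchedToHost(float *dst, cudaPitchedPtr srcGPU, size_t size)
{
    cudaMemcpy3DParms p = {0};
    p.srcPtr = srcGPU;
    p.dstPtr = make_cudaPitchedPtr(dst, size * sizeof(float), size, size);
    p.extent = make_cudaExtent(size * sizeof(float), size, size);
    p.kind   = cudaMemcpyDeviceToHost;
    return cudaMemcpy3D(&p);
}
```

In the question's code, these calls would stand in for the two flat cudaMemcpy calls, e.g. copyHostToPitched(src, srcGPU, size) before the kernel launch and copyPitchedToHost(dst, dstGPU, size) after it, with the returned cudaError_t checked each time.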

Also, you might be interested in this other SO question.

Source

2013-05-15 18:14:05 alrikai

I can use srcGPU as dstPtr, but what should I use for srcArray or srcPtr? I am copying from float *src, which is neither a CUDA array nor a CUDA pitched pointer.

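@1'' I would try making a cudaPitchedPtr from your src pointer, with the pitch being the same as the size of your width. – alrikai

Good idea, I will try that. However, I had not been error checking the kernel itself, and with the current code it reports an "invalid argument" error. Why can't I pass srcPtr.ptr and dstPtr.ptr to a kernel that expects float *?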
