Two consecutive cudaMallocPitch calls make the code fail

I wrote a simple piece of CUDA code, shown below:

//Allocate the first 2d array "deviceArray2DInput" 
if(cudaMallocPitch((Float32**) &deviceArray2DInput, &devicePitch, sizeof(Float32)*deviceColNumber,deviceRowNumber) == cudaErrorMemoryAllocation){ 
    return -1; 
} 

//Allocate the second 2d array "deviceArray2DOutput". It is supposed to hold the output of some process. 
if(cudaMallocPitch((Float32**) &deviceArray2DOutput, &devicePitch,sizeof(Float32)*deviceRowNumber,deviceColNumber) == cudaErrorMemoryAllocation){ 
    return -1; 
} 

//Copy data from "hostArrayR" to "deviceArray2DInput" (#1) 
cudaMemcpy2D(deviceArray2DInput,devicePitch,hostArrayR,sizeof(Float32)*colNumber,sizeof(Float32)*deviceColNumber,deviceRowNumber,cudaMemcpyHostToDevice); 

//Clear the first 10000 elements of "hostArrayR" for verification. 
for(int i = 0; i < 10000; ++i){ 
    hostArrayR[i] = 0; 
} 

//Copy data back from "deviceArray2DInput" to "hostArrayR"(#2) 
cudaMemcpy2D(hostArrayR,sizeof(Float32)*colNumber,deviceArray2DInput,devicePitch,sizeof(Float32)*deviceColNumber,deviceRowNumber,cudaMemcpyDeviceToHost);

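If I comment out the second allocation block, the code runs fine: it copies the data from the host array "hostArrayR" to the device array "deviceArray2DInput" and copies it back. However, if both allocation blocks are present, the copied-back "hostArrayR" is empty (no data comes back from the device).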

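I am sure the data is in "hostArrayR" at line (#1), but there is no data after line (#2). I cleared the first 10000 elements (far fewer than the array size) to make sure the data really does not come back.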

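I am using Nvidia Nsight 2.2 on Visual Studio 2010. The array size is 1024x768, and I am using 32-bit float data. My GPU is a GTX570. There seems to be no memory allocation error (otherwise the code would return before copying anything).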

I did not try cudaMalloc(), because I prefer cudaMallocPitch() for its memory alignment.


2012-10-12 DFTandFFT

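Your error checking looks fragile to me. What happens if an error other than cudaErrorMemoryAllocation is returned? And cudaMemcpy2D() has no error checking at all. I suggest always checking that every return code equals cudaSuccess. – tera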
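A minimal sketch of the check tera suggests: compare every runtime call against cudaSuccess instead of against one specific error code. The helper name `checkCuda` is hypothetical (it is not part of the CUDA API); it only relies on `cudaGetErrorString` from the runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): returns true only when
// `err` is cudaSuccess; otherwise prints which call failed and why.
inline bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Intended use in the question's code, for example:
//   if (!checkCuda(cudaMallocPitch(...), "cudaMallocPitch (input)")) return -1;
//   if (!checkCuda(cudaMemcpy2D(...), "cudaMemcpy2D (host to device)")) return -1;
```

Because the test is `err != cudaSuccess`, any failure is caught, including errors from cudaMemcpy2D that the original code ignored.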

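You should check API calls against cudaSuccess rather than against one specific error.
You should check the error values returned by the memcpys.
You are overwriting devicePitch with the second cudaMallocPitch() call; the two arrays have different shapes and may therefore have different pitches.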
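Why each allocation needs its own pitch: in pitched memory, row r starts r * pitch bytes after the base pointer, so addressing one array with another array's pitch places rows in the wrong locations. cudaMemcpy2D also returns an error when the row width in bytes exceeds the pitch it is given, which can happen if the second array's rows are narrower than the first's. A minimal sketch of the usual addressing pattern follows; `pitchedRow` and `scaleKernel` are hypothetical names.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: pointer to the first element of row `row` in a pitched
// allocation. `pitch` is in bytes, exactly as returned by cudaMallocPitch.
__host__ __device__ inline float* pitchedRow(float* base, size_t pitch, int row)
{
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + row * pitch);
}

// Hypothetical kernel: scales a rows x cols pitched array in place.
// It must receive the pitch returned for *this* allocation, not another one.
__global__ void scaleKernel(float* data, size_t pitch, int rows, int cols, float s)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols) {
        pitchedRow(data, pitch, r)[c] *= s;
    }
}
```

The cast to `char*` matters: the pitch is a byte count, so the row offset must be added before converting back to `float*`.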


2012-10-12 08:29:41 Tom

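I rewrote the code to (1) check against cudaSuccess and (2) use two separate pitch variables instead of one shared "devicePitch". I think you are right: "devicePitch" was overwritten by the second allocation block, which made the memory copy fail, and my original code never checked for that. Now it works well. Thanks. – DFTandFFT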
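A minimal sketch of the rewrite described in this comment, assuming `Float32` is `float` and the host array is a tightly packed, row-major rows x cols array. The function `roundTrip`, the helper `ok`, and the names `inputPitch`/`outputPitch` are hypothetical, not taken from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true only for cudaSuccess; reports any other error.
static bool ok(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    }
    return err == cudaSuccess;
}

// Hypothetical: copies a rows x cols host array to the device and back.
// Returns 0 on success, -1 if any CUDA call fails.
int roundTrip(float* hostArray, int rows, int cols)
{
    float* input = nullptr;
    float* output = nullptr;   // allocated only to mirror the question
    size_t inputPitch = 0;     // pitch of the rows x cols input array
    size_t outputPitch = 0;    // separate pitch for the cols x rows output array
    const size_t hostPitch = sizeof(float) * cols;  // host rows are packed

    // && short-circuits, so the chain stops at the first failing call.
    bool good =
        ok(cudaMallocPitch((void**)&input, &inputPitch, sizeof(float) * cols, rows),
           "cudaMallocPitch (input)") &&
        ok(cudaMallocPitch((void**)&output, &outputPitch, sizeof(float) * rows, cols),
           "cudaMallocPitch (output)") &&
        ok(cudaMemcpy2D(input, inputPitch, hostArray, hostPitch,
                        sizeof(float) * cols, rows, cudaMemcpyHostToDevice),
           "cudaMemcpy2D (host to device)") &&
        ok(cudaMemcpy2D(hostArray, hostPitch, input, inputPitch,
                        sizeof(float) * cols, rows, cudaMemcpyDeviceToHost),
           "cudaMemcpy2D (device to host)");

    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(output);
    cudaFree(input);
    return good ? 0 : -1;
}
```

The copies into and out of "input" always use `inputPitch`, so the second allocation can no longer change how the first array is addressed.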
