2017-06-13 83 views
-3

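cudaMemcpy2D to Mat

I plan to use grabcutNPP from the CUDA samples to speed up image processing. The original sample code is implemented for FIBITMAP, but my input/output type will be Mat.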

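I have figured out most of the code, but I am stuck at the cudaMemcpyDeviceToHost step. As a CUDA beginner, I have no idea why it always stops at this step.

Here is the relevant part of my code: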

void grabcutGPU(Mat& _src, Mat& _dst, Rect _srcRect)
{
    GrabCut *grabcut;
    const size_t width = _src.rows;
    const size_t height = _src.cols;
    size_t image_pitch;
    size_t result_pitch;
    size_t trimap_pitch;
    uchar4 *gpu_src, *gpu_dst;
    unsigned char *d_trimap;
    NppiRect rect;

    // Rect to NppiRect
    rect.x = _srcRect.x;
    rect.y = _srcRect.y;
    rect.width = _srcRect.width;
    rect.height = _srcRect.height;

    // malloc for src_image
    checkCudaErrors(cudaMallocPitch(&gpu_src, &image_pitch, width * sizeof(uchar4), height));
    checkCudaErrors(cudaMemcpy2D(gpu_src, image_pitch, _src.ptr<uchar4>(), width * sizeof(uchar4), width * sizeof(uchar4), height, cudaMemcpyHostToDevice));
    // malloc for the trimap
    checkCudaErrors(cudaMallocPitch(&d_trimap, &trimap_pitch, width, height));

    // Set up GrabCut
    grabcut = new GrabCut(gpu_src, (int)image_pitch, d_trimap, (int)trimap_pitch, width, height);
    // write the rect into the trimap
    checkCudaErrors(TrimapFromRect(d_trimap, (int)trimap_pitch, rect, width, height));

    // GrabCut segmentation
    grabcut->computeSegmentationFromTrimap();

    // malloc for dst_image
    checkCudaErrors(cudaMallocPitch(&gpu_dst, &result_pitch, width * 4, height));
    // GPU process
    checkCudaErrors(ApplyMatte(2, gpu_dst, (int)result_pitch, gpu_src, (int)image_pitch, grabcut->getAlpha(), grabcut->getAlphaPitch(), width, height));
    size_t output_pitch = result_pitch;

    // send result to dst
    checkCudaErrors(cudaMemcpy2D(_dst.ptr(), (int)output_pitch, gpu_dst, result_pitch, width * 4, height, cudaMemcpyDeviceToHost));

    delete grabcut;
    checkCudaErrors(cudaDeviceSynchronize(), "Kernel Launch Failed");
    checkCudaErrors(cudaFree(gpu_src), "CUDA Free Failed");
    checkCudaErrors(cudaFree(gpu_dst));
    checkCudaErrors(cudaFree(d_trimap), "CUDA Free Failed");
}
+1

Is your image pixel type uchar4? The standard is Vec3b, which should correspond to uchar3. – Micka

+1

Width is cols, height is rows! – Micka

+0

GrabCut uses an alpha channel, which is why I am using uchar4. Does this mean I cannot pass the result to a Mat? Or should I do some extra processing? – Yisin

Answer

0

This problem has been solved.

First, rows and cols were carelessly swapped: width should be _src.cols and height should be _src.rows.
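To illustrate why the swap matters: cudaMemcpy2D takes the row width in bytes and the number of rows, plus a separate pitch for each side of the copy. The following is a minimal host-only sketch of those semantics, not the CUDA implementation itself; `copyPitched` and `bytesTouched` are hypothetical helper names. For a 4-channel 8-bit Mat, the width in bytes would be cols * 4 and the height would be rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-side illustration of the row-by-row copy that a pitched 2D copy performs.
// widthBytes is the number of payload bytes per row, height is the number of rows,
// and each pitch is the distance in bytes between the starts of consecutive rows.
void copyPitched(unsigned char* dst, std::size_t dstPitch,
                 const unsigned char* src, std::size_t srcPitch,
                 std::size_t widthBytes, std::size_t height)
{
    // A row's payload can never be wider than the stride between rows.
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
    }
}

// Hypothetical helper: number of bytes, counted from the start of a buffer,
// that a pitched copy with these parameters reads or writes.
std::size_t bytesTouched(std::size_t pitch, std::size_t widthBytes, std::size_t height)
{
    return height == 0 ? 0 : (height - 1) * pitch + widthBytes;
}
```

For example, bytesTouched(512, 400, 3) is 1424, which is more than the 1200 bytes of a tightly packed 3-row host image with 400 bytes per row. This is why the host side of the copy should be given the host image's own row stride (Mat::step in OpenCV) rather than the device pitch returned by cudaMallocPitch.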

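Second, the input Mat has 3 channels, but this function needs 4 channels to pass the result through. This can be fixed by converting the color type.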
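A minimal sketch of that channel conversion, assuming a tightly packed 8-bit BGR buffer; `bgrToBgra` is a hypothetical helper written in plain C++ so that it runs without OpenCV. With OpenCV itself, the same step is normally a single call to cv::cvtColor with cv::COLOR_BGR2BGRA, and the destination Mat would need to be allocated as CV_8UC4 before the result is copied back into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands a tightly packed 3-channel (BGR) image into a 4-channel (BGRA) one,
// setting alpha to 255 (fully opaque). The OpenCV equivalent would be:
//   cv::cvtColor(src, bgra, cv::COLOR_BGR2BGRA);
std::vector<std::uint8_t> bgrToBgra(const std::vector<std::uint8_t>& bgr,
                                    std::size_t cols, std::size_t rows)
{
    assert(bgr.size() == cols * rows * 3);
    std::vector<std::uint8_t> bgra(cols * rows * 4);
    for (std::size_t i = 0; i < cols * rows; ++i) {
        bgra[4 * i + 0] = bgr[3 * i + 0]; // blue
        bgra[4 * i + 1] = bgr[3 * i + 1]; // green
        bgra[4 * i + 2] = bgr[3 * i + 2]; // red
        bgra[4 * i + 3] = 255;            // alpha
    }
    return bgra;
}
```

Each output pixel is 4 bytes, which matches the uchar4 layout that the GPU code in the question expects.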

Thanks to Micka; otherwise I might never have noticed the channel problem.