Udacity Parallel Programming: "unspecified launch failure" from cudaGetLastError()

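I am trying to complete homework #2 of the Udacity parallel programming course, and I have hit a CUDA error that I cannot resolve. The error is raised when I launch a kernel that is meant to split an image in "RGBRGBRGB" format into three separate arrays, "RRR", "GGG" and "BBB". The message "unspecified launch failure" gives me nothing specific to go on, so I don't know how to fix the problem.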

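Here is the "main" function that is called to start the whole process. I have left out everything after the point where the error occurs, so that I can work on it later.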

void your_gaussian_blur(const uchar4 * const h_inputImageRGBA, uchar4 * const d_inputImageRGBA, uchar4* const d_outputImageRGBA, const size_t numRows, const size_t numCols, 
         unsigned char *d_redBlurred, 
         unsigned char *d_greenBlurred, 
         unsigned char *d_blueBlurred, 
         const int filterWidth) 
{ 

    // Maximum number of threads per block = 512; do this
    // to keep this compatible with CUDA 5 and lower
    // MAX > threadsX * threadsY * threadsZ
    int MAXTHREADSx = 16;
    int MAXTHREADSy = 16; // 16 x 16 x 1 = 256, within the 512 limit
    // We want to fill the blocks so we don't waste this blocks threads 
    // I wonder if blocks can intermix in a physical core? 
    // Either way this method makes things "clean"; one thread per px 
    int nBlockX = numCols/MAXTHREADSx + 1; 
    int nBlockY = numRows/MAXTHREADSy + 1; 

    const dim3 blockSize(MAXTHREADSx, MAXTHREADSy, 1); 
    const dim3 gridSize(nBlockX, nBlockY, 1); 

    separateChannels<<<gridSize, blockSize>>>(
     h_inputImageRGBA, 
     numRows, 
     numCols, 
     d_red, 
     d_green, 
     d_blue); 

    // Call cudaDeviceSynchronize(), then call checkCudaErrors() immediately after 
    // launching your kernel to make sure that you didn't make any mistakes. 
    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());

Here is the separateChannels function:

//This kernel takes in an image represented as a uchar4 and splits 
//it into three images consisting of only one color channel each 
__global__ 
void separateChannels(const uchar4* const inputImageRGBA, 
           int numRows, 
           int numCols, 
           unsigned char* const redChannel, 
           unsigned char* const greenChannel, 
           unsigned char* const blueChannel) 
{ 
    //const int2 thread_2D_pos = make_int2(blockIdx.x * blockDim.x + threadIdx.x, blockIdx.y * blockDim.y + threadIdx.y); 
    const int col = blockIdx.x * blockDim.x + threadIdx.x; 
    const int row = blockIdx.y * blockDim.y + threadIdx.y; 

    //if (thread_2D_pos.x >= numCols || thread_2D_pos.y >= numRows) 
    // return; 
    if (col >= numCols || row >= numRows) 
     return; 

    //const int thread_1D_pos = thread_2D_pos.y * numCols + thread_2D_pos.x; 
    int arrayPos = row * numCols + col; 

    uchar4 rgba = inputImageRGBA[arrayPos]; 
    redChannel[arrayPos] = rgba.x; 
    greenChannel[arrayPos] = rgba.y; 
    blueChannel[arrayPos] = rgba.z; 
}

I think I have included everything necessary; please let me know if not.


2014-09-21 KDecker

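Without seeing the rest of the code I can't be sure, but I believe you are passing a pointer to host memory as an argument to the CUDA kernel, which is not a good thing. In the kernel launch you pass h_inputImageRGBA, whereas I believe you want to pass d_inputImageRGBA.

By convention, the h_ prefix stands for host memory and the d_ prefix stands for device memory.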
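To illustrate the point, here is a minimal sketch of the host-to-device round trip with the kernel from the question. The `check` helper and the `splitChannels` wrapper are hypothetical names introduced for this example and are not part of the course code; in the homework itself the device buffers are presumably allocated elsewhere, so only the launch argument would need to change. The sketch also checks the launch and the execution separately, which can help narrow down where a failure comes from.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the course code): report a CUDA error and stop.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Same kernel logic as in the question.
__global__ void separateChannels(const uchar4 *const inputImageRGBA,
                                 int numRows, int numCols,
                                 unsigned char *const redChannel,
                                 unsigned char *const greenChannel,
                                 unsigned char *const blueChannel) {
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= numCols || row >= numRows) return;

    const int arrayPos = row * numCols + col;
    const uchar4 rgba = inputImageRGBA[arrayPos];
    redChannel[arrayPos] = rgba.x;
    greenChannel[arrayPos] = rgba.y;
    blueChannel[arrayPos] = rgba.z;
}

// Hypothetical wrapper: copies the host image to the device, runs the kernel
// on the device copy, and copies the three channels back to the host.
void splitChannels(const std::vector<uchar4> &h_in, int numRows, int numCols,
                   std::vector<unsigned char> &h_red,
                   std::vector<unsigned char> &h_green,
                   std::vector<unsigned char> &h_blue) {
    const size_t n = static_cast<size_t>(numRows) * numCols;

    uchar4 *d_in = nullptr;
    unsigned char *d_red = nullptr, *d_green = nullptr, *d_blue = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(uchar4)), "cudaMalloc d_in");
    check(cudaMalloc(&d_red, n), "cudaMalloc d_red");
    check(cudaMalloc(&d_green, n), "cudaMalloc d_green");
    check(cudaMalloc(&d_blue, n), "cudaMalloc d_blue");

    // The kernel can only dereference device memory, so copy the input first.
    check(cudaMemcpy(d_in, h_in.data(), n * sizeof(uchar4),
                     cudaMemcpyHostToDevice), "copy input to device");

    const dim3 blockSize(16, 16, 1);
    const dim3 gridSize((numCols + 15) / 16, (numRows + 15) / 16, 1);

    // Pass d_in (device pointer), not the host pointer.
    separateChannels<<<gridSize, blockSize>>>(d_in, numRows, numCols,
                                              d_red, d_green, d_blue);
    check(cudaGetLastError(), "kernel launch");         // bad launch configuration
    check(cudaDeviceSynchronize(), "kernel execution"); // faults while running

    h_red.resize(n);
    h_green.resize(n);
    h_blue.resize(n);
    check(cudaMemcpy(h_red.data(), d_red, n, cudaMemcpyDeviceToHost), "copy red");
    check(cudaMemcpy(h_green.data(), d_green, n, cudaMemcpyDeviceToHost), "copy green");
    check(cudaMemcpy(h_blue.data(), d_blue, n, cudaMemcpyDeviceToHost), "copy blue");

    cudaFree(d_in);
    cudaFree(d_red);
    cudaFree(d_green);
    cudaFree(d_blue);
}
```

Calling `splitChannels` with a small host image (for example 3 rows by 5 columns filled via `make_uchar4`) should return three host vectors holding the x, y and z components of each pixel. Passing `h_in.data()` to the kernel instead of `d_in` would reproduce the kind of failure described in the question.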


2014-09-21 21:05:45

Yes, that turned out to be the problem. I was just coming here to say so. Thanks! – KDecker 2014-09-21 21:13:59
