2
我想完成Udacity課程並行編程的作業#2。我遇到了一個我無法解決的CUDA錯誤。當我啓動一個內核時,該錯誤會被嵌入,該內核旨在將格式爲「RGBRGBRGB」的圖像分隔爲三個單獨的「RRR」「GGG」和「BBB」陣列。看到錯誤「未指定的啓動失敗」並沒有給我任何具體的繼續我不知道如何解決我的問題。Udacity並行編程,未指定啓動失敗cudaGetLastError()
這是調用來啓動整個過程的「主」功能。遇到錯誤後,我將剩下的工作留給了其他人,以便稍後查找。
void your_gaussian_blur(const uchar4 * const h_inputImageRGBA, uchar4 * const d_inputImageRGBA, uchar4* const d_outputImageRGBA, const size_t numRows, const size_t numCols,
unsigned char *d_redBlurred,
unsigned char *d_greenBlurred,
unsigned char *d_blueBlurred,
const int filterWidth)
{
// Maximum number of threads per block = 512; do this
// to keep this compatable with CUDa 5 and lower
// MAX > threadsX * threadsY * threadsZ
int MAXTHREADSx = 16;
int MAXTHREADSy = 16; // 16 x 16 x 1 = 512
// We want to fill the blocks so we don't waste this blocks threads
// I wonder if blocks can intermix in a physical core?
// Either way this method makes things "clean"; one thread per px
int nBlockX = numCols/MAXTHREADSx + 1;
int nBlockY = numRows/MAXTHREADSy + 1;
const dim3 blockSize(MAXTHREADSx, MAXTHREADSy, 1);
const dim3 gridSize(nBlockX, nBlockY, 1);
separateChannels<<<gridSize, blockSize>>>(
h_inputImageRGBA,
numRows,
numCols,
d_red,
d_green,
d_blue);
// Call cudaDeviceSynchronize(), then call checkCudaErrors() immediately after
// launching your kernel to make sure that you didn't make any mistakes.
cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
這裏是功能separateChannels
//This kernel takes in an image represented as a uchar4 and splits
//it into three images consisting of only one color channel each
__global__
void separateChannels(const uchar4* const inputImageRGBA,
int numRows,
int numCols,
unsigned char* const redChannel,
unsigned char* const greenChannel,
unsigned char* const blueChannel)
{
//const int2 thread_2D_pos = make_int2(blockIdx.x * blockDim.x + threadIdx.x, blockIdx.y * blockDim.y + threadIdx.y);
const int col = blockIdx.x * blockDim.x + threadIdx.x;
const int row = blockIdx.y * blockDim.y + threadIdx.y;
//if (thread_2D_pos.x >= numCols || thread_2D_pos.y >= numRows)
// return;
if (col >= numCols || row >= numRows)
return;
//const int thread_1D_pos = thread_2D_pos.y * numCols + thread_2D_pos.x;
int arrayPos = row * numCols + col;
uchar4 rgba = inputImageRGBA[arrayPos];
redChannel[arrayPos] = rgba.x;
greenChannel[arrayPos] = rgba.y;
blueChannel[arrayPos] = rgba.z;
}
我想我把必要的話,請讓我知道如果沒有。
是的,這證明是問題。我只是來這裏說的。謝謝! – KDecker 2014-09-21 21:13:59