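OpenCL: INVALID_COMMAND_QUEUE on a device-to-host memory transfer after calling a kernel multiple times in a loop
I am developing an OpenCL program that calls the same kernel multiple times in a loop. When I use clEnqueueReadBuffer to transfer the device memory back to the host, it reports that the command queue is invalid.
Below is the function that is called to launch a bitonic sort; it has been shortened to make it more readable. The device, context, command queue, and kernel are created outside and passed to this function. list contains the values to be sorted, and size is the number of elements in list.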
cl_int OpenCLBitonicSort(cl_device_id device, cl_context context,
cl_command_queue commandQueue, cl_kernel bitonicSortKernel,
unsigned int * list, unsigned int size){
//create OpenCL specific variables
cl_int error = CL_SUCCESS;
size_t maximum_local_ws;
size_t local_ws;
size_t global_ws;
//create variables that keep track of bitonic sorting progress
unsigned int stage = 0;
unsigned int subStage;
unsigned int numberOfStages = 0;
unsigned int i; //loop counter shared by the loops below
//get maximum work group size
clGetKernelWorkGroupInfo(bitonicSortKernel, device,
CL_KERNEL_WORK_GROUP_SIZE, sizeof(maximum_local_ws),
&maximum_local_ws, NULL);
//make local_ws the largest power of two allowed by OpenCL
for(i = 1; i <= maximum_local_ws; i *= 2){
local_ws = (size_t) i;
}
//total number of comparators will be half the items in the list
global_ws = (size_t) size/2;
//transfer list to the device
cl_mem list_d = clCreateBuffer(context, CL_MEM_COPY_HOST_PTR,
size * sizeof(unsigned int), list, &error);
//find the number of stages needed (numberOfStages = log2(size));
//size must be a power of two, otherwise this loop never finds a match
for(numberOfStages = 0; ((1u << numberOfStages) ^ size); numberOfStages++){
}
//loop through all stages
for(stage = 0; stage < numberOfStages; stage++){
//loop through all substages in each stage
for(subStage = stage, i = 0; i <= stage; subStage--, i++){
//add kernel parameters
error = clSetKernelArg(bitonicSortKernel, 0,
sizeof(cl_mem), &list_d);
error = clSetKernelArg(bitonicSortKernel, 1,
sizeof(unsigned int), &size);
error = clSetKernelArg(bitonicSortKernel, 2,
sizeof(unsigned int), &stage);
error = clSetKernelArg(bitonicSortKernel, 3,
sizeof(unsigned int), &subStage);
//call the kernel
error = clEnqueueNDRangeKernel(commandQueue, bitonicSortKernel, 1,
NULL, &global_ws, &local_ws, 0, NULL, NULL);
//make later commands wait for this kernel (clEnqueueBarrier does not block the host)
error = clEnqueueBarrier(commandQueue);
}
}
//read the result back to the host
error = clEnqueueReadBuffer(commandQueue, list_d, CL_TRUE, 0,
size * sizeof(unsigned int), list, 0, NULL, NULL);
//free the list on the device
clReleaseMemObject(list_d);
return error;
}
In this code, clEnqueueReadBuffer reports that commandQueue is invalid (CL_INVALID_COMMAND_QUEUE). However, the queue is valid when I call clEnqueueNDRangeKernel and clEnqueueBarrier.
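The host-side setup in the function quietly assumes that size is a power of two and that global_ws is a multiple of local_ws (OpenCL 1.0 requires the global work size to be evenly divisible by the local work size). A minimal sketch of how those two assumptions could be checked before anything is enqueued; the helper names bitonic_stage_count and pick_local_size are hypothetical and not part of the original code:

```c
#include <stddef.h>

/* Hypothetical helper: returns log2(size) when size is a power of two
   that is at least 2, or -1 otherwise. */
int bitonic_stage_count(unsigned int size)
{
    int stages = 0;
    if (size < 2 || (size & (size - 1)) != 0) {
        return -1; /* not a power of two: the XOR loop would never match */
    }
    while ((1u << stages) < size) {
        stages++;
    }
    return stages;
}

/* Hypothetical helper: largest power of two that is <= max_local and
   divides global_ws evenly. Returns 0 if either argument is 0. */
size_t pick_local_size(size_t global_ws, size_t max_local)
{
    size_t local_ws = 1;
    if (global_ws == 0 || max_local == 0) {
        return 0;
    }
    while (local_ws * 2 <= max_local && global_ws % (local_ws * 2) == 0) {
        local_ws *= 2;
    }
    return local_ws;
}
```

With these, the function could return early when bitonic_stage_count(size) is negative, and pass pick_local_size(size / 2, maximum_local_ws) as the local work size so that a short list does not end up with a work group larger than the whole range.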
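When I set numberOfStages to just 1 and stage to just 0, so that clEnqueueNDRangeKernel is called only once, the code runs without returning an error (although the result is incorrect). The problem only appears when clEnqueueNDRangeKernel is called multiple times, which I really need to do.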
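Since the kernel itself is not shown, one way to separate host-side scheduling problems from kernel problems is to run the same stage/subStage schedule on the CPU. The sketch below is only a reference under stated assumptions: it assumes one work item per comparator and the commonly used index formula (pair distance of 1 << subStage, sort direction taken from bit stage of the work-item id), which may differ from the real kernel. The function name bitonic_sort_reference is hypothetical.

```c
#include <assert.h>

/* Host-side reference for the stage/subStage schedule. Sorts ascending.
   size must be a power of two. The index formula is an ASSUMPTION about
   the kernel, which is not shown in the question. */
void bitonic_sort_reference(unsigned int *list, unsigned int size)
{
    unsigned int numberOfStages = 0;
    unsigned int stage, i, t;

    assert(size >= 2 && (size & (size - 1)) == 0);
    while ((1u << numberOfStages) < size) {
        numberOfStages++;
    }

    for (stage = 0; stage < numberOfStages; stage++) {
        /* subStage runs from stage down to 0, as in the host loop */
        for (i = 0; i <= stage; i++) {
            unsigned int subStage = stage - i;
            unsigned int distance = 1u << subStage;

            /* t plays the role of get_global_id(0) */
            for (t = 0; t < size / 2; t++) {
                unsigned int left = (t % distance) + (t / distance) * 2 * distance;
                unsigned int right = left + distance;
                int ascending = ((t >> stage) & 1u) == 0;

                /* an out-of-range index here would be an out-of-bounds
                   access on the device */
                assert(right < size);

                if ((ascending && list[left] > list[right]) ||
                    (!ascending && list[left] < list[right])) {
                    unsigned int tmp = list[left];
                    list[left] = list[right];
                    list[right] = tmp;
                }
            }
        }
    }
}
```

Comparing the device result against this reference after each stage, or checking the kernel's own index arithmetic against the assert above, may show whether the incorrect single-stage result and the later queue error share a cause.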
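I am on Mac OS X 10.6 Snow Leopard, using Apple's OpenCL 1.0 platform with an NVIDIA GeForce 9600M. Is it possible to run a kernel inside a loop in OpenCL on other platforms? Has anyone had this kind of problem with OpenCL on OS X? What could cause the command queue to become invalid?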
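To narrow down where the queue goes bad, it may help to know which call fails first: the function above overwrites error on every call, so only the status of the final clEnqueueReadBuffer is ever returned. Below is a minimal sketch of a first-failure recorder. It uses plain int codes so that it compiles without the OpenCL headers; the numeric values are the ones defined in the Khronos cl.h header, and the names first_error, record_error, and cl_error_name are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: maps a few OpenCL status codes to their names.
   Numeric values follow the Khronos cl.h header. */
const char *cl_error_name(int error)
{
    switch (error) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -48: return "CL_INVALID_KERNEL";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Hypothetical helper type: remembers the first failing call. */
typedef struct {
    int error;        /* 0 while every call so far has succeeded */
    const char *step; /* label of the first call that failed */
} first_error;

/* Stores error and step only if no earlier failure was recorded. */
void record_error(first_error *fe, int error, const char *step)
{
    if (fe->error == 0 && error != 0) {
        fe->error = error;
        fe->step = step;
    }
}
```

In the sort function, each call could be wrapped as record_error(&fe, clEnqueueNDRangeKernel(...), "enqueue kernel"), with fe initialised to {0, NULL}. While debugging, adding a clFinish(commandQueue) after each enqueue and recording its status as well may make a failure surface closer to the launch that triggered it, rather than at the final read.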