Calling an OpenCL kernel from another OpenCL kernel

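I read in an article that we can call functions from an OpenCL kernel. In my case, however, I need to parallelize a complex function (so that it is run by all available threads). Do I have to make that function a kernel as well and call it directly, the way the main function is called? Or what other solutions are possible in this situation? Thanks in advance.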

Source

2011-10-12 Akhtar Ali

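You can call helper functions from a kernel, and they are parallelized in the same way as the kernel itself; think of them as being inlined into the kernel code. Each work-item therefore calls the helper function for the part of the work set it is processing.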

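// Helper: sums the four components of a vector, so it returns a scalar.
float helper_function(float4 input)
{
    return input.x + input.y + input.z + input.w;
}

// Kernel: each work-item handles one element and calls the helper for it.
__kernel void kernel_function(__global const float4* arr, __global float* out)
{
    size_t id = get_global_id(0);
    out[id] = helper_function(arr[id]);
}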

Source

2011-10-13 20:08:22 sramij

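Adding to sramij's answer: launching another kernel from within a kernel is called dynamic parallelism. For this you need a device that supports OpenCL 2.0. See http://stackoverflow.com/questions/12913640/opencl-dynamic-parallelism-gpu-spawned-threads – Meluha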

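If I understand your question correctly, you want to make a separate, complete pass over a buffer from inside the kernel. I don't think that is possible within a kernel, so you have to write the code of the "inner" pass as a separate kernel and also call that kernel separately from your host code. The output of that kernel does not have to be read back to host memory; it can stay in device memory between kernel calls.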
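The two-pass structure described above can be sketched on the CPU in plain C, with ordinary loops standing in for the two kernel launches. This is only an emulation under stated assumptions: in real OpenCL code each `*_item` function body would be a `__kernel`, `intermediate` would be a device buffer, and the host would enqueue the two kernels one after the other. All function names and the arithmetic (squaring, then normalizing) are hypothetical and chosen purely for illustration.

```c
#include <stddef.h>

/* Stand-in for the "inner" kernel: one call per work-item id.
   Hypothetical work: square each input element. */
static void inner_pass_item(size_t id, const float *in, float *intermediate)
{
    intermediate[id] = in[id] * in[id];
}

/* Stand-in for the main kernel: each work-item walks the complete
   intermediate buffer produced by the first pass, then normalizes
   its own element by the total. */
static void main_pass_item(size_t id, size_t n,
                           const float *intermediate, float *out)
{
    float sum = 0.0f;
    for (size_t j = 0; j < n; ++j)
        sum += intermediate[j];
    out[id] = intermediate[id] / sum;
}

/* Stand-in for the host code: two separate "launches". The second one
   starts only after the first has finished, and the intermediate buffer
   is not copied anywhere in between. */
void run_two_passes(size_t n, const float *in,
                    float *intermediate, float *out)
{
    for (size_t id = 0; id < n; ++id)   /* launch 1: inner pass */
        inner_pass_item(id, in, intermediate);
    for (size_t id = 0; id < n; ++id)   /* launch 2: main pass */
        main_pass_item(id, n, intermediate, out);
}
```

The point of the split is that the second pass may read any element of `intermediate`, which is only safe once every work-item of the first pass has completed; a kernel boundary gives that guarantee across the whole range.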

Source

2011-11-10 09:16:22

The OpenCL 2.0 spec adds a new feature for dynamic parallelism.

6.13.17 Enqueuing Kernels 
OpenCL 2.0 allows a kernel to independently enqueue to the same device, without host 
interaction. ...

Below is an example in which my_func_B enqueues my_func_A on the device:

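kernel void
my_func_A(global int *a, global int *b, global int *c)
{
    ...
}

kernel void
my_func_B(global int *a, global int *b, global int *c)
{
    ndrange_t ndrange;
    // build ndrange information
    ...
    // example: enqueue a kernel as a block
    // (the enqueue_kernel built-in also takes a flags argument;
    // CLK_ENQUEUE_FLAGS_WAIT_KERNEL is one possible choice, assumed here)
    enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                   ^{ my_func_A(a, b, c); });
    ...
}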

Source

2014-07-30 13:24:28
