They must have moved to a more dynamic approach called sub-groups:
https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf
Sub-group: Sub-groups are an implementation-dependent grouping of work-items within a
work-group. The size and number of sub-groups is implementation-defined.
and
Work-groups are further divided into sub-groups,
which provide an additional level of control over execution.
and
The mapping of work-items to
sub-groups is implementation-defined and may be queried at runtime.
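So even if it is not called a wavefront, the grouping is now queryable at runtime, and
In the absence of synchronization functions (e.g. a barrier), work-items within a sub-group may be serialized. In the presence of sub-group functions, work-items within a sub-group may be serialized before any given sub-group function, between dynamically encountered pairs of sub-group functions, and between a work-group function and the end of the kernel.
so even lock-step execution can sometimes be lost.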
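To make the "queryable at runtime" point concrete, here is a rough host-side model in plain C of how a work-group might be split into sub-groups. It is only a sketch: `model_layout` and `model_sub_group_size` are hypothetical helpers, not OpenCL APIs, and the linear packing they assume is exactly what the spec leaves implementation-defined. In a real kernel the values come from `get_num_sub_groups()`, `get_sub_group_size()` and `get_max_sub_group_size()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT an OpenCL API. Assumption: work-items are packed
   linearly into sub-groups of at most max_sub_group_size items each. */
typedef struct {
    size_t num_sub_groups; /* what get_num_sub_groups() would report */
    size_t last_size;      /* size of the final, possibly partial, sub-group */
} sub_group_layout;

sub_group_layout model_layout(size_t local_size, size_t max_sub_group_size)
{
    sub_group_layout layout;
    size_t rem;

    assert(local_size > 0 && max_sub_group_size > 0);
    /* Round up: a partial sub-group still counts as one sub-group. */
    layout.num_sub_groups =
        (local_size + max_sub_group_size - 1) / max_sub_group_size;
    rem = local_size % max_sub_group_size;
    layout.last_size = rem ? rem : max_sub_group_size;
    return layout;
}

/* Models get_sub_group_size() for the sub-group with index sg_id. */
size_t model_sub_group_size(size_t local_size, size_t max_sub_group_size,
                            size_t sg_id)
{
    sub_group_layout layout = model_layout(local_size, max_sub_group_size);

    assert(sg_id < layout.num_sub_groups);
    return (sg_id + 1 == layout.num_sub_groups) ? layout.last_size
                                                : max_sub_group_size;
}
```

Under this model a work-group of 100 work-items splits into 4 sub-groups on a 32-wide device and 7 on a 16-wide one, and in both cases the last sub-group holds only 4 work-items. Code that reads these numbers at runtime keeps working; code that bakes one of them in does not.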
On top of these,
sub_group_all() and
sub_group_broadcast() and are described in OpenCL C++ kernel language and IL specifications.
The use of these sub-group functions implies sequenced-before relationships between statements
within the execution of a single work-item in order to satisfy data dependencies.
says that some kind of intra-sub-group communication exists. OpenCL now also has a definition of child kernels (an upgrade?):
Device-side enqueue: A mechanism whereby a kernel-instance is enqueued by a kernel-instance
running on a device without direct involvement by the host program. This produces nested
parallelism; i.e. additional levels of concurrency are nested inside a running kernel-instance.
The kernel-instance executing on a device (the parent kernel) enqueues a kernel-instance (the
child kernel) to a device-side command queue. Child and parent kernels execute asynchronously
though a parent kernel does not complete until all of its child-kernels have completed.
Ultimately, with something like
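kernel void launcher()
{
    // 1D range with a single work-item for the child kernel.
    ndrange_t ndrange = ndrange_1D(1);
    // WAIT_KERNEL: the child starts once the parent kernel has finished executing.
    enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
        ^{
            size_t id = get_global_id(0);
        }
    );
}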
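you should be able to spawn your own "wavefronts" of any size you need, and they can work concurrently with the parent kernel (and their threads can communicate within a sub-group), but they would not be called wavefronts because they are not hard-coded by the hardware, imho.
The 2.0 API specification says: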
Extreme care should be exercised when writing code that uses
subgroups if the goal is to write portable OpenCL applications.
which brings to mind AMD's 16-wide SIMDs and NVIDIA's 32-wide SIMDs versus some hypothetical FPGA with 95-wide compute cores. Pseudo-wavefronts, maybe?
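One way the portability warning can bite, sketched in plain C on the host rather than in a kernel: a halving tree reduction over the lanes of a sub-group is only correct when the width is a power of two, so it happens to work for 16 or 32 lanes and silently drops values at a width like 95. Both functions below are hypothetical illustrations, with an array index standing in for a lane, and `MAX_WIDTH` is an arbitrary cap chosen for the sketch.

```c
#include <assert.h>

#define MAX_WIDTH 256 /* arbitrary cap for this sketch */

/* Halving tree reduction: silently assumes width is a power of two. */
int reduce_assuming_pow2(const int *in, int width)
{
    int v[MAX_WIDTH];

    assert(width > 0 && width <= MAX_WIDTH);
    for (int i = 0; i < width; ++i)
        v[i] = in[i];
    /* With an odd width the last lane is never read at this step. */
    for (int offset = width / 2; offset > 0; offset /= 2)
        for (int i = 0; i < offset; ++i)
            v[i] += v[i + offset];
    return v[0];
}

/* Width-agnostic variant: doubling stride with a bounds check on the partner. */
int reduce_any_width(const int *in, int width)
{
    int v[MAX_WIDTH];

    assert(width > 0 && width <= MAX_WIDTH);
    for (int i = 0; i < width; ++i)
        v[i] = in[i];
    for (int stride = 1; stride < width; stride *= 2)
        for (int i = 0; i + stride < width; i += 2 * stride)
            v[i] += v[i + stride];
    return v[0];
}
```

Summing 95 ones gives 95 with `reduce_any_width` but a smaller value with `reduce_assuming_pow2`, while both agree at widths 16 and 32. In an actual kernel, a built-in such as `sub_group_reduce_add()` avoids hand-rolling this pattern altogether.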
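Thanks! But what does this mean, and what kind of problems can arise: "Extreme care should be exercised when writing code that uses subgroups if the goal is to write portable OpenCL applications"? As it says there: **sub-groups are a logical abstraction of the hardware SIMD execution model**, similar to wavefronts: page 100: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_User_Guide2.pdf – Alex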
If sub-groups are a logical abstraction of the hardware SIMD, and if at runtime I can get the sub-group (SIMD) width for the current device by using `get_sub_group_size()` / `get_max_sub_group_size()`, where could a problem arise? page 160: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_User_Guide2.pdf – Alex
get_max_sub_group_size: it says "This value will be invariant for a given set of dispatch dimensions and a kernel object compiled for a given device", so it should work –