Is there any benefit to keeping a CUDA kernel's register count per thread low?
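I am thinking there is no advantage (in speed or otherwise). Context switching is just as fast with 3 regs/thread as it is with 48 regs/thread. Unless you simply don't want to, there is no point in not using all the available registers. Registers are not shared between kernels. Is this wrong?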
Edit: From the CUDA 4.2 programming guide (Section 5.2.3):
The number of registers used by a kernel can have a significant impact on the number
of resident warps. For example, for devices of compute capability 1.2, if a kernel uses 16
registers and each block has 512 threads and requires very little shared memory, then two
blocks (i.e. 32 warps) can reside on the multiprocessor since they require 2x512x16
registers, which exactly matches the number of registers available on the multiprocessor.
But as soon as the kernel uses one more register, only one block (i.e. 16 warps) can be
resident since two blocks would require 2x512x17 registers, which are more registers than
are available on the multiprocessor. Therefore, the compiler attempts to minimize register
usage while keeping register spilling (see Section 5.3.2.2) and the number of instructions
to a minimum.
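So the "regs/thread" count does not appear to be irrelevant: it matters through the total number of registers the resident blocks require.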