Registers per thread

Is there any advantage to keeping a CUDA kernel's registers/thread count low?

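I'm thinking there is no advantage (in speed or otherwise). Context switching is just as fast at 3 regs/thread as it is at 48 regs/thread. It makes no sense to hold back from using all the available registers unless you specifically want to. Registers are not shared between kernels. Is this wrong?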

Edit: From the CUDA 4.2 Programming Guide (section 5.2.3):

    The number of registers used by a kernel can have a significant impact on the number
    of resident warps. For example, for devices of compute capability 1.2, if a kernel uses 16
    registers and each block has 512 threads and requires very little shared memory, then two
    blocks (i.e. 32 warps) can reside on the multiprocessor since they require 2x512x16
    registers, which exactly matches the number of registers available on the multiprocessor.
    But as soon as the kernel uses one more register, only one block (i.e. 16 warps) can be
    resident since two blocks would require 2x512x17 registers, which are more registers than
    are available on the multiprocessor. Therefore, the compiler attempts to minimize register
    usage while keeping register spilling (see Section 5.3.2.2) and the number of instructions
    to a minimum.
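The arithmetic in the quoted example can be reproduced with a small model. This is a simplified sketch, not how the hardware or the CUDA occupancy calculator actually allocates registers: it assumes 16384 32-bit registers per multiprocessor for compute capability 1.2 (2x512x16 = 16384, matching the quote) and ignores register allocation granularity, shared memory, and the per-multiprocessor caps on blocks and warps. The function names are hypothetical.

```python
# Simplified model of how registers/thread limits resident blocks, following the
# arithmetic in the CUDA 4.2 Programming Guide example quoted above.
# Assumption: 16384 32-bit registers per multiprocessor (compute capability 1.2).
# Allocation granularity, shared memory, and block/warp caps are ignored.

WARP_SIZE = 32
REGS_PER_SM_CC12 = 16384  # assumed register file size for compute capability 1.2


def blocks_limited_by_registers(regs_per_thread: int, threads_per_block: int,
                                regs_per_sm: int = REGS_PER_SM_CC12) -> int:
    """Number of whole blocks whose registers fit in one multiprocessor."""
    regs_per_block = regs_per_thread * threads_per_block
    return regs_per_sm // regs_per_block


def resident_warps(regs_per_thread: int, threads_per_block: int,
                   regs_per_sm: int = REGS_PER_SM_CC12) -> int:
    """Resident warps implied by the register limit alone."""
    blocks = blocks_limited_by_registers(regs_per_thread, threads_per_block, regs_per_sm)
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    return blocks * warps_per_block


if __name__ == "__main__":
    for regs in (16, 17):
        print(regs, "regs/thread ->",
              blocks_limited_by_registers(regs, 512), "blocks,",
              resident_warps(regs, 512), "warps")
```

Under these assumptions it prints `16 regs/thread -> 2 blocks, 32 warps` and `17 regs/thread -> 1 blocks, 16 warps`: a single extra register per thread halves the number of resident warps, because the register file is partitioned across all resident threads.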

So the "reg/thread" count does not appear to be irrelevant after all; it matters no less than the total register count.


2013-06-27 Doug