NVCC compute_ and sm_ compile options and the "invalid device function" error

I am hoping you can help me work out the correct compiler options for the card below:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 780 Ti"
  CUDA Driver Version/Runtime Version            7.0/6.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 3072 MBytes (3220897792 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
  GPU Clock rate:                                1020 MHz (1.02 GHz)
  Memory Clock rate:                             3500 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID/PCI location ID:             3/0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GTX 780 Ti
Result = PASS

I have a piece of CUDA code that I am building with nvcc (CUDA 6.5) and debugging. When I add these options:

-arch compute_20 -code sm_20

the program gives me this error:

error code invalid device function

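If I remove these options (nvcc source -o exe), the program runs fine. Can anyone help me figure out which compute_ and sm_ values suit my card by looking at the output of ./deviceQuery? I read in the NVIDIA manual that using the right compute_ and sm_ options can significantly speed up the card. Has anyone measured this speedup quantitatively?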

Thanks

Source

2015-10-09 Khoa

Can anyone help me figure out which compute_ and sm_ is suitable for my card by looking at the output of ./deviceQuery?

Following your example, the correct settings for a GTX 780 Ti are:

-arch compute_35 -code sm_35

The above will generate code that runs on a cc 3.5 device (only). I think it is better just to specify:

-arch=sm_35

This is shorthand for the slightly more complex form:

-gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35

This will generate code that runs on a cc 3.5 or newer device. The 3.5/35 number comes from this line in your deviceQuery output:

Capability Major/Minor version number: 3.5
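
As a rough illustration of that mapping, the sketch below pulls the capability out of deviceQuery text and builds the long -gencode form quoted above. The helper names (capability_suffix, nvcc_flags) are invented for this example and are not part of any CUDA tool; only the flag spellings come from the answer.

```python
import re


def capability_suffix(device_query_text):
    """Return e.g. '35' from a '... Capability Major/Minor version number: 3.5' line."""
    match = re.search(
        r"Capability Major/Minor version number:\s*(\d+)\.(\d+)",
        device_query_text,
    )
    if match is None:
        raise ValueError("no compute capability line found")
    major, minor = match.groups()
    return major + minor


def nvcc_flags(suffix):
    """Build the -gencode pair: machine code for sm_<suffix> plus PTX for compute_<suffix>."""
    return (
        f"-gencode arch=compute_{suffix},code=sm_{suffix} "
        f"-gencode arch=compute_{suffix},code=compute_{suffix}"
    )


# Hypothetical usage with the line quoted from the question's deviceQuery output.
sample = "CUDA Capability Major/Minor version number: 3.5"
print(nvcc_flags(capability_suffix(sample)))
```

Because the pattern matches the capability line wherever it sits in the text, the whole deviceQuery output can be passed in, for example after capturing it to a file.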

If you want a better understanding of the switches/options and how they differ, I suggest reviewing the nvcc manual and this question/answer.
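
Why the original flags fail while the default build works can be sketched from the compatibility rules in the nvcc manual, as commonly summarized. A code=sm_XY entry embeds machine code that runs only on devices with the same major capability and an equal or higher minor one. A code=compute_XY entry embeds PTX that the driver can JIT-compile for any device of capability X.Y or newer. So -arch compute_20 -code sm_20 embeds only sm_20 machine code, a cc 3.5 card finds nothing it can run, and the kernel launch reports "invalid device function". The build without options presumably works because, assuming nvcc 6.5 defaults to the equivalent of -arch=sm_20, it also embeds compute_20 PTX. The Python below is a toy model of those rules, not something nvcc or the driver exposes; can_run and the tuple encoding are made up for illustration.

```python
def can_run(code_targets, device_cc):
    """Toy model: can a binary built with these code= targets run on device_cc?

    code_targets: iterable of strings such as "sm_35" or "compute_35".
    device_cc: (major, minor) tuple, e.g. (3, 5) for a GTX 780 Ti.
    """
    dev_major, dev_minor = device_cc
    for target in code_targets:
        kind, digits = target.split("_")
        major, minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm":
            # Machine code: same major version, and device minor not lower.
            if major == dev_major and minor <= dev_minor:
                return True
        elif kind == "compute":
            # PTX: the driver can JIT-compile it for this or any newer device.
            if (major, minor) <= (dev_major, dev_minor):
                return True
    return False


gtx_780_ti = (3, 5)
print(can_run(["sm_20"], gtx_780_ti))                # the failing -arch compute_20 -code sm_20 build
print(can_run(["sm_20", "compute_20"], gtx_780_ti))  # an -arch=sm_20 style build that includes PTX
print(can_run(["sm_35", "compute_35"], gtx_780_ti))  # the -gencode pair from the answer
```

The model ignores cases outside this thread, such as architecture-specific targets on newer toolkits, and it says nothing about performance: JIT-compiled compute_20 PTX runs on the card but cannot use instructions introduced after cc 2.0.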

Source

2015-10-09 21:35:40

Thanks, Robert. Just one small question: "CUDA Capability Major/Minor version number: 3.5" => is that where the _35 in compute_35 and sm_35 for this card comes from? – Khoa

Yes, the 3.5 tells us that we want to target this card with 'compute_35' and 'sm_35'.
