2015-10-09

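nvcc compile options with compute_ and sm_, and the error code "invalid device function"

I hope you can help me figure out the correct compiler options for the following card: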

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 780 Ti"
  CUDA Driver Version/Runtime Version            7.0/6.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 3072 MBytes (3220897792 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
  GPU Clock rate:                                1020 MHz (1.02 GHz)
  Memory Clock rate:                             3500 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID/PCI location ID:             3/0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GTX 780 Ti
Result = PASS

I have a piece of CUDA code that I am debugging with nvcc (CUDA 6.5). When I add these options:

-arch compute_20 -code sm_20

the program gives me this error:

error code invalid device function

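If I remove these options (nvcc source -o exe), the program runs fine. Can anyone help me figure out which compute_ and sm_ is suitable for my card by looking at the output of ./deviceQuery? I read in the NVIDIA manual that using the right compute_ and sm_ options can give a significant speedup on the card. Has anyone measured this speedup quantitatively?

Thanks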

Answers

Can anyone help me figure out which compute_ and sm_ is suitable for my card by looking at the output of ./deviceQuery?

Following your example, the correct settings for a GTX 780 Ti would be:

-arch compute_35 -code sm_35 

The above will generate code that runs on a cc3.5 device (only). I think it is better just to specify:

-arch=sm_35 

which is shorthand for this slightly more complicated version:

-gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 

which will generate code that runs on a cc3.5 or newer device. The 3.5/35 number comes from this line in your deviceQuery output:

Capability Major/Minor version number: 3.5 
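As a rough illustration of how the 3.5 from deviceQuery maps onto the flag names, the sketch below builds the long -gencode form from a major/minor pair. It is plain host-side C++ with no CUDA runtime involved; `cc_suffix` and `gencode_flags` are hypothetical helper names chosen for this example, not part of nvcc or the CUDA toolkit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn a compute capability such as 3.5 into the
// two-digit suffix used in nvcc target names, e.g. (3, 5) -> "35".
std::string cc_suffix(int cc_major, int cc_minor) {
    // This sketch assumes a single-digit minor version.
    assert(cc_major > 0 && cc_minor >= 0 && cc_minor <= 9);
    return std::to_string(cc_major) + std::to_string(cc_minor);
}

// Hypothetical helper: build the long form that -arch=sm_XY abbreviates,
// i.e. machine code for sm_XY plus PTX for compute_XY.
std::string gencode_flags(int cc_major, int cc_minor) {
    const std::string s = cc_suffix(cc_major, cc_minor);
    return "-gencode arch=compute_" + s + ",code=sm_" + s +
           " -gencode arch=compute_" + s + ",code=compute_" + s;
}
```

Calling `gencode_flags(3, 5)` yields the same two -gencode options shown earlier for the GTX 780 Ti.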

If you want to understand the switch options and their differences better, I suggest you look at the nvcc manual and this question/answer.
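To see why the original -arch compute_20 -code sm_20 build fails on this card, a simplified model of the compatibility rules may help. The sketch assumes the usual rules from the CUDA documentation: embedded machine code (SASS) for sm_XY runs only on devices with the same major version and an equal or higher minor version, while embedded PTX for compute_XY can be JIT-compiled by the driver for any equal or newer capability. All type and function names here (`CC`, `FatBinary`, `can_run`, and so on) are hypothetical and exist only for this illustration; this is host-side C++, not a CUDA API.

```cpp
#include <cassert>
#include <vector>

// Compute capability as a pair, e.g. {3, 5} for the GTX 780 Ti.
struct CC {
    int major_ver;
    int minor_ver;
};

// Simplified view of what an executable embeds:
// SASS for specific sm_XY targets and, optionally, PTX for compute_XY targets.
struct FatBinary {
    std::vector<CC> sass;
    std::vector<CC> ptx;
};

// SASS runs only within the same major version, on an equal or higher minor.
bool sass_runs_on(CC built, CC device) {
    return built.major_ver == device.major_ver &&
           built.minor_ver <= device.minor_ver;
}

// PTX can be JIT-compiled for any device of equal or newer capability.
bool ptx_runs_on(CC built, CC device) {
    return built.major_ver < device.major_ver ||
           (built.major_ver == device.major_ver &&
            built.minor_ver <= device.minor_ver);
}

// A kernel launch can succeed if any embedded image is usable on the device;
// otherwise the runtime reports an error such as "invalid device function".
bool can_run(const FatBinary& bin, CC device) {
    for (const CC& s : bin.sass) {
        if (sass_runs_on(s, device)) return true;
    }
    for (const CC& p : bin.ptx) {
        if (ptx_runs_on(p, device)) return true;
    }
    return false;
}
```

Under this model, a binary holding only sm_20 SASS has nothing usable on a cc3.5 device, whereas one built with -arch=sm_35 carries both sm_35 SASS and compute_35 PTX and so also runs on newer devices.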

Thanks Robert. Just one small question: "CUDA Capability Major/Minor version number: 3.5" => does that mean the _35 in compute_35 and sm_35 for this card? – Khoa

Yes, the 3.5 tells us that we want to target this card with 'compute_35' and 'sm_35'. –