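I would like help finding the correct compiler options for the card below. This concerns the nvcc compute_ and sm_ compile options and the error code "invalid device function".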
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 780 Ti"
  CUDA Driver Version / Runtime Version: 7.0 / 6.5
  CUDA Capability Major/Minor version number: 3.5
  Total amount of global memory: 3072 MBytes (3220897792 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
  GPU Clock rate: 1020 MHz (1.02 GHz)
  Memory Clock rate: 3500 Mhz
  Memory Bus Width: 384-bit
  L2 Cache Size: 1572864 bytes
  Maximum Texture Dimension Size (x,y,z): 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers: 1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers: 2D=(16384, 16384), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Bus ID / PCI location ID: 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GTX 780 Ti
Result = PASS
I have a piece of CUDA code that I am debugging, built with nvcc (CUDA 6.5). When I add these options:
-arch compute_20 -code sm_20
the program gives me this error:
error code invalid device function
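If I remove these options (nvcc source -o exe), the program runs fine. Can anyone tell from the ./deviceQuery output which compute_ and sm_ values are right for my card? I read in the NVIDIA manual that using the correct compute_ and sm_ options can significantly improve the card's speed. Has anyone measured this speedup quantitatively?
Thanks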
Thanks, Robert. Just one small question: "CUDA Capability Major/Minor version number: 3.5" => does that mean the _35 in compute_35 and sm_35 is the right one for this card? – Khoa
Yes, the 3.5 tells us that we want to target this card with 'compute_35' and 'sm_35'.
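To make the mapping in the comment above concrete, here is a minimal host-only C++ sketch of how the "CUDA Capability Major/Minor version number" from deviceQuery becomes the compute_XX / sm_XX suffix. The helper names `capability_suffix` and `nvcc_arch_options` are made up for illustration and are not part of the CUDA toolkit. The option string simply mirrors the `-arch ... -code ...` form used in the question.

```cpp
#include <string>

// Hypothetical helper (not a CUDA toolkit function): joins the major and
// minor compute capability digits, e.g. 3 and 5 -> "35".
std::string capability_suffix(int major, int minor) {
    return std::to_string(major) + std::to_string(minor);
}

// Hypothetical helper: builds the nvcc options in the same form as the
// question ("-arch compute_20 -code sm_20"), but for the given capability.
std::string nvcc_arch_options(int major, int minor) {
    const std::string suffix = capability_suffix(major, minor);
    return "-arch compute_" + suffix + " -code sm_" + suffix;
}
```

For the GTX 780 Ti above (capability 3.5), `nvcc_arch_options(3, 5)` gives `-arch compute_35 -code sm_35`. The original `-arch compute_20 -code sm_20` build most likely failed because it embeds only sm_20 machine code, which a 3.5 device cannot run, and no PTX that the driver could recompile for it.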