2010-02-17

We have some nightly build machines that have the cuda libraries installed, but that do not have a CUDA-capable GPU installed. These machines are capable of building cuda-enabled programs, but they cannot run those programs. What is the simplest way to test from cmake for the presence of a CUDA-capable GPU?

In our nightly automated build process, our CMake scripts use the cmake command

find_package(CUDA)

to determine whether the CUDA software is installed. This sets the cmake variable CUDA_FOUND on platforms that have the cuda software installed. This is great, and it works perfectly. When CUDA_FOUND is set, it is OK to build cuda-enabled programs, even if the machine has no CUDA-capable GPU.

But test programs that use cuda naturally fail on the non-GPU cuda machines, causing our nightly dashboards to look "dirty". So I want cmake to avoid running those tests on such machines. But I still want the cuda software to be built on those machines.

After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU, and then set a variable, say CUDA_GPU_FOUND, to reflect this.

What is the simplest way to get cmake to test for the presence of a cuda-capable GPU?

This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That's why we use cmake in the first place.)

EDIT: There are a couple of good-looking suggestions here for how to write a program that tests for the presence of a GPU. What is still missing is the means of getting CMake to compile and run that program at configure time. I suspect that the TRY_RUN command in CMake will be important here, but unfortunately that command is nearly undocumented, and I cannot figure out how to make it work. This CMake part of the problem might be the much harder question. Perhaps I should have asked this as two separate questions...

Answers


The answer to this question consists of two parts:

  1. A program to detect the presence of a CUDA-capable GPU.
  2. CMake code to compile, run, and interpret the result of that program at configure time.

For part 1, the gpu sniffing program, I started with the answer provided by fabrizioM because it is so compact. I quickly discovered that I needed many of the details found in unknown's answer to get it to work well. I ended up with the following C source file, which I named has_cuda_gpu.c:

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int deviceCount, device;
    int gpuDeviceCount = 0;
    struct cudaDeviceProp properties;
    cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
    if (cudaResultCode != cudaSuccess)
        deviceCount = 0;
    /* machines with no GPUs can still report one emulation device */
    for (device = 0; device < deviceCount; ++device) {
        cudaGetDeviceProperties(&properties, device);
        if (properties.major != 9999) /* 9999 means emulation only */
            ++gpuDeviceCount;
    }
    printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);

    /* don't just return the number of gpus, because other runtime cuda
       errors can also yield non-zero return values */
    if (gpuDeviceCount > 0)
        return 0; /* success */
    else
        return 1; /* failure */
}

Notice that the return code is zero in the case where a CUDA-capable GPU is found. This is because on one of my has-cuda-but-no-GPU machines, this program generates a runtime error with a non-zero exit code. So any non-zero exit code is interpreted as "cuda does not work on this machine".

You might ask why I don't use cuda emulation mode on the non-GPU machines. It is because emulation mode is buggy. I only want to debug my code, and work around bugs in the cuda GPU code. I don't have time to debug the emulator.

The second part of the problem is the cmake code that uses this test program. After some struggle, I figured it out. The following block is part of a larger CMakeLists.txt file:

find_package(CUDA) 
if(CUDA_FOUND) 
    try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR 
     ${CMAKE_BINARY_DIR} 
     ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c 
     CMAKE_FLAGS 
      -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE} 
      -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY} 
     COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR 
     RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR) 
    message("${RUN_OUTPUT_VAR}") # Display number of GPUs found 
    # COMPILE_RESULT_VAR is TRUE when compile succeeds 
    # RUN_RESULT_VAR is zero when a GPU is found 
    if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR) 
     set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present") 
    else() 
     set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present") 
    endif() 
endif(CUDA_FOUND) 

This sets a CUDA_HAVE_GPU boolean variable in the cmake cache that can subsequently be used to trigger conditional cmake operations.
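For example, a minimal sketch of such a conditional operation, which addresses the original goal of keeping nightly dashboards clean (the test name and executable here are hypothetical, not from the original build):

```cmake
# Hypothetical sketch: register GPU-dependent tests only when a GPU was
# detected, so GPU-less nightly build machines skip them entirely.
if(CUDA_HAVE_GPU)
  add_test(NAME cuda_smoke_test COMMAND cuda_smoke_test_exe)
endif()
```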

It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS section, and what the syntax should be. The try_run documentation is very light, but there is more information in the try_compile documentation, which is a closely related command. I still needed to scour the web for examples of try_compile and try_run before getting this to work.

Another tricky but important detail is the third argument to try_run, the "bindir". You should probably always set this to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is one location where those subdirectories seem to naturally reside.


It is possible to use CMake to run tools installed with the CUDA runtime, such as nvidia-smi, thereby avoiding maintaining and compiling a separate program. See my answer. – mabraham 2017-01-10 16:27:05


You can compile a small GPU query program if cuda was found. Here is a simple one you can adapt to your needs:

#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    int ct, dev;
    cudaError_t code;
    struct cudaDeviceProp prop;

    cudaGetDeviceCount(&ct);
    code = cudaGetLastError();
    if (code) printf("%s\n", cudaGetErrorString(code));

    if (ct == 0) {
        printf("Cuda device not found.\n");
        exit(0);
    }
    printf("Found %i Cuda device(s).\n", ct);

    for (dev = 0; dev < ct; ++dev) {
        printf("Cuda device %i\n", dev);

        cudaGetDeviceProperties(&prop, dev);
        printf("\tname: %s\n", prop.name);
        printf("\ttotalGlobalMem: %lu\n", (unsigned long)prop.totalGlobalMem);
        printf("\tsharedMemPerBlock: %lu\n", (unsigned long)prop.sharedMemPerBlock);
        printf("\tregsPerBlock: %i\n", prop.regsPerBlock);
        printf("\twarpSize: %i\n", prop.warpSize);
        printf("\tmemPitch: %lu\n", (unsigned long)prop.memPitch);
        printf("\tmaxThreadsPerBlock: %i\n", prop.maxThreadsPerBlock);
        printf("\tmaxThreadsDim: %i, %i, %i\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("\tmaxGridSize: %i, %i, %i\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        printf("\tclockRate: %i\n", prop.clockRate);
        printf("\ttotalConstMem: %lu\n", (unsigned long)prop.totalConstMem);
        printf("\tmajor: %i\n", prop.major);
        printf("\tminor: %i\n", prop.minor);
        printf("\ttextureAlignment: %lu\n", (unsigned long)prop.textureAlignment);
        printf("\tdeviceOverlap: %i\n", prop.deviceOverlap);
        printf("\tmultiProcessorCount: %i\n", prop.multiProcessorCount);
    }
    return 0;
}

+1 This is a good start for the GPU-sniffing part. But I'm hesitant to accept this answer without the cmake part. – 2010-02-19 02:07:41


@Christopher No problem, unfortunately I don't know cmake (I use automake). http://www.gnu.org/software/hello/manual/autoconf/Runtime.html is the relevant section for autoconf. Maybe it will help you find the corresponding cmake feature. – Anycorn 2010-02-19 02:59:46


Write a simple program like

#include <cuda_runtime.h>

int main(){ 
    int deviceCount; 
    cudaError_t e = cudaGetDeviceCount(&deviceCount); 
    return e == cudaSuccess ? deviceCount : -1; 
} 

and check the return value.


+1 This answer, together with unknown's answer, gave me a good start toward solving the problem. – 2010-02-19 16:31:58


I just wrote a pure Python script that does some of the things you seem to need (I took much of this from the pystream project). It's basically just a wrapper for some functions in the CUDA runtime library (it uses ctypes). Look at the main() function to see example usage. Also, please be aware that I just wrote it, so it is likely to contain bugs. Use with caution.

#!/usr/bin/env python

import sys 
import platform 
import ctypes 

""" 
cudart.py: used to access parts of the CUDA runtime library. 
Most of this code was lifted from the pystream project (it's BSD licensed): 
http://code.google.com/p/pystream 

Note that this is likely to only work with CUDA 2.3 
To extend to other versions, you may need to edit the DeviceProp Class 
""" 

cudaSuccess = 0 
errorDict = { 
    1: 'MissingConfigurationError', 
    2: 'MemoryAllocationError', 
    3: 'InitializationError', 
    4: 'LaunchFailureError', 
    5: 'PriorLaunchFailureError', 
    6: 'LaunchTimeoutError', 
    7: 'LaunchOutOfResourcesError', 
    8: 'InvalidDeviceFunctionError', 
    9: 'InvalidConfigurationError', 
    10: 'InvalidDeviceError', 
    11: 'InvalidValueError', 
    12: 'InvalidPitchValueError', 
    13: 'InvalidSymbolError', 
    14: 'MapBufferObjectFailedError', 
    15: 'UnmapBufferObjectFailedError', 
    16: 'InvalidHostPointerError', 
    17: 'InvalidDevicePointerError', 
    18: 'InvalidTextureError', 
    19: 'InvalidTextureBindingError', 
    20: 'InvalidChannelDescriptorError', 
    21: 'InvalidMemcpyDirectionError', 
    22: 'AddressOfConstantError', 
    23: 'TextureFetchFailedError', 
    24: 'TextureNotBoundError', 
    25: 'SynchronizationError', 
    26: 'InvalidFilterSettingError', 
    27: 'InvalidNormSettingError', 
    28: 'MixedDeviceExecutionError', 
    29: 'CudartUnloadingError', 
    30: 'UnknownError', 
    31: 'NotYetImplementedError', 
    32: 'MemoryValueTooLargeError', 
    33: 'InvalidResourceHandleError', 
    34: 'NotReadyError', 
    0x7f: 'StartupFailureError', 
    10000: 'ApiFailureBaseError'} 


try: 
    if platform.system() in ("Microsoft", "Windows"): 
     _libcudart = ctypes.windll.LoadLibrary('cudart.dll') 
    elif platform.system()=="Darwin": 
     _libcudart = ctypes.cdll.LoadLibrary('libcudart.dylib') 
    else: 
     _libcudart = ctypes.cdll.LoadLibrary('libcudart.so') 
    _libcudart_error = None 
except OSError, e: 
    _libcudart_error = e 
    _libcudart = None 

def _checkCudaStatus(status): 
    if status != cudaSuccess: 
     eClassString = errorDict[status] 
     # Get the class by name from the top level of this module 
     eClass = globals()[eClassString] 
     raise eClass() 

def _checkDeviceNumber(device): 
    assert isinstance(device, int), "device number must be an int" 
    assert device >= 0, "device number must be non-negative" 
    assert device < 2**8-1, "device number must be < 255" 


# cudaDeviceProp 
class DeviceProp(ctypes.Structure): 
    _fields_ = [ 
     ("name", 256*ctypes.c_char), # < ASCII string identifying device 
     ("totalGlobalMem", ctypes.c_size_t), # < Global memory available on device in bytes 
     ("sharedMemPerBlock", ctypes.c_size_t), # < Shared memory available per block in bytes 
     ("regsPerBlock", ctypes.c_int), # < 32-bit registers available per block 
     ("warpSize", ctypes.c_int), # < Warp size in threads 
     ("memPitch", ctypes.c_size_t), # < Maximum pitch in bytes allowed by memory copies 
     ("maxThreadsPerBlock", ctypes.c_int), # < Maximum number of threads per block 
     ("maxThreadsDim", 3*ctypes.c_int), # < Maximum size of each dimension of a block 
     ("maxGridSize", 3*ctypes.c_int), # < Maximum size of each dimension of a grid 
     ("clockRate", ctypes.c_int), # < Clock frequency in kilohertz 
     ("totalConstMem", ctypes.c_size_t), # < Constant memory available on device in bytes 
     ("major", ctypes.c_int), # < Major compute capability 
     ("minor", ctypes.c_int), # < Minor compute capability 
     ("textureAlignment", ctypes.c_size_t), # < Alignment requirement for textures 
     ("deviceOverlap", ctypes.c_int), # < Device can concurrently copy memory and execute a kernel 
     ("multiProcessorCount", ctypes.c_int), # < Number of multiprocessors on device 
     ("kernelExecTimeoutEnabled", ctypes.c_int), # < Specified whether there is a run time limit on kernels 
     ("integrated", ctypes.c_int), # < Device is integrated as opposed to discrete 
     ("canMapHostMemory", ctypes.c_int), # < Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer 
     ("computeMode", ctypes.c_int), # < Compute mode (See ::cudaComputeMode) 
     ("__cudaReserved", 36*ctypes.c_int), 
] 

    def __str__(self): 
     return """NVidia GPU Specifications: 
    Name: %s 
    Total global mem: %i 
    Shared mem per block: %i 
    Registers per block: %i 
    Warp size: %i 
    Mem pitch: %i 
    Max threads per block: %i 
    Max treads dim: (%i, %i, %i) 
    Max grid size: (%i, %i, %i) 
    Total const mem: %i 
    Compute capability: %i.%i 
    Clock Rate (GHz): %f 
    Texture alignment: %i 
""" % (self.name, self.totalGlobalMem, self.sharedMemPerBlock, 
     self.regsPerBlock, self.warpSize, self.memPitch, 
     self.maxThreadsPerBlock, 
     self.maxThreadsDim[0], self.maxThreadsDim[1], self.maxThreadsDim[2], 
     self.maxGridSize[0], self.maxGridSize[1], self.maxGridSize[2], 
     self.totalConstMem, self.major, self.minor, 
     float(self.clockRate)/1.0e6, self.textureAlignment) 

def cudaGetDeviceCount(): 
    if _libcudart is None: return 0 
    deviceCount = ctypes.c_int() 
    status = _libcudart.cudaGetDeviceCount(ctypes.byref(deviceCount)) 
    _checkCudaStatus(status) 
    return deviceCount.value 

def getDeviceProperties(device): 
    if _libcudart is None: return None 
    _checkDeviceNumber(device) 
    props = DeviceProp() 
    status = _libcudart.cudaGetDeviceProperties(ctypes.byref(props), device) 
    _checkCudaStatus(status) 
    return props 

def getDriverVersion(): 
    if _libcudart is None: return None 
    version = ctypes.c_int() 
    _libcudart.cudaDriverGetVersion(ctypes.byref(version)) 
    v = "%d.%d" % (version.value//1000, 
        version.value%100) 
    return v 

def getRuntimeVersion(): 
    if _libcudart is None: return None 
    version = ctypes.c_int() 
    _libcudart.cudaRuntimeGetVersion(ctypes.byref(version)) 
    v = "%d.%d" % (version.value//1000, 
        version.value%100) 
    return v 

def getGpuCount(): 
    count=0 
    for ii in range(cudaGetDeviceCount()): 
     props = getDeviceProperties(ii) 
     if props.major!=9999: count+=1 
    return count 

def getLoadError(): 
    return _libcudart_error 


version = getDriverVersion() 
if version is not None and not version.startswith('2.3'): 
    sys.stdout.write("WARNING: Driver version %s may not work with %s\n" % 
        (version, sys.argv[0])) 

version = getRuntimeVersion() 
if version is not None and not version.startswith('2.3'): 
    sys.stdout.write("WARNING: Runtime version %s may not work with %s\n" % 
        (version, sys.argv[0])) 


def main(): 

    sys.stdout.write("Driver version: %s\n" % getDriverVersion()) 
    sys.stdout.write("Runtime version: %s\n" % getRuntimeVersion()) 

    nn = cudaGetDeviceCount() 
    sys.stdout.write("Device count: %s\n" % nn) 

    for ii in range(nn): 
     props = getDeviceProperties(ii) 
     sys.stdout.write("\nDevice %d:\n" % ii) 
     #sys.stdout.write("%s" % props) 
     for f_name, f_type in props._fields_: 
      attr = props.__getattribute__(f_name) 
      sys.stdout.write(" %s: %s\n" % (f_name, attr)) 

    gpuCount = getGpuCount() 
    if gpuCount > 0: 
     sys.stdout.write("\n") 
    sys.stdout.write("GPU count: %d\n" % getGpuCount()) 
    e = getLoadError() 
    if e is not None: 
     sys.stdout.write("There was an error loading a library:\n%s\n\n" % e) 

if __name__=="__main__": 
    main() 

This is an interesting idea, to use python. That way the cmake part could perhaps consist of FIND_PACKAGE(PythonInterp) and EXECUTE_PROCESS(...), which looks like it might be simpler. On the other hand, I worry that this python script is long and looks like it might depend on aspects of the CUDA API that could change. – 2010-02-21 17:15:54
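A rough sketch of what that comment suggests might look like the following (an untested assumption; the script path and variable names are illustrative only, and the CMake modules shown reflect CMake of that era):

```cmake
# Hypothetical sketch: run the Python sniffer script at configure time.
find_package(PythonInterp)
if(PYTHONINTERP_FOUND)
  execute_process(
    COMMAND ${PYTHON_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/cudart.py
    RESULT_VARIABLE _cudart_ret
    OUTPUT_VARIABLE _cudart_out)
  # _cudart_ret is 0 if the script ran without error;
  # _cudart_out holds its printed device report for further parsing.
endif()
```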


Agreed. The DeviceProp class would probably need updating for each new CUDA runtime version. – 2010-02-22 02:12:21


I get an error: except OSError, e: [SyntaxError: invalid syntax] in python 3.5 – programmer 2017-06-09 13:24:49


A useful approach is to run programs that the CUDA installation provides, such as nvidia-smi, and see what they return.

find_program(_nvidia_smi "nvidia-smi") 
if (_nvidia_smi) 
    set(DETECT_GPU_COUNT_NVIDIA_SMI 0) 
    # execute nvidia-smi -L to get a short list of GPUs available 
    exec_program(${_nvidia_smi} ARGS -L 
        OUTPUT_VARIABLE _nvidia_smi_out 
        RETURN_VALUE _nvidia_smi_ret) 
    # process the stdout of nvidia-smi 
    if (_nvidia_smi_ret EQUAL 0) 
        # convert string with newlines to list of strings 
        string(REGEX REPLACE "\n" ";" _nvidia_smi_out "${_nvidia_smi_out}") 
        foreach(_line ${_nvidia_smi_out}) 
            if (_line MATCHES "^GPU [0-9]+:") 
                math(EXPR DETECT_GPU_COUNT_NVIDIA_SMI "${DETECT_GPU_COUNT_NVIDIA_SMI}+1") 
                # the UUID is not very useful for the user, remove it 
                string(REGEX REPLACE " \\(UUID:.*\\)" "" _gpu_info "${_line}") 
                if (NOT _gpu_info STREQUAL "") 
                    list(APPEND DETECT_GPU_INFO "${_gpu_info}") 
                endif() 
            endif() 
        endforeach() 

        # check_num_gpu_info() is a helper defined elsewhere in the GROMACS build 
        check_num_gpu_info(${DETECT_GPU_COUNT_NVIDIA_SMI} DETECT_GPU_INFO) 
        set(DETECT_GPU_COUNT ${DETECT_GPU_COUNT_NVIDIA_SMI}) 
    endif() 
endif() 

One can also query linux /proc or lspci. See a fully working CMake example at https://github.com/gromacs/gromacs/blob/master/cmake/gmxDetectGpu.cmake
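As a rough illustration of the lspci alternative mentioned above (Linux only; the variable names are assumptions, not taken from the GROMACS file):

```cmake
# Hypothetical Linux-only sketch: count NVIDIA devices reported by lspci.
find_program(_lspci "lspci")
if (_lspci)
  execute_process(COMMAND ${_lspci} OUTPUT_VARIABLE _lspci_out)
  string(REGEX MATCHALL "NVIDIA" _nvidia_matches "${_lspci_out}")
  list(LENGTH _nvidia_matches DETECT_GPU_COUNT_LSPCI)
endif()
```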
