
CUDA atomicAdd() for long long int

Any time I try to use atomicAdd() with anything other than (*int, int), I get this error:

error: no instance of overloaded function "atomicAdd" matches the argument list 

But I need to use a data type larger than int. Is there any workaround?

Device query:

/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery Starting... 

CUDA Device Query (Runtime API) version (CUDART static linking) 

Detected 1 CUDA Capable device(s) 

Device 0: "GeForce GTX 680" 
    CUDA Driver Version/Runtime Version   5.0/5.0 
    CUDA Capability Major/Minor version number: 3.0 
    Total amount of global memory:     4095 MBytes (4294246400 bytes) 
    (8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
    GPU Clock rate:        1084 MHz (1.08 GHz) 
    Memory Clock rate:        3004 Mhz 
    Memory Bus Width:        256-bit 
    L2 Cache Size:         524288 bytes 
    Max Texture Dimension Size (x,y,z)    1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096) 
    Max Layered Texture Size (dim) x layers  1D=(16384) x 2048, 2D=(16384,16384) x 2048 
    Total amount of constant memory:    65536 bytes 
    Total amount of shared memory per block:  49152 bytes 
    Total number of registers available per block: 65536 
    Warp size:          32 
    Maximum number of threads per multiprocessor: 2048 
    Maximum number of threads per block:   1024 
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
    Maximum sizes of each dimension of a grid:  2147483647 x 65535 x 65535 
    Maximum memory pitch:       2147483647 bytes 
    Texture alignment:        512 bytes 
    Concurrent copy and kernel execution:   Yes with 1 copy engine(s) 
    Run time limit on kernels:      Yes 
    Integrated GPU sharing Host Memory:   No 
    Support host page-locked memory mapping:  Yes 
    Alignment requirement for Surfaces:   Yes 
    Device has ECC support:      Disabled 
    Device supports Unified Addressing (UVA):  Yes 
    Device PCI Bus ID/PCI location ID:   1/0 
    Compute Mode: 
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 680 

Can you work with 'unsigned long long int', or does it have to be 'long long int'? If you can use the unsigned version, it should just work. If you must use the signed 64-bit version, you can make a variant of the example provided in the [documentation](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic) that uses atomicCAS for arbitrary atomic access. If you need help, respond accordingly and I can give an example. – Robert Crovella
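
For reference, here is a minimal sketch of the atomicCAS-based variant that comment describes (the helper name atomicAddSigned64 is made up here; the loop follows the pattern shown in the programming guide, and 64-bit atomicCAS requires compute capability 1.2 or higher):

__device__ long long int atomicAddSigned64(long long int *address, long long int val) {
    // Reinterpret the address as unsigned so the 64-bit atomicCAS overload can be used.
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Retry until no other thread changed the value between the read and the CAS.
        old = atomicCAS(address_as_ull, assumed,
                        (unsigned long long int)((long long int)assumed + val));
    } while (assumed != old);
    return (long long int)old;
}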

Answer


My guess is that the compilation flags are wrong. Since you are looking for anything other than int, you should be using sm_12 or higher.

As Robert Crovella mentioned, unsigned long long int variables are supported, but long long int is not. See: Beginner CUDA - Simple var increment not working

#include <iostream> 

using namespace std; 

// Each thread atomically increments the shared counter by one. 
__global__ void inc(unsigned long long int *foo) { 
    atomicAdd(foo, 1); 
} 

int main() { 
    unsigned long long int count = 0, *cuda_count; 
    cudaMalloc((void**)&cuda_count, sizeof(unsigned long long int)); 
    cudaMemcpy(cuda_count, &count, sizeof(unsigned long long int), cudaMemcpyHostToDevice); 
    cout << "count: " << count << '\n'; 
    // 100 blocks of 25 threads => 2500 increments in total. 
    inc <<< 100, 25 >>> (cuda_count); 
    cudaMemcpy(&count, cuda_count, sizeof(unsigned long long int), cudaMemcpyDeviceToHost); 
    cudaFree(cuda_count); 
    cout << "count: " << count << '\n'; 
    return 0; 
} 

Compiled on Linux with:

nvcc -gencode arch=compute_12,code=sm_12 -o add add.cu

Result:

count: 0 
count: 2500 

Why does 'unsigned long long int' appear in the documentation as an overload of 'atomicAdd()', but not 'unsigned long int'? – Adam27X


@Adam27X My guess is that on NVIDIA architectures long and int are the same size, while long long int is larger. –
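
One quick way to check that guess is to print the sizes the device compiler actually uses (a minimal sketch; the kernel name sizes is arbitrary, and device-side printf needs compute capability 2.0 or higher, which the GTX 680 above satisfies):

#include <cstdio>

__global__ void sizes() {
    // Device code follows the host ABI, so the result depends on the platform.
    printf("int: %d, long: %d, long long: %d\n",
           (int)sizeof(int), (int)sizeof(long), (int)sizeof(long long));
}

int main() {
    sizes<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}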