Using CUDA separate compilation causes a Thrust error

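I get an error when I compile CUDA code with relocatable device code enabled (-rdc=true). I am using Visual Studio 2013 as the compiler with CUDA 7.5. Below is a small example that reproduces the error. To clarify: the code below runs fine with -rdc=false, and the error only shows up when the option is set to true.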

The error simply says: CUDA error 11 [\cuda\detail\cub\device\dispatch/device_radix_sort_dispatch.cuh, 687]: invalid argument

Then I found this, which says:

When invoked with primitive data types, thrust::sort, thrust::sort_by_key, thrust::stable_sort, thrust::stable_sort_by_key may fail to link in some cases with nvcc -rdc=true.

Is there a workaround that allows separate compilation?

main.cpp:

#include <stdio.h>
#include <exception>
#include <vector>
#include "cuda_runtime.h"
#include "RadixSort.h"

typedef unsigned int uint;
typedef unsigned __int64 uint64;

int main()
{
    RadixSort sorter;

    // Fill a small host array with the values 1..n
    uint n = 10;
    std::vector<uint64> test(n);
    for (uint i = 0; i < n; i++)
        test[i] = i + 1;

    uint64 * d_array;
    uint64 size = n * sizeof(uint64);

    // Copy the input to the device
    cudaMalloc(&d_array, size);
    cudaMemcpy(d_array, test.data(), size, cudaMemcpyHostToDevice);

    try
    {
        sorter.Sort(d_array, n);
    }
    catch (const std::exception & ex)
    {
        // Thrust reports CUDA failures as exceptions derived from std::exception
        printf("%s\n", ex.what());
    }

    cudaFree(d_array);
    return 0;
}

RadixSort.h:

#pragma once 
typedef unsigned int uint; 
typedef unsigned __int64 uint64; 

class RadixSort 
{ 
public: 
    RadixSort() {} 
    ~RadixSort() {} 

    void Sort(uint64 * input, const uint n); 
};

RadixSort.cu:

#include "RadixSort.h" 

#include <thrust/device_vector.h> 
#include <thrust/device_ptr.h> 
#include <thrust/sort.h> 

void RadixSort::Sort(uint64 * input, const uint n) 
{ 
    thrust::device_ptr<uint64> d_input = thrust::device_pointer_cast(input); 
    thrust::stable_sort(d_input, d_input + n); 
    cudaDeviceSynchronize(); 
}


2016-05-30 RobbinMarcus

Regarding the question "Is there a workaround that allows separate compilation?": which GPU are you running on? –

Currently a GTX 760. – RobbinMarcus

Try compiling with the architecture set to match your GTX 760, which I believe should be cc3.0. –

As Robert Crovella mentioned in the comments:

Changing the CUDA architecture to a higher value solves the problem. In my case, I changed it to compute_30 and sm_30 under CUDA C++ -> Device -> Code Generation.

Edit:

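The general recommendation is to select the architecture that best matches your specific GPU. See the link in the comments for more information.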
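For command-line builds, the same architecture choice can be expressed with nvcc's -gencode option. The following is only a sketch: gencode_flag is a hypothetical helper (it is not part of CUDA or Thrust) that turns a compute capability such as 3.0 into the matching option string.

```cpp
#include <string>

// Hypothetical helper, not part of CUDA or Thrust: builds the nvcc
// -gencode option for a given compute capability (3.0 for the GTX 760
// discussed above).
std::string gencode_flag(int major, int minor)
{
    // Compute capability 3.0 is written as "30" in architecture names
    // such as compute_30 and sm_30.
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

In a CUDA program, the major and minor values could presumably be read from the major and minor fields of cudaDeviceProp after calling cudaGetDeviceProperties. The resulting option would then be passed to nvcc alongside -rdc=true.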


2016-05-30 13:11:09 RobbinMarcus
