Using std::vector in CUDA device code

The question is: is there a way to use the class "vector" in a CUDA kernel? When I try, I get the following error:

error : calling a host function("std::vector<int, std::allocator<int> > ::push_back") from a __device__/__global__ function not allowed

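So is there a way to use a vector inside a __global__ function? I recently tried the following:

Create a new CUDA project
Go to the project's properties
Open CUDA C/C++
Go to Device
Change the "Code Generation" value to: compute_20,sm_20

........ after that, I was able to use the printf standard library function in my CUDA kernel.

Is there a way to use the standard library class vector in kernel code, the same way printf is supported? Here is an example of using printf in kernel code: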

#include <cstdio>

// Counts the 3s in an array using CUDA.
// private_count holds every thread's result separately;
// the caller must zero it before launching the kernel.

__global__ void countKernel(int *a, int length, int* private_count)
{
    printf("%d\n", threadIdx.x); // prints the thread id, and this works

    // vector<int> y;
    // y.push_back(0); is there a possibility to do this?

    // Each thread scans its own contiguous chunk of `length` elements.
    unsigned int offset = threadIdx.x * length;
    for (unsigned int i = offset; i < offset + length; i++)
    {
        if (a[i] == 3)
        {
            private_count[threadIdx.x]++;
            printf("%d ", a[i]);
        }
    }
}

2012-04-29 Yasser

+1, a perfectly legitimate question (not sure why it was downvoted). Unfortunately, the answer is: not at present. – harrism

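You can't use the STL in CUDA device code, but you may be able to use the Thrust library to do what you want. Otherwise, just copy the contents of the vector to the device and operate on it normally.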
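
As a rough illustration of the "copy the contents to the device" route, the sketch below keeps the std::vector on the host and hands the kernel a raw pointer. The names countThrees and countThreesOnDevice are made up for this example, and it assumes a single block and a vector size that is divisible by the thread count.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread counts the 3s in its own chunk of `length` elements.
__global__ void countThrees(const int* a, int length, int* counts)
{
    int begin = threadIdx.x * length;
    int count = 0;
    for (int i = begin; i < begin + length; ++i) {
        if (a[i] == 3) ++count;
    }
    counts[threadIdx.x] = count;
}

// Hypothetical host helper: copies the vector's contents to the device,
// runs the kernel in one block, and returns the total number of 3s.
// Assumes host.size() is divisible by `threads`; returns -1 on a CUDA error.
int countThreesOnDevice(const std::vector<int>& host, int threads)
{
    int length = static_cast<int>(host.size()) / threads;
    size_t dataBytes = host.size() * sizeof(int);
    size_t countBytes = threads * sizeof(int);
    std::vector<int> counts(threads, 0);

    int* dA = nullptr;
    int* dCounts = nullptr;
    bool ok = cudaMalloc(&dA, dataBytes) == cudaSuccess
           && cudaMalloc(&dCounts, countBytes) == cudaSuccess
           && cudaMemcpy(dA, host.data(), dataBytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        countThrees<<<1, threads>>>(dA, length, dCounts);
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(counts.data(), dCounts, countBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);       // cudaFree(nullptr) is a harmless no-op
    cudaFree(dCounts);
    if (!ok) return -1;

    int total = 0;
    for (int c : counts) total += c;
    return total;
}
```

The kernel never sees a vector, only the contiguous storage that std::vector::data() exposes on the host side.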

2012-04-29 20:47:34

I don't see how this is supposed to help, since 'thrust::device_vector' can't be used inside kernels either. – thatWiseGuy

You can't use std::vector in device code; you should use an array instead.
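
One way to read "use an array" is a fixed-capacity buffer with a size counter, which gives a push_back-like interface without any allocation. This is only a sketch: FixedVector, MAX_ITEMS and collectThrees are hypothetical names, and the capacity of 32 is an arbitrary assumption that has to be a known upper bound for your data.

```cuda
#include <cuda_runtime.h>

constexpr int MAX_ITEMS = 32;  // assumed upper bound on elements per thread

// Fixed-capacity stand-in for a vector of int; usable on host and device.
struct FixedVector {
    int data[MAX_ITEMS];
    int size = 0;

    // Appends v and returns true, or returns false if the buffer is full.
    __host__ __device__ bool push_back(int v)
    {
        if (size >= MAX_ITEMS) return false;
        data[size++] = v;
        return true;
    }
};

// Each thread records the indices of the 3s in its chunk, then reports how
// many it found (indices beyond MAX_ITEMS are silently dropped).
__global__ void collectThrees(const int* a, int length, int* found)
{
    FixedVector y;  // lives in the thread's local memory
    int begin = threadIdx.x * length;
    for (int i = begin; i < begin + length; ++i) {
        if (a[i] == 3) y.push_back(i);
    }
    found[threadIdx.x] = y.size;
}
```

The trade-off is that the capacity is fixed at compile time, so push_back can fail and the caller has to decide what to do then.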

2012-04-30 12:58:53 yyfn

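In the CUDA library Thrust, you can use thrust::device_vector<classT> to define a vector on the device, and transferring data between a host STL vector and a device vector is straightforward. You can refer to this link for some useful examples: http://docs.nvidia.com/cuda/thrust/index.html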
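
A minimal sketch of that transfer, assuming the data are plain ints. The helper names countThreesWithThrust, incrementOnDevice and addOne are invented for this example. Note that the device_vector object itself is managed from host code; a hand-written kernel only receives the raw pointer obtained with thrust::raw_pointer_cast.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/count.h>
#include <thrust/copy.h>

// Counts the 3s entirely through Thrust, without a hand-written kernel.
int countThreesWithThrust(const std::vector<int>& host)
{
    // Constructing from host iterators performs the host-to-device copy.
    thrust::device_vector<int> d(host.begin(), host.end());
    return static_cast<int>(thrust::count(d.begin(), d.end(), 3));
}

// A hand-written kernel works on the raw device pointer, not on the vector.
__global__ void addOne(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Copies the input to the device, increments every element, copies it back.
std::vector<int> incrementOnDevice(const std::vector<int>& host)
{
    thrust::device_vector<int> d(host.begin(), host.end());
    int n = static_cast<int>(d.size());
    if (n > 0) {
        addOne<<<(n + 255) / 256, 256>>>(thrust::raw_pointer_cast(d.data()), n);
    }
    std::vector<int> result(host.size());
    thrust::copy(d.begin(), d.end(), result.begin());  // device-to-host copy
    return result;
}
```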

2013-05-03 16:02:31

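I think you can implement a device vector yourself, because CUDA supports dynamic memory allocation in device code. The new/delete operators are supported as well. Here is a very simple prototype of a device vector in CUDA; it does work, but it has not been thoroughly tested.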

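// Minimal per-thread growable array for device code.
// Assumes T is trivially copyable and default-constructible.
// Storage comes from the device heap via in-kernel new[]/delete[];
// allocation failure (heap exhausted) is not handled here.
template<typename T>
class LocalVector
{
private:
    T* m_begin;
    T* m_end;

    size_t capacity;
    size_t length;

    // Doubles the capacity and moves the existing elements over.
    __device__ void expand() {
        capacity *= 2;
        T* tempBegin = new T[capacity];

        memcpy(tempBegin, m_begin, length * sizeof(T));
        delete[] m_begin;
        m_begin = tempBegin;
        m_end = m_begin + length;
    }
public:
    __device__ LocalVector() : capacity(16), length(0) {
        m_begin = new T[capacity];
        m_end = m_begin;
    }
    // Copying would make two objects delete[] the same buffer, so it is disabled.
    LocalVector(const LocalVector&) = delete;
    LocalVector& operator=(const LocalVector&) = delete;

    __device__ T& operator[] (size_t index) {
        return *(m_begin + index);
    }
    __device__ T* begin() {
        return m_begin;
    }
    __device__ T* end() {
        return m_end;
    }
    __device__ ~LocalVector()
    {
        delete[] m_begin;
        m_begin = nullptr;
    }

    __device__ void add(T t) {
        if (length >= capacity) {
            expand();
        }

        // The slot was already default-constructed by new T[], so assign into it.
        *m_end = t;
        m_end++;
        length++;
    }
    // Removes and returns the last element; the vector must not be empty.
    __device__ T pop() {
        // m_end points one past the last element, so step back before reading.
        m_end--;
        length--;
        return *m_end;
    }

    __device__ size_t getSize() {
        return length;
    }
};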
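
The prototype above draws its storage from the device heap, which is limited in size (8 MB by default, according to the CUDA programming guide), so many threads growing vectors at once can exhaust it. The sketch below shows the two pieces that matter: raising the limit with cudaDeviceSetLimit before the first kernel that allocates, and checking the pointer returned inside the kernel. It uses in-kernel malloc/free rather than new/delete, and the names sumSquares and sumSquaresOnDevice are hypothetical.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread allocates a scratch buffer on the device heap, fills it with
// squares, sums them, and frees the buffer again.
__global__ void sumSquares(int n, int* sums)
{
    int* buf = static_cast<int*>(malloc(n * sizeof(int)));
    if (buf == nullptr) {       // in-kernel malloc returns NULL on failure
        sums[threadIdx.x] = -1;
        return;
    }
    for (int i = 0; i < n; ++i) buf[i] = i * i;
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += buf[i];
    free(buf);
    sums[threadIdx.x] = sum;
}

// Hypothetical host helper: sets the device heap size, launches one block,
// and returns the result of thread 0 (or -1 on any error).
int sumSquaresOnDevice(int threads, int n, size_t heapBytes)
{
    // Has to happen before any kernel that uses the device heap is launched.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes) != cudaSuccess)
        return -1;

    int* dSums = nullptr;
    if (cudaMalloc(&dSums, threads * sizeof(int)) != cudaSuccess) return -1;

    sumSquares<<<1, threads>>>(n, dSums);

    int first = -1;
    cudaError_t err = cudaMemcpy(&first, dSums, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dSums);
    return err == cudaSuccess ? first : -1;
}
```

Per-thread heap allocation tends to be slow compared with preallocated buffers, so it is probably best kept for cases where the size really cannot be bounded in advance.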

2017-08-14 09:23:34
