How to add elements (cv::Point) to a shared array - CUDA

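I'm new to CUDA. I need help using CUDA to find only the pixels of a binary (monochrome) image whose value is white (255). Those pixels then need to be collected into an output array. My solution is based on a critical section. However, it gives incorrect results.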

//----- call kernel: ----- 
{ 
    const dim3 block(16,16); 
    const dim3 grid(divUp(_binImg.cols, block.x), divUp(_binImg.rows, block.y)); 
    // others allocations, declarations ... 
    cudaCalcWhitePixels<<<grid, block>>>(_binImg, _index, _pointsX, _pointsY); 
} 

__device__ int lock = 0; 
__global__ void cudaCalcWhitePixels(cv::gpu::PtrStepSzb _binImg, int *_index, int *_pointsX, int *_pointsY) 
{ 
    extern int lock; 
    const int x = blockIdx.x * blockDim.x + threadIdx.x; 
    const int y = blockIdx.y * blockDim.y + threadIdx.y; 

    __syncthreads(); 

    if(x < _binImg.cols && y < _binImg.rows) 
    { 
     if(_binImg.ptr(y)[x] == 255) 
     { 
      do{} while(atomicCAS(&lock, 0, 1) != 0); 

      //----- critical section ------ 

      _pointsX[*_index] = x; 
      _pointsY[*_index] = y; 
      (*_index)++; 
      lock = 0; 

      //----- end CS ------ 
     } 
    } 
}

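In my opinion, the critical section is not working correctly. White pixels make up about 1% of the image.

Could you give me some advice? Thank you, and have a nice day :)

EDIT: Solution: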

__global__ void cudaCalcWhitePixels(cv::gpu::PtrStepSzb _binImg, int *_index, int *_pointsX, int *_pointsY) 
{ 
    int myIndex = 0; 
    const int x = blockIdx.x * blockDim.x + threadIdx.x; 
    const int y = blockIdx.y * blockDim.y + threadIdx.y; 

    __syncthreads(); 

    if(x < _binImg.cols && y < _binImg.rows) 
    { 
     if(_binImg.ptr(y)[x] == 255) 
     { 
      //----- critical section ------ 

      myIndex = atomicAdd(_index, 1); 
      _pointsX[myIndex] = x; 
      _pointsY[myIndex] = y; 

      //----- end CS ------ 
     } 
    } 
}

Source

2013-09-30 Karel Kříž

Why do you even need a critical section in this kernel? Can't you just atomically increment '_index' instead? – talonmies

You're right. I didn't realize that 'atomicAdd()' returns the old value. Thx –

The code at the following URL may help you understand how to create a critical section with atomicCAS().

https://github.com/ArchaeaSoftware/cudahandbook/blob/master/memory/spinlockReduction.cu

class cudaSpinlock { 
public: 
    cudaSpinlock(int *p); 
    void acquire(); 
    void release(); 
private: 
    int *m_p; 
}; 

inline __device__ 
cudaSpinlock::cudaSpinlock(int *p) 
{ 
    m_p = p; 
} 

inline __device__ void 
cudaSpinlock::acquire() 
{ 
    while (atomicCAS(m_p, 0, 1)); 
} 

inline __device__ void 
cudaSpinlock::release() 
{ 
    atomicExch(m_p, 0); 
}

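Since (*_index)++; is the only operation in your CS that has to be atomic, you could consider using atomicAdd() instead. It returns the old value of the counter, so each thread gets its own unique output index.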

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd
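
As a rough illustration of the atomicAdd() approach, here is a minimal sketch that does not depend on OpenCV. It assumes the image is a plain row-major unsigned char buffer rather than a cv::gpu::PtrStepSzb; the names collectWhitePixels and findWhitePixels are hypothetical, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread that sees a white pixel reserves one output slot.
__global__ void collectWhitePixels(const unsigned char *img, int cols, int rows,
                                   int *count, int *pointsX, int *pointsY)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < cols && y < rows && img[y * cols + x] == 255)
    {
        // atomicAdd returns the counter value from before the increment,
        // so every thread gets a distinct slot without any lock.
        const int slot = atomicAdd(count, 1);
        pointsX[slot] = x;
        pointsY[slot] = y;
    }
}

// Hypothetical host helper: fills xs/ys and returns the number of white pixels.
int findWhitePixels(const std::vector<unsigned char> &img, int cols, int rows,
                    std::vector<int> &xs, std::vector<int> &ys)
{
    const size_t n = img.size();
    unsigned char *dImg = nullptr;
    int *dCount = nullptr, *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dImg, n);
    cudaMalloc(&dCount, sizeof(int));
    cudaMalloc(&dX, n * sizeof(int));   // worst case: every pixel is white
    cudaMalloc(&dY, n * sizeof(int));
    cudaMemcpy(dImg, img.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int)); // the counter must start at zero

    const dim3 block(16, 16);
    const dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    collectWhitePixels<<<grid, block>>>(dImg, cols, rows, dCount, dX, dY);

    // A blocking cudaMemcpy also waits for the kernel to finish.
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    xs.resize(count);
    ys.resize(count);
    cudaMemcpy(xs.data(), dX, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(ys.data(), dY, count * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dImg); cudaFree(dCount); cudaFree(dX); cudaFree(dY);
    return count;
}
```

Note that the order of the points in the output is not deterministic with this approach, because it depends on the order in which threads reach the atomicAdd() call.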

Alternatively, you could try thrust::copy_if() to simplify the code.
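
A minimal sketch of what the thrust::copy_if() variant might look like, again assuming a plain row-major unsigned char buffer instead of the OpenCV type. The functor IsWhite and the helper whitePixelIndices are hypothetical names. The sketch uses the stencil overload of copy_if: a counting iterator supplies the linear pixel indices, and the image itself acts as the stencil.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>

// Predicate applied to the stencil (the pixel values).
struct IsWhite
{
    __host__ __device__ bool operator()(unsigned char v) const { return v == 255; }
};

// Hypothetical helper: returns the linear indices (y * cols + x) of all
// white pixels. copy_if is stable, so the indices come out in ascending order.
std::vector<int> whitePixelIndices(const std::vector<unsigned char> &img)
{
    const int n = static_cast<int>(img.size());
    thrust::device_vector<unsigned char> dImg(img.begin(), img.end());
    thrust::device_vector<int> dIdx(n);  // worst case: every pixel is white

    thrust::counting_iterator<int> first(0);
    thrust::device_vector<int>::iterator end =
        thrust::copy_if(first, first + n,  // candidate indices 0 .. n-1
                        dImg.begin(),      // stencil: the pixel values
                        dIdx.begin(),      // output: indices that passed
                        IsWhite());

    std::vector<int> out(end - dIdx.begin());
    thrust::copy(dIdx.begin(), end, out.begin());  // device -> host
    return out;
}
```

The coordinates can be recovered from each index as x = idx % cols and y = idx / cols. Unlike the atomicAdd() version, the output order here is deterministic.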

Source

2013-10-01 06:10:57 kangshiyin

Thank you for your help! I didn't realize that 'atomicAdd()' returns the old value. It solved my problem elegantly. –
