CUDA Thrust remove_if with overlapping stencil sequence

I am trying to remove elements from two thrust::device_vector<int> based on the values looked up through the first vector. Intuitively, I wrote the following snippet:

thrust::device_vector<float> idxToValue(COUNT_MAX); 
thrust::device_vector<int> idxSorted(COUNT_MAX); 
thrust::device_vector<int> groupIdxSorted(COUNT_MAX); 
int count = COUNT_MAX; 
float const minThreshold = MIN_THRESHOLD; 

auto idxToValueSortedIter = thrust::make_permutation_iterator(
    idxToValue.begin() 
    , idxSorted.begin() 
    ); 

auto new_end = thrust::remove_if(
    thrust::make_zip_iterator(thrust::make_tuple(idxSorted.begin(), groupIdxSorted.begin())) 
    , thrust::make_zip_iterator(thrust::make_tuple(idxSorted.begin() + count, groupIdxSorted.begin() + count)) 
    , idxToValueSortedIter 
    , thrust::placeholders::_1 >= minThreshold 
    ); 

count = thrust::get<0>(new_end.get_iterator_tuple()) - idxSorted.begin();

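Unfortunately, the Thrust documentation says:

The range [stencil, stencil + (last - first)) shall not overlap the range [result, result + (last - first)).

In my case idxToValueSortedIter, which is used as the stencil sequence, depends on idxSorted, so it actually overlaps with the result (it reads from the same vector).

Is there a way to solve this without copying the data to a temporary vector?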


2014-09-18 wondering

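I think you can do this by using the non-stencil version of remove_if (which has no restriction on the stencil overlapping the output sequence, because it takes no stencil) and passing your stencil (i.e. your permutation iterator) as the third member of the zip_iterator given to remove_if, together with an appropriate selection functor. Here is a worked example: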

$ cat t572.cu 
#include <iostream> 
#include <iterator>   // std::ostream_iterator 
#include <thrust/device_vector.h> 
#include <thrust/remove.h> 
#include <thrust/tuple.h> 
#include <thrust/iterator/zip_iterator.h> 
#include <thrust/iterator/permutation_iterator.h> 
#include <thrust/copy.h> 

#define COUNT_MAX 10 
#define MIN_THRESHOLD 4.5f 

struct my_functor 
{ 
    float thresh; 
    my_functor(float _thresh): thresh(_thresh) {} 

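    // Select an entry for removal when its looked-up value
    // (the third tuple member) is greater than thresh.
    template <typename T> 
    __host__ __device__ 
    bool operator()(const T &mytuple) const { 
        return thrust::get<2>(mytuple) > thresh; 
    } 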
}; 

int main(){ 

    float h_idxToValue[COUNT_MAX] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}; 
    int h_idxSorted[COUNT_MAX] = {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}; 
    int h_groupIdxSorted[COUNT_MAX] = {20, 21, 22, 23, 24, 25, 26, 27, 28, 29}; 

    thrust::device_vector<float> idxToValue(h_idxToValue, h_idxToValue + COUNT_MAX); 
    thrust::device_vector<int> idxSorted(h_idxSorted, h_idxSorted + COUNT_MAX); 
    thrust::device_vector<int> groupIdxSorted(h_groupIdxSorted, h_groupIdxSorted + COUNT_MAX); 
    int count = COUNT_MAX; 
    float const minThreshold = MIN_THRESHOLD; 

    // The permutation iterator rides along as the third member of the zip iterator,
    // so no separate stencil argument is needed.
    auto new_end = thrust::remove_if(
        thrust::make_zip_iterator(thrust::make_tuple(idxSorted.begin(), groupIdxSorted.begin(), thrust::make_permutation_iterator(idxToValue.begin(), idxSorted.begin()))) 
        , thrust::make_zip_iterator(thrust::make_tuple(idxSorted.begin() + count, groupIdxSorted.begin() + count, thrust::make_permutation_iterator(idxToValue.begin(), idxSorted.begin() + count))) 
        , my_functor(minThreshold) 
        ); 

    count = thrust::get<0>(new_end.get_iterator_tuple()) - idxSorted.begin(); 

    std::cout << "count = " << count << std::endl; 
    thrust::copy_n(groupIdxSorted.begin(), count, std::ostream_iterator<int>(std::cout, ",")); 
    std::cout << std::endl; 
    return 0; 
} 

$ nvcc -arch=sm_20 -std=c++11 -o t572 t572.cu 
$ ./t572 
count = 5 
25,26,27,28,29, 
$

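With the supplied functor, remove_if removes the entries whose idxToValue value is greater than the threshold (4.5). Because the values are looked up through the permutation iterator and idxSorted is in reverse order, the entries above the threshold are the first five positions, so the group indices that survive are 25 through 29, i.e. the tail of the original sequence. The example above was run with CUDA 6.5 on Fedora 20, using the experimental C++11 support.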


2014-09-20 03:15:17

I was actually looking for a way to avoid a custom functor (i.e. by using placeholders), but this is what I asked for, and it works! Thanks! – wondering 2014-09-21 18:19:04
