如何讓矢量元素獨特？（刪除不相鄰的重複項）

舉一個簡單的例子，考慮：

2 1 6 1 4 6 2 1 1

我試圖通過刪除不相鄰的重複和維護元素的順序，使這個vector獨特。

結果將是：

2 1 6 4

我嘗試的解決方案是：

將插入到一個std ::設置，但這種方法的問題是它會干擾元素的順序。
使用std :: sort和std :: unique的組合。但同樣的順序問題。

手動消除重複：

Define a temporary vector TempVector. 
    for (each element in a vector) 
    { 
     if (the element does not exists in TempVector) 
     { 
      add to TempVector; 
     } 
    } 
    swap orginial vector with TempVector.

我的問題是：

是否有任何STL算法可以從載體中刪除不相鄰的重複？它的複雜性是什麼？

來源

2009-09-21 aJ.

由於從元素構造一個集合需要花費O（n log n），所以不可能構造一個集合並且保持所看到的順序。所以，選擇一個O（n log n）然後去。 – 2012-06-19 11:35:02

我想你會做這樣的：

我會在矢量使用兩個迭代器：

第一個的讀取數據，並將其插入一個臨時組。

當讀取的數據不在集合中時，將其從第一個迭代器複製到第二個迭代器並將其增加。

最後你只保留數據直到第二個迭代器。

複雜度爲O（n .log（n）），因爲重複元素的查找使用集合而不是矢量。

#include <vector> 
#include <set> 
#include <iostream> 

int main(int argc, char* argv[]) 
{ 
    std::vector<int> k ; 

    k.push_back(2); 
    k.push_back(1); 
    k.push_back(6); 
    k.push_back(1); 
    k.push_back(4); 
    k.push_back(6); 
    k.push_back(2); 
    k.push_back(1); 
    k.push_back(1); 

{ 
    std::vector<int>::iterator r , w ; 

    std::set<int> tmpset ; 

    for(r = k.begin() , w = k.begin() ; r != k.end() ; ++r) 
    { 
     if(tmpset.insert(*r).second) 
     { 
      *w++ = *r ; 
     } 
    } 

    k.erase(w , k.end()); 
} 


    { 
     std::vector<int>::iterator r ; 

     for(r = k.begin() ; r != k.end() ; ++r) 
     { 
      std::cout << *r << std::endl ; 
     } 
    } 
}

來源

2009-09-21 08:14:38

使用'find'然後'insert'效率不高。如果值已插入，則'tmpset.insert（* r）.second'將爲true;如果該值已在集合中，則爲false。 – cjm 2009-09-21 08:47:04

我忘記了，我改正了 – 2009-09-21 08:48:55

這是關於時間，我們被允許寫'std :: vector k（{1,2,3}）;'...... – xtofl 2009-09-21 08:51:10

My question is:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?

STL的選項是你提到的那些：把項目的std::set，或致電std::sort，std::unique並呼籲在容器上erase()。這些都不符合您的要求，即「刪除不相鄰的副本並維護元素的順序」。

那麼爲什麼STL不提供其他選擇？沒有標準的圖書館會爲每個用戶的需求提供一切。 STL的設計考慮因素包括「對幾乎所有用戶都足夠快」，「對幾乎所有用戶都有用」，「儘可能地提供例外安全性」（以及「對標準委員會來說足夠小」寫的很大，Stroustrup剔除了2/3的東西）。

我能想到的最簡單的辦法是這樣的：

// Note: an STL-like method would be templatized and use iterators instead of 
// hardcoding std::vector<int> 
std::vector<int> stable_unique(const std::vector<int>& input) 
{ 
    std::vector<int> result; 
    result.reserve(input.size()); 
    for (std::vector<int>::iterator itor = input.begin(); 
            itor != input.end(); 
            ++itor) 
     if (std::find(result.begin(), result.end(), *itor) == result.end()) 
      result.push_back(*itor); 
     return result; 
}

這個解決方案應該是O（M * N），其中M是唯一的元素數，N爲元素的總數。

來源

2009-09-21 08:42:34

砍伐是一項協同努力:) STL令人印象深刻，但它並不是一個簡單的複製和粘貼工作，以達到標準。只要看看獲得Boost的東西所需的努力，那個項目就是爲了讓東西進入標準而設計的。因此，「砍伐」只是寫出最重要的1/3的完整規格的副作用。 – MSalters 2009-09-21 09:51:47

沒有STL算法做你想要保留序列的原始順序。

您可以在向量中創建一個迭代器或索引的std::set，比較謂詞使用引用的數據而不是迭代器/索引來排序內容。然後，您從集合中未引用的矢量中刪除所有內容。（當然，你也可以同樣使用其他std::vector迭代器/索引，std::sort和std::unique的是，並以此作爲參考，以什麼來保持。）

來源

2009-09-21 08:44:55 sbi

據我知道有沒有在STL。看看reference。

來源

2009-09-21 08:47:12 Yelonek

您可以刪除一些循環使用remove_copy_iffa's答案：

class NotSeen : public std::unary_function <int, bool> 
{ 
public: 
    NotSeen (std::set<int> & seen) : m_seen (seen) { } 

    bool operator()(int i) const { 
    return (m_seen.insert (i).second); 
    } 

private: 
    std::set<int> & m_seen; 
}; 

void removeDups (std::vector<int> const & iv, std::vector<int> & ov) { 
    std::set<int> seen; 
    std::remove_copy_if (iv.begin() 
     , iv.end() 
     , std::back_inserter (ov) 
     , NotSeen (seen)); 
}

這沒有對算法的複雜性影響（即書面它也是爲O（n log n）的）。你可以使用unordered_set來改進，或者如果你的值的範圍足夠小，你可以簡單地使用一個數組或一個bitarray。

來源

2009-09-21 10:34:46

U_STD是從確定知道std :: namespace將被調用之前的宏嗎？ – 2009-09-21 13:18:25

@onebyone：差不多！這是一個宏，我們真的不需要再使用它並返回到使用舊的g ++編譯器（<= 2.95.3）的時候。習慣的力量讓我無處不在！ – 2009-09-23 07:55:51

正確使用''。然而（這同樣適用於@RichardCorden和@fa。的答案：除非OP有*非常大的數據集有很多獨特的條目，'set'可能比使用輸出'vector'慢由於更好的緩存行爲（對於像[btree_set]（https://code.google.com/p/cpp-btree/）這樣的大型數據集）對於小數據集可能更好，）。 – Joe 2013-06-21 21:06:50

，不使用臨時set這是可能的（可能）要做到這一點的表現有些茫然：

template<class Iterator> 
Iterator Unique(Iterator first, Iterator last) 
{ 
    while (first != last) 
    { 
     Iterator next(first); 
     last = std::remove(++next, last, *first); 
     first = next; 
    } 

    return last; 
}

用作：

vec.erase(Unique(vec.begin(), vec.end()), vec.end());

對於較小的數據集，實施簡單和缺少額外的分配可能會抵消使用額外的set的理論上更高的複雜性。儘管如此，使用具有代表性的輸入進行測量是確保唯一的方法。

來源

2009-09-23 08:23:18

優秀的解決方案。我建議名稱Unique是不是一個顯示，因爲我想。怎麼樣RemoveNonUnique？ – 2009-09-23 08:50:46

John Torjo有一篇很好的文章，它以系統的方式處理這個問題。他出現的結果似乎更通用，更高效比任何解決方案，至今這裏建議：

http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00.htm

https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1052159.html

不幸的是，約翰的解決方案的完整的代碼似乎不再可用，約翰沒有迴應可能發送電子郵件。因此，我寫了我自己的代碼，基於類似他的理由，但在一些細節上故意有所不同。如果您願意，請隨時與我聯繫（vschoech think-cell com）並討論細節。

爲了讓代碼爲你編譯，我添加了一些我自己定期使用的庫東西。另外，我不用簡單的stl，而是使用boost來創建更通用，更高效，更易讀的代碼。

玩得開心！

#include <vector> 
#include <functional> 

#include <boost/bind.hpp> 
#include <boost/range.hpp> 
#include <boost/iterator/counting_iterator.hpp> 

///////////////////////////////////////////////////////////////////////////////////////////// 
// library stuff 

template< class Rng, class Func > 
Func for_each(Rng& rng, Func f) { 
    return std::for_each(boost::begin(rng), boost::end(rng), f); 
}; 

template< class Rng, class Pred > 
Rng& sort(Rng& rng, Pred pred) { 
    std::sort(boost::begin(rng), boost::end(rng), pred); 
    return rng; // to allow function chaining, similar to operator+= et al. 
} 

template< class T > 
boost::iterator_range< boost::counting_iterator<T> > make_counting_range(T const& tBegin, T const& tEnd) { 
    return boost::iterator_range< boost::counting_iterator<T> >(tBegin, tEnd); 
} 

template< class Func > 
class compare_less_impl { 
private: 
    Func m_func; 
public: 
    typedef bool result_type; 
    compare_less_impl(Func func) 
    : m_func(func) 
    {} 
    template< class T1, class T2 > bool operator()(T1 const& tLeft, T2 const& tRight) const { 
     return m_func(tLeft) < m_func(tRight); 
    } 
}; 

template< class Func > 
compare_less_impl<Func> compare_less(Func func) { 
    return compare_less_impl<Func>(func); 
} 


///////////////////////////////////////////////////////////////////////////////////////////// 
// stable_unique 

template<class forward_iterator, class predicate_type> 
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd, predicate_type predLess) { 
    typedef std::iterator_traits<forward_iterator>::difference_type index_type; 
    struct SIteratorIndex { 
     SIteratorIndex(forward_iterator itValue, index_type idx) : m_itValue(itValue), m_idx(idx) {} 
     std::iterator_traits<forward_iterator>::reference Value() const {return *m_itValue;} 
     index_type m_idx; 
    private: 
     forward_iterator m_itValue; 
    }; 

    // {1} create array of values (represented by iterators) and indices 
    std::vector<SIteratorIndex> vecitidx; 
    vecitidx.reserve(std::distance(itBegin, itEnd)); 
    struct FPushBackIteratorIndex { 
     FPushBackIteratorIndex(std::vector<SIteratorIndex>& vecitidx) : m_vecitidx(vecitidx) {} 
     void operator()(forward_iterator itValue) const { 
      m_vecitidx.push_back(SIteratorIndex(itValue, m_vecitidx.size())); 
     } 
    private: 
     std::vector<SIteratorIndex>& m_vecitidx; 
    }; 
    for_each(make_counting_range(itBegin, itEnd), FPushBackIteratorIndex(vecitidx)); 

    // {2} sort by underlying value 
    struct FStableCompareByValue { 
     FStableCompareByValue(predicate_type predLess) : m_predLess(predLess) {} 
     bool operator()(SIteratorIndex const& itidxA, SIteratorIndex const& itidxB) { 
      return m_predLess(itidxA.Value(), itidxB.Value()) 
       // stable sort order, index is secondary criterion 
       || !m_predLess(itidxB.Value(), itidxA.Value()) && itidxA.m_idx < itidxB.m_idx; 
     } 
    private: 
     predicate_type m_predLess; 
    }; 
    sort(vecitidx, FStableCompareByValue(predLess)); 

    // {3} apply std::unique to the sorted vector, removing duplicate values 
    vecitidx.erase(
     std::unique(vecitidx.begin(), vecitidx.end(), 
      !boost::bind(predLess, 
       // redundand boost::mem_fn required to compile 
       boost::bind(boost::mem_fn(&SIteratorIndex::Value), _1), 
       boost::bind(boost::mem_fn(&SIteratorIndex::Value), _2) 
      ) 
     ), 
     vecitidx.end() 
    ); 

    // {4} re-sort by index to match original order 
    sort(vecitidx, compare_less(boost::mem_fn(&SIteratorIndex::m_idx))); 

    // {5} keep only those values in the original range that were not removed by std::unique 
    std::vector<SIteratorIndex>::iterator ititidx = vecitidx.begin(); 
    forward_iterator itSrc = itBegin; 
    index_type idx = 0; 
    for(;;) { 
     if(ititidx==vecitidx.end()) { 
      // {6} return end of unique range 
      return itSrc; 
     } 
     if(idx!=ititidx->m_idx) { 
      // original range must be modified 
      break; 
     } 
     ++ititidx; 
     ++idx; 
     ++itSrc; 
    } 
    forward_iterator itDst = itSrc; 
    do { 
     ++idx; 
     ++itSrc; 
     // while there are still items in vecitidx, there must also be corresponding items in the original range 
     if(idx==ititidx->m_idx) { 
      std::swap(*itDst, *itSrc); // C++0x move 
      ++ititidx; 
      ++itDst; 
     } 
    } while(ititidx!=vecitidx.end()); 

    // {6} return end of unique range 
    return itDst; 
} 

template<class forward_iterator> 
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd) { 
    return stable_unique(itBegin, itEnd, std::less< std::iterator_traits<forward_iterator>::value_type >()); 
} 

void stable_unique_test() { 
    std::vector<int> vecn; 
    vecn.push_back(1); 
    vecn.push_back(17); 
    vecn.push_back(-100); 
    vecn.push_back(17); 
    vecn.push_back(1); 
    vecn.push_back(17); 
    vecn.push_back(53); 
    vecn.erase(stable_unique(vecn.begin(), vecn.end()), vecn.end()); 
    // result: 1, 17, -100, 53 
}

來源

2010-02-12 09:34:05 vschoech

有趣的但...排序意味着O（N *日誌N），而O（N）解決方案基於哈希Set/Bloom Filters exists。 – 2010-08-30 14:06:59

由於問題是「是否有任何STL算法...？它的複雜性是什麼？「這是有道理的實現像std::unique功能：

template <class FwdIterator> 
inline FwdIterator stable_unique(FwdIterator first, FwdIterator last) 
{ 
    FwdIterator result = first; 
    std::unordered_set<typename FwdIterator::value_type> seen; 

    for (; first != last; ++first) 
     if (seen.insert(*first).second) 
      *result++ = *first; 
    return result; 
}

因此，這是怎麼std::unique實施加上一組額外的unordered_set應比普通set更快的所有元素都刪除了比較相等的元素。（第一個元素保留，因爲我們不能統一到無）迭代器返回指向[first,last)範圍內的新結點

編輯：最後一句意思是容器本身不被unique修改。這可能令人困惑。下面的例子是實際的將容器減少到統一集合。

1: std::vector<int> v(3, 5); 
2: v.resize(std::distance(v.begin(), unique(v.begin(), v.end()))); 
3: assert(v.size() == 1);

第1行創建矢量{ 5, 5, 5 }。在第二行中調用unique將第二個元素的迭代器返回，這是第一個不唯一的元素。因此distance返回1並且resize修剪向量。

來源

2012-05-09 16:29:18

好的解決方案使用C++ 11，並在第5行添加typename關鍵字。 – 2015-03-24 02:12:37

是的，應該提到'std :: unordered_set'是C++ 11。如果不可用（即'__cplusplus <201103L'）只需使用'std :: set'。 – 2015-07-18 07:19:59

基礎上@戈登的回答，但使用lambda表達式，而是將刪除基於@fa的答案輸出向量

set<int> s; 
    vector<int> nodups; 
    remove_copy_if(v.begin(), v.end(), back_inserter(nodups), 
     [&s](int x){ 
      return !s.insert(x).second; //-> .second false means already exists 
     });

來源

2013-06-21 20:53:11 winterlight

寫他們的副本。它也可以開始使用STL算法std::stable_partition改寫：

struct dupChecker_ { 
    inline dupChecker_() : tmpSet() {} 
    inline bool operator()(int i) { 
     return tmpSet.insert(i).second; 
    } 
private: 
    std::set<int> tmpSet; 
}; 

k.erase(std::stable_partition(k.begin(), k.end(), dupChecker_()), k.end());

這樣，它是更緊湊，我們並不需要關心的迭代器。

它似乎甚至沒有引入太多的性能損失。我在我的項目中使用它，需要經常處理非常大的載體，它並沒有真正的區別。

另一個不錯的特性是，可以通過使用std::set<int, myCmp_> tmpSet;來調整唯一性。例如，在我的項目中忽略某些舍入錯誤。

來源

2013-09-24 14:41:45

鑑於你的輸入是vector<int> foo可以使用remove在這裏做腿部的工作適合你，那麼如果你要收縮的載體只是用erase否則只使用last作爲一個過去的最末端迭代器時你想刪除重複，但爲了矢量保留：

auto last = end(foo); 

for(auto first = begin(foo); first < last; ++first) last = remove(next(first), last, *first); 

foo.erase(last, end(foo));

Live Example

至於時間複雜度，這將是O（nm）的。其中n是元素的個數和m是唯一元素的個數。就空間的複雜性而言，這將使用O（n），因爲它會在適當的位置進行移除。

來源

2017-08-11 16:55:44

如何讓矢量元素獨特？ （刪除不相鄰的重複項）

回答

相關問題

如何讓矢量元素獨特？（刪除不相鄰的重複項）