std :: unordered_multiset的使用案例

我想知道爲什麼會使用std::unordered_multiset。我的猜測是，在插入/擦除之後，它與迭代器的無效或非無效有關，但也許它更深一層？非常類似的問題在這裏：Use cases of std::multimap，但它更多的是關於地圖的討論。std :: unordered_multiset的使用案例

來源

2015-10-18 Johann

至於你的問題，unordered_multiset容器的最重要的特點是：

它們是關聯容器，因此它們允許快速的數據查找和檢索相比（未分類）向量作爲。
它們通常比multisets更快，既可用於插入查找，也可用於查找，有時甚至可用於刪除（請參閱，例如this benchmark）。

因此，當你需要快速查找並不在意這些數據是無序的，一個典型的用例爲unordered_multiset是：

如果你沒有做任何數據查找在所有，一個向量可能是一個更好的解決方案;
如果您需要對數據進行排序，可能應該使用多重集。

請注意，還有其他情況下，無序的容器不能或不應該使用。

首先，如果哈希代價非常高昂，那麼無序的容器實際上可能比有序的容器慢。
其次，無序容器的最壞情況插入複雜度是線性的，而不是像有序容器的情況那樣是對數的。在實踐中，這意味着插入幾乎總是非常快速，除非容器的尺寸不時超過某個容量閾值，並且整個容器需要重新整理。因此，如果您對插入時間有嚴格的時間要求，或者如果重新刷新很慢，則可能無法使用無序容器。
第三，有些情況下，unordered_multiset的內存消耗高得無法接受（但有時可以通過微調容器參數來解決）。

關於訂購容器應優先選擇無序產品的用例，您可能需要閱讀this answer。有關容器選擇的一般準則，您可能需要閱讀How can I efficiently select a Standard Library container in C++11?。

編輯

考慮到無序多重集和矢量可以經常做非常類似的事情，是不是更好始終使用矢量？矢量不會自動優於無序多集？

轉載如下是一個非常簡單的基準的結果（全碼被設置在這篇文章的結尾）：

我們創建其可以是無序的多集，原始向量的容器中，或一個排序的向量;
我們交替插入一些隨機元素，然後計算一些隨機密鑰;
插入+計數操作重複100 000次;
測量並顯示這些100000插入+計數操作所需的總持續時間。

下面是整數的容器所獲得的結果：

 
|---------------------------------------------|----------------| 
| Environment    |  Windows 7 | CygWin 64 bits | 
| Compiler     | VS Express 2013 | gcc 4.9.3 | 
|---------------------------------------------|----------------| 
| unordered_multiset<int> | 0.75 s  |  0.8 s  | 
| vector<int>, unsorted | 7.9 s  | 11.0 s  | 
| vector<int>, sorted  | 1.0 s  |  0.6 s  | 
|---------------------------------------------|----------------|

在上面的例子中，無序多重集是Windows基準稍好，而排序矢量是用於CygWin的基準稍好。對於多目標開發，這裏沒有明顯的選擇。

下面是與串的容器類似的試驗的結果：

 
|-----------------------------------------------|----------------| 
| Environment    |  Windows 7 | CygWin 64 bits | 
| Compiler     | VS Express 2013 | gcc 4.9.3 | 
|-----------------------------------------------|----------------| 
| unordered_multiset<string> |  1 s  |  1 s  | 
| vector<string>, unsorted |  30 s  |  65 s  | 
| vector<string>, sorted  | 130 s  |  32 s  | 
|-----------------------------------------------|----------------|

在這個例子中，無序多集優於迄今爲止載體。

確切的數字在這裏並不重要，因爲它們特定於執行這些基準的特定條件（硬件，操作系統，編譯器，編譯器選項等）。重要的是矢量有時會超越無序的多層次，但有時它們不會。確定給定應用程序是否應使用無序多重集或矢量的唯一方法是儘可能真實地進行基準測試。

下面是包含整數容器基準的代碼。由於它是在飛行中開發的，歡迎所有更正和改進！

#include "stdafx.h" 
#include <iostream> 
#include <array> 
#include <unordered_set> 
#include <vector> 
#include <algorithm> 
#include <chrono> 
#include <cstdlib> 
#include <unordered_map> 
#include <string> 

using namespace std; 

const unsigned N = 100000;  // Number of test iterations (= insertions + lookups) 
typedef string Element;   // Type of data stored into the container to be tested 
array<Element, N> testData;  // Pseudo-random input sequence 
array<Element, N> lookupKeys; // Pseudo-random lookup keys 

// Test action for an unordered_multiset (insert some random data then count some random key) 
struct unordered_multiset_action 
{ 
    typedef unordered_multiset<Element> Container; 
    int operator()(Container& container, unsigned k) 
    { 
     container.insert(testData[k]); 
     return container.count(lookupKeys[k]); 
    } 
}; 

// Test action for an unsorted vector (insert some random data then count some random key) 
struct unsorted_vector_action 
{ 
    typedef vector<Element> Container; 
    int operator()(Container& container, unsigned k) 
    { 
     container.push_back(testData[k]);    
     return count(testData.cbegin(), testData.cend(), lookupKeys[k]); 
    } 
}; 

// Test action for a sorted vector (insert some random data then count some random key) 
struct sorted_vector_action 
{ 
    typedef vector<Element> Container; 
    int operator()(Container& container, unsigned k) 
    { 
     container.insert(upper_bound(container.begin(), container.end(), testData[k]), testData[k]); 
     auto range = equal_range(container.cbegin(), container.cend(), lookupKeys[k]); 
     return range.second - range.first; 
    } 
}; 

// Builds an empty container to be tested 
// Then invokes N times the test action (insert some random key then count some other random key) 
template<class Action> 
long long container_test(Action action) 
{ 
    using Container = typename Action::Container; 
    Container container; 
    long long keyCount = 0; 
    for (unsigned k = 0; k<N; ++k) 
     keyCount += action(container, k); 
    return keyCount; 
} 

int main(int nargs, char *args[]) 
{ 
    using namespace chrono; 

    // Parse user input to select which container should be tested 
    enum SelectedContainer { UNORDERED_MULTISET, UNSORTED_VECTOR, SORTED_VECTOR }; 
    unordered_map<string, SelectedContainer> decoder{ { "unordered_multiset", UNORDERED_MULTISET }, 
                 { "unsorted_vector", UNSORTED_VECTOR }, 
                 { "sorted_vector",  SORTED_VECTOR } }; 
    if (nargs < 2) 
    { 
     cerr << "Please provde an argument among those keywords: unordered_multiset, unsorted_vector, sorted_vector" << endl; 
     return (-1); 
    } 
    auto it = decoder.find(args[1]); 
    if (it == decoder.cend()) 
    { 
     cerr << "Please enter one of the following arguments: unordered_multiset, unsorted_vector, sorted_vector" << endl; 
     return (-1); 
    } 
    SelectedContainer selectedContainer = it->second; 

    // Generate pseudo-random input data and input keys (no seeding for reproducibility) 
    generate(testData.begin(), testData.end(), []() { return rand() % 256; }); 
    generate(lookupKeys.begin(), lookupKeys.end(), []() { return rand() % 256; }); 

    // Run test on container to be tested and measure elapsed time 
    auto startTime = high_resolution_clock::now(); 
    long long keyCount; 
    switch (selectedContainer) 
    { 
    case UNORDERED_MULTISET: 
     keyCount = container_test(unordered_multiset_action()); 
     break; 
    case UNSORTED_VECTOR: 
     keyCount = container_test(unsorted_vector_action()); 
     break; 
    case SORTED_VECTOR: 
     keyCount = container_test(sorted_vector_action()); 
     break; 
    }; 

    auto endTime = high_resolution_clock::now(); 

    // Summarize test results 
    duration<float> elaspedTime = endTime - startTime; 
    cout << "Performed " << N << " insertions interleaved with " << N << " data lookups" << endl; 
    cout << "Total key count = " << keyCount << endl; 
    cout << "Elapsed time: " << duration_cast<milliseconds>(elaspedTime).count() << " milliseconds" << endl; 
}

來源

2015-10-26 12:49:35

偉大的分析，但它似乎是一個向量將永遠比unordered_multiset更好的選擇。沒有？ – Johann

@Johann我編輯了這篇文章，爲可能的用例提供了一個基準。正如你所看到的，媒介在某些條件下表現優於無序多媒體，而在其他條件下則是相反的。哪一個更好，事先很難知道，所以基準測試總是最安全的方法。 –

std :: unordered_multiset的使用案例

回答

相關問題