C++ 11矢量<bool>性能問題（使用代碼示例）

我注意到在運行以下代碼時，矢量比bool數組慢得多。C++ 11矢量<bool>性能問題（使用代碼示例）

int main() 
{ 
    int count = 0; 
    int n = 1500000; 
    // slower with c++ vector<bool> 
    /*vector<bool> isPrime; 
    isPrime.reserve(n); 
    isPrime.assign(n, true); 
    */ 
    // faster with bool array 
    bool* isPrime = new bool[n]; 

    for (int i = 0; i < n; ++i) 
     isPrime[i] = true; 


    for (int i = 2; i< n; ++i) { 
     if (isPrime[i]) 
      count++; 
     for (int j =2; i*j < n; ++j) 
      isPrime[i*j] = false; 
    } 

    cout << count << endl; 
    return 0; 
}

有什麼方法可以讓vector<bool>更快？順便說一句，std::vector::push_back和std::vector::emplace_back都比std::vector::assign慢。

來源

2016-04-29 guoqing

你正在訪問'isPrime'超出它的末尾，它應該是'new bool [n]' –

如果你對性能超級關心，不要使用'vector '。該標準要求非常節省空間，並且具有性能成本。 –

你在談論減速有多大？您可能想要添加一些時間示例以使此問題更具吸引力。 – anderas

vector<bool>可能有一個模板專門化，並可能使用位數組來實現以節省空間。提取並保存一點並將其從/轉換爲bool可能會導致您觀察到的性能下降。如果你使用std::vector::push_back，你正在調整矢量，這將導致更糟的性能。下一個性能殺手可能是assign（最差的複雜度：第一個參數的線性），改爲使用operator []（複雜度：常量）。另一方面，bool []保證爲bool的數組。

而且您應該調整爲n而不是n-1以避免未定義的行爲。

來源

2016-04-29 07:54:06

它不僅「可能有」這樣的專業化，這實際上是在標準！ – anderas

一種解決方法是使用'deque '。或'矢量'。 :) –

@Mohit Jain：模板專業化應該在編譯時發生。它應該影響運行時間性能，不是嗎？ – guoqing

std::vector<bool>可能有各種性能問題（例如看看https://isocpp.org/blog/2012/11/on-vectorbool）。

一般來說，你可以：

使用std::vector<std::uint8_t>代替std::vector<bool>（試用一下std::valarray<bool>也）。

這需要更多的內存，並且緩存友好性較低，但是沒有開銷（以位操作的形式）訪問單個值，所以在某些情況下它可以更好地工作（畢竟它就像您的bool但沒有內存管理的滋擾）
使用std::bitset數組，如果你在編譯時知道你的布爾數組多大將是（或者，如果你至少可以建立一個合理的上限）
如果增強是一個選項嘗試boost::dynamic_bitset（大小可以在運行時指定）

但你必須測試速度優化...

有了您的具體的例子，我只能證實的性能差異時優化關閉（當然這不是要走的路）。

使用g ++ v4.8.3和鐺++ v3.4.5一個Intel Xeon處理器的系統上（-O3優化電平），得到不同的畫面一些測試：

    time (ms) 
       G++  CLANG++ 
array of bool 3103  3010 
vector<bool>  2835  2420 // not bad! 
vector<char>  3136  3031 // same as array of bool 
bitset   2742  2388 // marginally better

（時間在答案爲經過的代碼100個運行）

std::vector<bool>看起來不那麼糟糕（源代碼here）。

來源

2016-04-29 08:29:28 manlio

「Xeon」可以是P4到Skylake之間的任何東西。說*哪個* Xeon（例如Haswell Xeon或Exxxx v3）會更具信息性。這些都是適用於現代硬件的非常舊的編譯器版本（如果不是自動矢量化或使用「-march = native」，那麼這些版本不會太大）。 –

@PeterCordes你是對的。這是至強e3-1230v3（使用'-march = native'開關）。在我的辯護中，我會補充說，它不是一個完整的考試，但我會盡快添加添加測試代碼和更多結果。 – manlio

您不必爲此做出重大決定。任何時候你發佈任何perf數字，特定的微架構和編譯器版本+選項都是必不可少的。例如鏗鏘聲++ 3.4.5在Haswell Xeon上，'-O3 -march = native'會覆蓋它。順便說一句，對於AVX2，clang 3.7.1（當前穩定）自動矢量化的性能明顯優於3.4，就asm的質量而言。 IDK多少次會導致perf差異，因爲內存瓶頸常常隱藏CPU瓶頸。 http://llvm.org/apt/擁有基於debian的Linux發行版的當前版本。 –

vector<bool>可以是高性能，但不是必需的。對於vector<bool>要高效，它需要一次在許多布爾操作上（例如，isPrime.assign(n, true)），和，實現者不得不十分小心。對vector<bool>中的單個布爾進行索引很慢。

這裏是一個最好的取景器，我寫了一段時間後使用vector<bool>和鐺+的libC++（libc中++部分是很重要的）：

#include <algorithm> 
#include <chrono> 
#include <iostream> 
#include <vector> 

std::vector<bool> 
init_primes() 
{ 
    std::vector<bool> primes(0x80000000, true); 
    primes[0] = false; 
    primes[1] = false; 
    const auto pb = primes.begin(); 
    const auto pe = primes.end(); 
    const auto sz = primes.size(); 
    size_t i = 2; 
    while (true) 
    { 
     size_t j = i*i; 
     if (j >= sz) 
      break; 
     do 
     { 
      primes[j] = false; 
      j += i; 
     } while (j < sz); 
     i = std::find(pb + (i+1), pe, true) - pb; 
    } 
    return primes; 
} 

int 
main() 
{ 
    using namespace std::chrono; 
    using dsec = duration<double>; 
    auto t0 = steady_clock::now(); 
    auto p = init_primes(); 
    auto t1 = steady_clock::now(); 
    std::cout << dsec(t1-t0).count() << "\n"; 
}

這在約28秒（-O3）執行我。當我改變它來代替vector<char>時，執行時間上升到大約44s。

如果你運行這個使用其他一些std :: lib，你可能不會看到這種趨勢。在libC++算法（如std::find）上進行了優化，可以一次搜索一個字的位，而不是一次一個位。

請參閱http://howardhinnant.github.io/onvectorbool.html瞭解有關可以通過供應商優化哪些std算法的更多詳細信息。

來源

2016-04-29 18:11:42

C++ 11矢量<bool>性能問題（使用代碼示例）

回答

相關問題