C++中的循環融合（如何幫助編譯器？）

我試圖理解C++編譯器在什麼情況下能夠執行循環融合，以及什麼時候不能。C++中的循環融合（如何幫助編譯器？）

下面的代碼測量兩種不同方式的性能，以計算矢量中所有值的平方加倍（f(x) = (2*x)^2）。

#include <chrono> 
#include <iostream> 
#include <numeric> 
#include <vector> 

constexpr int square(int x) 
{ 
    return x * x; 
} 

constexpr int times_two(int x) 
{ 
    return 2 * x; 
} 

// map ((^2) . (^2)) $ [1,2,3] 
int manual_fusion(const std::vector<int>& xs) 
{ 
    std::vector<int> zs; 
    zs.reserve(xs.size()); 
    for (int x : xs) 
    { 
     zs.push_back(square(times_two(x))); 
    } 
    return zs[0]; 
} 

// map (^2) . map (^2) $ [1,2,3] 
int two_loops(const std::vector<int>& xs) 
{ 
    std::vector<int> ys; 
    ys.reserve(xs.size()); 
    for (int x : xs) 
    { 
     ys.push_back(times_two(x)); 
    } 

    std::vector<int> zs; 
    zs.reserve(ys.size()); 
    for (int y : ys) 
    { 
     zs.push_back(square(y)); 
    } 
    return zs[0]; 
} 

template <typename F> 
void test(F f) 
{ 
    const std::vector<int> xs(100000000, 42); 

    const auto start_time = std::chrono::high_resolution_clock::now(); 
    const auto result = f(xs); 
    const auto end_time = std::chrono::high_resolution_clock::now(); 

    const auto elapsed = end_time - start_time; 
    const auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count(); 
    std::cout << elapsed_us/1000 << " ms - " << result << std::endl; 
} 

int main() 
{ 
    test(manual_fusion); 
    test(two_loops); 
}

與兩個循環takes about twice as much time與一個循環的版本，甚至-O3 GCC和Clang的版本。

有沒有辦法允許編譯器優化two_loops到manual_fusion沒有在第二個循環就地操作嗎？我問的原因是我想要更快地向我的圖書館FunctionalPlus連鎖撥打電話fplus::enumerate(fplus::transform(f, xs));。

來源

2016-09-23 Tobias Hermann

你必須在第二個版本的兩個撥款。讓我們直接操作（修改）'ys' –

非常感謝，[it works]（http://ideone.com/I17kdT）！你認爲有可能讓編譯器優化分配嗎？ –

我不這麼認爲。會有太多的猜測這樣做（一個猜測已經禁止這種優化） –

您可以嘗試修改two_loops功能如下：

int two_loops(const std::vector<int>& xs) 
{ 
    std::vector<int> zs; 
    zs.reserve(xs.size()); 
    for (int x : xs) 
    { 
     zs.push_back(times_two(x)); 
    } 

    for (int i=0 : i<zs.size(); i++) 
    { 
     zs[i] = (square(zs[i])); 
    } 
    return zs[0]; 
}

的一點是避免兩次分配內存的push_back到另一個向量

來源

2016-09-23 11:38:12 Trifon

謝謝。這工作。但我希望可以不刪除分配。原因是，我想像'fplus :: enumerate（fplus :: transform（f，xs））;'更快地對我的庫[FunctionalPlus]（https://github.com/Dobiasd/FunctionalPlus）進行鏈式調用。我只是相應地提出了我的問題。 –

如果你想鏈接電話，那麼你將不得不支付更長的執行時間的價格。至少你可以避免分配和push_backs。第一次調用將創建向量，隨後的調用將改變向量內容。 – Trifon

不幸的是，改變內容並不適合我在FunctionalPlus中使用的方法。 –

C++中的循環融合（如何幫助編譯器？）

回答

相關問題