Generating functions at compile time

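I have an image. Each pixel holds RGB intensity information. I want to sum these channels, but I also want to choose which channels go into the sum. A straightforward implementation would look like this: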

int intensity(const unsigned char* pixel, bool red, bool green, bool blue){ 
    return 0 + (red ? pixel[0] : 0) + (green ? pixel[1] : 0) + (blue ? pixel[2] : 0); 
}

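Because I will call this function for every pixel of the image, I would like to get rid of all the conditionals if I can. So I figured I need one function per case: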

std::function<int(const unsigned char* pixel)> generateIntensityAccumulator(
    const bool& accumulateRChannel,
    const bool& accumulateGChannel,
    const bool& accumulateBChannel)
{
    if (accumulateRChannel && accumulateGChannel && accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[0]) + static_cast<int>(pixel[1]) + static_cast<int>(pixel[2]);
        };
    }

    if (!accumulateRChannel && accumulateGChannel && accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[1]) + static_cast<int>(pixel[2]);
        };
    }

    if (!accumulateRChannel && !accumulateGChannel && accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[2]);
        };
    }

    if (!accumulateRChannel && !accumulateGChannel && !accumulateBChannel){
        return [](const unsigned char* pixel){
            return 0;
        };
    }

    if (accumulateRChannel && !accumulateGChannel && !accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[0]);
        };
    }

    if (!accumulateRChannel && accumulateGChannel && !accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[1]);
        };
    }

    if (accumulateRChannel && !accumulateGChannel && accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[0]) + static_cast<int>(pixel[2]);
        };
    }

    if (accumulateRChannel && accumulateGChannel && !accumulateBChannel){
        return [](const unsigned char* pixel){
            return static_cast<int>(pixel[0]) + static_cast<int>(pixel[1]);
        };
    }
}

Now I can call this generator before entering the image loop and then use the returned function without any conditionals:

... 

auto accumulator = generateIntensityAccumulator(true, false, true); 

for(auto pixel : pixels){ 
    auto intensity = accumulator(pixel); 
} 

...

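But that is a lot of code for such a simple task, and I have a feeling there is a better way to do it: for example, have the compiler do the dirty work for me and generate all the cases above. Can someone point me in the right direction?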


2016-12-20 Amadeusz

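Have you actually performance-tested the above? I would be surprised if moving simple boolean tests out of the loop made much of a difference, since processors usually optimize branches by assuming "same result as last time"... –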

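I admit I haven't; I just assumed that getting rid of the conditionals would perform better. I read about branch prediction (http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/) and I think it applies to my case. Thanks! – Amadeusz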

I mean, I could be wrong... I would just run a performance test before doing anything too complicated. –
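As a starting point for the measurement suggested in these comments, here is a minimal sketch of a baseline: the runtime-flag `intensity` from the question, summed over packed RGB bytes, plus a small timing wrapper. The helper names `sum_runtime_flags` and `time_us` are hypothetical and not part of the original post, and the packed `std::vector<unsigned char>` layout is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Runtime-flag version from the question: the three tests are loop-invariant.
inline int intensity(const unsigned char* pixel, bool red, bool green, bool blue) {
    return (red ? pixel[0] : 0) + (green ? pixel[1] : 0) + (blue ? pixel[2] : 0);
}

// Hypothetical helper: sums the selected channels over packed RGB bytes.
std::int64_t sum_runtime_flags(const std::vector<unsigned char>& rgb,
                               bool red, bool green, bool blue) {
    assert(rgb.size() % 3 == 0);  // assumes 3 bytes per pixel, no padding
    std::int64_t total = 0;
    for (std::size_t i = 0; i < rgb.size(); i += 3)
        total += intensity(&rgb[i], red, green, blue);
    return total;
}

// Hypothetical helper: runs f once and returns the elapsed microseconds.
template <typename F>
long long time_us(F&& f) {
    using namespace std::chrono;
    const auto start = steady_clock::now();
    f();
    return duration_cast<microseconds>(steady_clock::now() - start).count();
}
```

When timing, store or print the returned sum so the optimizer cannot discard the loop, and compare against the specialized variant on the same data and compiler flags before deciding whether the extra code is worth it.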

Using a std::function like this will cost you, because you do not give the compiler a chance to optimize by inlining.

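What you are trying to do is a good job for templates. And since you are working with integers, the expression itself may be optimized away, which spares you from writing a specialization for each version. Look at the following example: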

#include <array> 
#include <chrono> 
#include <iostream> 
#include <random> 
#include <vector> 

template <bool AccumulateR, bool AccumulateG, bool AccumulateB> 
inline int accumulate(const unsigned char *pixel) { 
    static constexpr int enableR = static_cast<int>(AccumulateR); 
    static constexpr int enableG = static_cast<int>(AccumulateG); 
    static constexpr int enableB = static_cast<int>(AccumulateB); 
    return enableR * static_cast<int>(pixel[0]) + 
     enableG * static_cast<int>(pixel[1]) + 
     enableB * static_cast<int>(pixel[2]); 
} 

int main() { 
    std::vector<std::array<unsigned char, 3>> pixels(
        10000000, std::array<unsigned char, 3>{0, 0, 0}); 

    // Fill up with randomness. std::uniform_int_distribution is not
    // specified for unsigned char, so draw ints in [0, 255] and narrow them.
    std::random_device rd; 
    std::uniform_int_distribution<int> dist(0, 255); 
    for (auto &pixel : pixels) { 
        pixel[0] = static_cast<unsigned char>(dist(rd)); 
        pixel[1] = static_cast<unsigned char>(dist(rd)); 
        pixel[2] = static_cast<unsigned char>(dist(rd)); 
    } 

    // Measure perf 
    using namespace std::chrono; 

    auto t1 = high_resolution_clock::now(); 
    int sum1 = 0;  // all three channels
    for (auto const &pixel : pixels) 
        sum1 += accumulate<true, true, true>(pixel.data()); 
    auto t2 = high_resolution_clock::now(); 
    int sum2 = 0;  // green channel only
    for (auto const &pixel : pixels) 
        sum2 += accumulate<false, true, false>(pixel.data()); 
    auto t3 = high_resolution_clock::now(); 

    std::cout << "Sum 1 " << sum1 << " in " 
              << duration_cast<milliseconds>(t2 - t1).count() << "ms\n"; 
    std::cout << "Sum 2 " << sum2 << " in " 
              << duration_cast<milliseconds>(t3 - t2).count() << "ms\n"; 
}

Compiled with clang 3.9 at -O2, this yields the following result on my CPU:

Sum 1 -470682949 in 7ms 
Sum 2 1275037960 in 2ms

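Note that the first sum overflows here, so you may need something bigger than int; a uint64_t might do. If you inspect the assembly, you will see that the two versions of the function are inlined and optimized differently.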
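To illustrate the overflow remark, here is a minimal sketch that keeps the per-pixel `accumulate` template as it is and only widens the running total to `std::uint64_t`. The wrapper name `sum_channels` is hypothetical and not part of the original answer.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Per-pixel sum as in the answer: disabled channels are multiplied by 0,
// which the compiler can fold away for each instantiation.
template <bool AccumulateR, bool AccumulateG, bool AccumulateB>
inline int accumulate(const unsigned char* pixel) {
    return static_cast<int>(AccumulateR) * static_cast<int>(pixel[0]) +
           static_cast<int>(AccumulateG) * static_cast<int>(pixel[1]) +
           static_cast<int>(AccumulateB) * static_cast<int>(pixel[2]);
}

// Hypothetical wrapper: accumulates into 64 bits so that large images
// do not overflow the total.
template <bool R, bool G, bool B>
std::uint64_t sum_channels(const std::vector<std::array<unsigned char, 3>>& pixels) {
    std::uint64_t sum = 0;
    for (const auto& pixel : pixels)
        sum += static_cast<std::uint64_t>(accumulate<R, G, B>(pixel.data()));
    return sum;
}
```

A single pixel contributes at most 3 * 255 = 765, so the per-pixel result still fits comfortably in an int; only the total across the image needs the wider type.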


2016-12-20 14:30:01

First things first. Do not write a std::function that takes a single pixel; write one that takes a contiguous range of pixels (a scanline of pixels).

Second, you want to write a template version of intensity:

template<bool red, bool green, bool blue> 
int intensity(const unsigned char* pixel){ 
    return (red ? pixel[0] : 0) + (green ? pixel[1] : 0) + (blue ? pixel[2] : 0); 
}

Simple, isn't it? This will optimize down to your hand-crafted versions.

template<std::size_t index> 
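int intensity(const unsigned char* pixel){ 
    // Compare against 0 explicitly: a value such as 2 or 4 cannot be
    // narrowed to a bool template argument.
    return intensity< (index&1) != 0, (index&2) != 0, (index&4) != 0 >(pixel); 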
}

This one maps the bits of index to the intensity<bool, bool, bool> instantiation to call. Now the scanline version:

template<std::size_t index, std::size_t pixel_stride=3> 
int sum_intensity(const unsigned char* pixel, std::size_t count){ 
    int value = 0; 
    while(count--) { 
        value += intensity<index>(pixel); 
        pixel += pixel_stride; 
    } 
    return value; 
}

Now we can generate our scanline intensity calculator:

// Pointer to a function that sums the selected channels over one scanline.
using scanline_fn = int(*)(const unsigned char* pel, std::size_t pixels); 

scanline_fn scanline_intensity(bool red, bool green, bool blue) { 
    // One instantiation per channel combination (binary literals need C++14).
    static const scanline_fn table[] = { 
        sum_intensity<0b000>, sum_intensity<0b001>, 
        sum_intensity<0b010>, sum_intensity<0b011>, 
        sum_intensity<0b100>, sum_intensity<0b101>, 
        sum_intensity<0b110>, sum_intensity<0b111>, 
    }; 
    std::size_t index = red + green*2 + blue*4; 
    return table[index]; 
}

And done.
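For illustration, the pieces can be put together roughly as follows. This is a condensed sketch, not the answer's exact code: it inlines the index-to-bool mapping into `sum_intensity`, assumes tightly packed 3-byte RGB rows, and adds a hypothetical helper `image_intensity` that picks the function pointer once and then calls it per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <bool red, bool green, bool blue>
int intensity(const unsigned char* pixel) {
    return (red ? pixel[0] : 0) + (green ? pixel[1] : 0) + (blue ? pixel[2] : 0);
}

// Bit 0 of index selects red, bit 1 green, bit 2 blue.
template <std::size_t index, std::size_t pixel_stride = 3>
int sum_intensity(const unsigned char* pixel, std::size_t count) {
    int value = 0;
    while (count--) {
        value += intensity<(index & 1) != 0, (index & 2) != 0, (index & 4) != 0>(pixel);
        pixel += pixel_stride;
    }
    return value;
}

using scanline_fn = int (*)(const unsigned char*, std::size_t);

scanline_fn scanline_intensity(bool red, bool green, bool blue) {
    static const scanline_fn table[] = {
        sum_intensity<0>, sum_intensity<1>, sum_intensity<2>, sum_intensity<3>,
        sum_intensity<4>, sum_intensity<5>, sum_intensity<6>, sum_intensity<7>,
    };
    return table[red + green * 2 + blue * 4];
}

// Hypothetical helper: the runtime flags are inspected once, outside all loops.
long long image_intensity(const std::vector<unsigned char>& rgb,
                          std::size_t width, std::size_t height,
                          bool red, bool green, bool blue) {
    assert(rgb.size() == width * height * 3);  // assumes packed RGB, no row padding
    const scanline_fn sum_row = scanline_intensity(red, green, blue);
    long long total = 0;
    for (std::size_t y = 0; y < height; ++y)
        total += sum_row(rgb.data() + y * width * 3, width);
    return total;
}
```

The only indirect call left is one per scanline, so its cost is spread over a whole row of pixels instead of being paid for every pixel as with the std::function approach.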

These techniques can be made generic, but you do not need the generic version here.

If your pixel stride is not 3 (say, because there is an alpha channel), it needs to be passed to sum_intensity (as a template parameter).
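One way to forward the stride, sketched under the assumption that it is known at compile time: make the table-building function itself a template on the stride. The name `scanline_intensity_strided` is hypothetical, and the per-pixel sum is written inline here to keep the sketch short.

```cpp
#include <cstddef>

// Bit 0 of index selects red, bit 1 green, bit 2 blue; any remaining bytes
// of each pixel (for example alpha) are skipped via pixel_stride.
template <std::size_t index, std::size_t pixel_stride = 3>
int sum_intensity(const unsigned char* pixel, std::size_t count) {
    int value = 0;
    while (count--) {
        value += ((index & 1) ? pixel[0] : 0) +
                 ((index & 2) ? pixel[1] : 0) +
                 ((index & 4) ? pixel[2] : 0);
        pixel += pixel_stride;
    }
    return value;
}

using scanline_fn = int (*)(const unsigned char*, std::size_t);

// Hypothetical variant of scanline_intensity that forwards the stride.
template <std::size_t pixel_stride>
scanline_fn scanline_intensity_strided(bool red, bool green, bool blue) {
    static const scanline_fn table[] = {
        sum_intensity<0, pixel_stride>, sum_intensity<1, pixel_stride>,
        sum_intensity<2, pixel_stride>, sum_intensity<3, pixel_stride>,
        sum_intensity<4, pixel_stride>, sum_intensity<5, pixel_stride>,
        sum_intensity<6, pixel_stride>, sum_intensity<7, pixel_stride>,
    };
    return table[red + green * 2 + blue * 4];
}
```

For RGBA data this would be called as `scanline_intensity_strided<4>(true, false, true)`. If the stride is only known at run time, it would have to become an ordinary function argument instead.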


2016-12-20 16:43:09 Yakk
