C++ optimization

-3

I am working on something real-time and I need a lot of speed. But in my code I have this:

float maxdepth; 
uint32_t faceindex; 

for (uint32_t tr_iterator = 0; tr_iterator < facesNum-1; tr_iterator++) 
{ 
    maxdepth = VXTrisDepth[tr_iterator]; 
    faceindex = tr_iterator; 
    uint32_t tr_literator = 3*tr_iterator; 
    uint32_t facelindex = 3*faceindex; 
    for (uint32_t tr_titerator = tr_iterator+1; tr_titerator < facesNum; tr_titerator++)
    {
        float depth = VXTrisDepth[tr_titerator];
        if (depth > maxdepth)
        {
            maxdepth = depth;
            faceindex = tr_titerator;
        }
    }
    Vei2 itmpx = trs[tr_literator+0];
    trs[tr_literator+0] = trs[facelindex+0];
    trs[facelindex+0] = itmpx;
    itmpx = trs[tr_literator+1];
    trs[tr_literator+1] = trs[facelindex+1];
    trs[facelindex+1] = itmpx;
    itmpx = trs[tr_literator+2];
    trs[tr_literator+2] = trs[facelindex+2];
    trs[facelindex+2] = itmpx;
    float id = VXTrisDepth[tr_iterator]; 
    VXTrisDepth[tr_iterator] = VXTrisDepth[faceindex]; 
    VXTrisDepth[faceindex] = id; 
}

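VXTrisDepth is just an array of floats, faceindex is a uint32_t and can be a large number, trs is an array of Vei2, and Vei2 is just a 2D integer vector. The problem is that when facesNum is something like 16074, this loop takes 700 ms to run on my computer, which is far too much. Any ideas for optimization?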

Source

2017-10-17 MMBC Productions

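Have you tried the '-O3' switch?

Try using std::swap where you have the tmp variables. – JLev

A possible optimization is to move the second loop out of the first one: the "2nd" loop builds a vector of maxdepth and faceindex for each tr_titerator, and the 1st loop uses that instead. – megabyte1024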

I have rewritten it a bit to find out what you are really doing.

Warning: all code is untested.

float maxdepth;
uint32_t faceindex;

for (uint32_t tr_iterator = 0; tr_iterator < facesNum-1; tr_iterator++) {
    // find the deepest remaining face (std::max_element is in <algorithm>)
    auto fi = std::max_element(&VXTrisDepth[tr_iterator], &VXTrisDepth[facesNum]);
    maxdepth = *fi;
    faceindex = std::distance(&VXTrisDepth[0], fi);

    // compute the trs offsets only once faceindex is known; taking facelindex
    // from the initial faceindex would make the swaps below no-ops
    uint32_t tr_literator = 3*tr_iterator;
    uint32_t facelindex = 3*faceindex;

    // hmm was this originally a VEC3...
    std::swap(trs[tr_literator+0], trs[facelindex+0]);
    std::swap(trs[tr_literator+1], trs[facelindex+1]);
    std::swap(trs[tr_literator+2], trs[facelindex+2]);

    // with the above this looks like a struct of arrays. SOA vs AOS
    std::swap(VXTrisDepth[tr_iterator], VXTrisDepth[faceindex]);
}

Now this looks like a selection sort of two arrays, which is O(N^2), so no wonder it feels sluggish.
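To put a number on that, here is a small sketch that counts the comparisons the selection sort's inner loop makes and sets it against a rough N log2 N figure for a comparison sort. Both function names are made up for this illustration, and the second figure is only an order-of-magnitude estimate, not what std::sort really performs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Comparisons made by the selection sort's inner loop:
// (n-1) + (n-2) + ... + 1 = n*(n-1)/2.
std::uint64_t selectionSortComparisons(std::uint64_t n) {
    return n * (n - 1) / 2;
}

// Rough n*log2(n) estimate for a comparison sort such as std::sort.
// This is an order-of-magnitude figure only (hypothetical helper).
std::uint64_t nLogNEstimate(std::uint64_t n) {
    assert(n > 0);
    const double dn = static_cast<double>(n);
    return static_cast<std::uint64_t>(dn * std::log2(dn));
}
```

For the 16074 faces from the question, the first function gives 129178701 comparisons per frame, while the estimate for an O(N lg N) sort is on the order of 225 thousand.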

There are multiple ways to solve this:

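An external index: make an array of length facesNum, initialized from zero to facesNum-1, and sort it using each index to look up VXTrisDepth. Then rearrange the 2 original arrays according to the index array.
An external index and key pair: this makes it easy to use std::pair. Sort the pairs, then reorder the 2 original arrays.
Sort the 2 arrays as if they were one, which is a slight hack. std::swap needs to be specialized on a type, so it can be misused to swap in both arrays at once. No extra storage is needed.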
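The first option, the external index, could be sketched roughly as follows. This is an assumption-laden illustration: Vei2 is a stand-in struct, the data is held in std::vector rather than the question's arrays, and sortFacesByDepth is a name invented here. It sorts deepest first, like the original loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the question's Vei2 (a 2D integer vector).
struct Vei2 { int x; int y; };

// Sorts faces deepest first. depths holds one value per face,
// trs holds three consecutive Vei2 entries per face.
void sortFacesByDepth(std::vector<float>& depths, std::vector<Vei2>& trs) {
    assert(trs.size() == 3 * depths.size());
    const std::size_t n = depths.size();

    // External index: 0, 1, ..., n-1.
    std::vector<std::uint32_t> order(n);
    std::iota(order.begin(), order.end(), 0u);

    // Sort the indices by the depth they refer to, largest depth first.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::uint32_t a, std::uint32_t b) {
                         return depths[a] > depths[b];
                     });

    // Apply the permutation by gathering into fresh arrays.
    std::vector<float> newDepths(n);
    std::vector<Vei2> newTrs(3 * n);
    for (std::size_t dst = 0; dst < n; ++dst) {
        const std::size_t src = order[dst];
        newDepths[dst] = depths[src];
        for (std::size_t k = 0; k < 3; ++k)
            newTrs[3 * dst + k] = trs[3 * src + k];
    }
    depths.swap(newDepths);
    trs.swap(newTrs);
}
```

Gathering into new arrays costs extra memory but keeps the reordering step simple. Applying the permutation in place is possible too, at the price of more bookkeeping.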

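Let's try a simple version with the external pairs.

We need 3 stages:

Make the helper array: O(N)
Sort the helper array: O(N lg N)
Reorder the original arrays: O(N)

And some more code: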

// make helper array (needs <algorithm>, <functional>, <utility>, <vector>)
using hPair = std::pair<float, uint32_t>; // order is important: depth is the sort key
std::vector<hPair> helper;
helper.reserve(facesNum);

for (uint32_t idx = 0; idx < facesNum; idx++)
    helper.emplace_back(VXTrisDepth[idx], idx);

// sort deepest first, like the original loop, using std::pair's comparison
// operators, or write your own comparator
std::sort(helper.begin(), helper.end(), std::greater<hPair>());

// reorder the SOA arrays; swapping in place would scramble faces that have
// not been moved yet, so read the vertices from a copy of the old data
std::vector<Vei2> oldTrs(&trs[0], &trs[0] + 3*facesNum);
for (uint32_t face = 0; face < facesNum; face++) {
    uint32_t dst = 3*face;                 // where the face goes
    uint32_t src = 3*helper[face].second;  // where the face came from
    trs[dst+0] = oldTrs[src+0];
    trs[dst+1] = oldTrs[src+1];
    trs[dst+2] = oldTrs[src+2];

    VXTrisDepth[face] = helper[face].first; // we already have the sorted depth in helper.
}

Remember to test that it still works... you already have a test framework, right?
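If no test framework is at hand, a minimal check might compare a fast sort against the original selection sort on random input. The sketch below is only an illustration under stated assumptions: each face is represented by a single id instead of three Vei2 entries, the depths are all distinct so that tie order cannot differ, and referenceSort, pairSort and sortsAgree are names invented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <utility>
#include <vector>

using DepthId = std::pair<float, std::uint32_t>;

// Reference: the O(N^2) selection sort from the question, deepest first.
// ids is a hypothetical stand-in for the per-face vertex data.
void referenceSort(std::vector<float>& depths, std::vector<std::uint32_t>& ids) {
    for (std::size_t i = 0; i + 1 < depths.size(); ++i) {
        std::size_t best = i;
        for (std::size_t j = i + 1; j < depths.size(); ++j)
            if (depths[j] > depths[best]) best = j;
        std::swap(depths[i], depths[best]);
        std::swap(ids[i], ids[best]);
    }
}

// Candidate: sort (depth, id) pairs, then write both arrays back.
void pairSort(std::vector<float>& depths, std::vector<std::uint32_t>& ids) {
    std::vector<DepthId> helper;
    helper.reserve(depths.size());
    for (std::size_t i = 0; i < depths.size(); ++i)
        helper.emplace_back(depths[i], ids[i]);
    std::sort(helper.begin(), helper.end(), std::greater<DepthId>());
    for (std::size_t i = 0; i < helper.size(); ++i) {
        depths[i] = helper[i].first;
        ids[i] = helper[i].second;
    }
}

// Returns true when both sorts produce identical output for n faces
// with distinct, shuffled depths.
bool sortsAgree(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<float> depths(n);
    std::vector<std::uint32_t> ids(n);
    for (std::size_t i = 0; i < n; ++i) {
        depths[i] = static_cast<float>(i);
        ids[i] = static_cast<std::uint32_t>(i);
    }
    std::shuffle(depths.begin(), depths.end(), rng);

    std::vector<float> d1 = depths, d2 = depths;
    std::vector<std::uint32_t> i1 = ids, i2 = ids;
    referenceSort(d1, i1);
    pairSort(d2, i2);
    return d1 == d2 && i1 == i2;
}
```

Calling sortsAgree(1000, 42) from a test or a debug build would exercise both paths. With equal depths the two sorts may order the tied faces differently, so a check on real data should compare depths only, or accept any order among ties.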

Source

2017-10-21 13:03:44 Surt
