Optimization defeats robustness measures

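I am currently investigating robust methods for summing arrays, and I implemented the algorithm published by Shewchuk in "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". The implementation works as expected with gfortran, but ifort optimizes away the countermeasures.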
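The countermeasure in question is Dekker's Fast-Two-Sum error-free transformation, which Shewchuk's method applies to every pair of partial sums. Stated here under the usual assumptions (IEEE binary floating point, round-to-nearest, no overflow, and $|x| \ge |y|$), with $\mathrm{fl}(\cdot)$ denoting the correctly rounded result of a single operation:

```latex
\mathit{hi} = \mathrm{fl}(x + y), \qquad
\mathit{lo} = \mathrm{fl}\bigl(y - \mathrm{fl}(\mathit{hi} - x)\bigr), \qquad
x + y = \mathit{hi} + \mathit{lo} \ \text{exactly.}
```

The identity only holds if each operation is rounded individually and in the order written, which is exactly what an optimizer that treats floating-point arithmetic as real arithmetic does not guarantee.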

To give some context, here is my code:

module test_mod
contains
    function shewchukSum(array) result(res)
    implicit none
    real, intent(in) :: array(:)
    real             :: res
    integer          :: xIdx, yIdx, i, nPartials
    real             :: partials(100), hi, lo, x, y

    nPartials = 0
    do xIdx = 1, size(array)
      i = 0
      x = array(xIdx)

      ! Calculate the partial sums
      do yIdx = 1, nPartials
        y = partials(yIdx)
        ! Fast-Two-Sum: hi + lo == x + y exactly, if evaluated as written
        hi = x + y
        if (abs(x) < abs(y)) then
          lo = x - (hi - y)
        else
          lo = y - (hi - x)
        endif
        x = hi

        ! If a round-off error occurred, store it. Exact comparison intended
        if (lo == 0.) cycle
        i = i + 1 ; partials(i) = lo
      enddo ! yIdx
      nPartials = i + 1 ; partials(nPartials) = x
    enddo ! xIdx

    res = sum(partials(:nPartials))
    end function shewchukSum
end module test_mod
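As an aside, not part of the original post: the inner step can also be written with Knuth's branch-free TwoSum, which gives the same exact `hi + lo == x + y` guarantee without the `abs` comparison. The module name `two_sum_mod` and the routine `two_sum` are hypothetical. This is a sketch, and it is just as vulnerable to value-unsafe optimization as the original, so it needs the same compiler settings.

```fortran
! Hypothetical helper module (not from the original post):
! Knuth's branch-free TwoSum error-free transformation.
module two_sum_mod
  implicit none
  private
  public :: two_sum
contains
  ! Returns hi = fl(a + b) and lo such that a + b == hi + lo exactly,
  ! provided every operation is evaluated as written (no reassociation).
  elemental subroutine two_sum(a, b, hi, lo)
    real, intent(in)  :: a, b
    real, intent(out) :: hi, lo
    real :: bb

    hi = a + b
    bb = hi - a                       ! portion of b that made it into hi
    lo = (a - (hi - bb)) + (b - bb)   ! rounding error of a + b
  end subroutine two_sum
end module two_sum_mod
```

Unlike the branching version, TwoSum does not care which operand is larger, at the cost of three extra floating-point operations per step.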

and the calling test program:

program test 
    use test_mod 
    implicit none 
    print *,  sum([1.e0, 1.e16, 1.e0, -1.e16]) 
    print *,shewchukSum([1.e0, 1.e16, 1.e0, -1.e16]) 
end program

Compiling with gfortran produces the correct result at all optimization levels:

./a.out 
    0.00000000  
    2.00000000

ifort, however, produces zero at every optimization level above -O0:

./a.out 
    0.00000000 
    0.00000000

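I tried to debug the code, went down to the assembly level, and found that ifort optimizes away the calculation of lo and the operations after if (lo == 0.) cycle.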
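A plausible mechanism, offered as an assumption rather than something read off the assembly in this thread: if the compiler treats floating-point addition as associative, it can substitute $\mathit{hi} = x + y$ and cancel terms algebraically:

```latex
\mathit{lo} = y - (\mathit{hi} - x) = y - \bigl((x + y) - x\bigr) \;\longrightarrow\; y - y = 0
```

After this rewrite `lo == 0.` is always true, every correction term is dropped, and the routine degenerates into the plain sum. The rewrite is valid in exact arithmetic but not in floating point: in single precision with $x = 10^{16}$ and $y = 1$, $\mathrm{fl}(x + y) = x$, so the true $\mathit{lo}$ is $1$ while the simplified one is $0$.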

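Is it possible to force ifort to perform the complete operation at all optimization levels? This addition is a critical part of the calculation, and I want it to run as fast as possible. For comparison, gfortran at -O2 runs this code about 8 to 10 times faster than ifort at -O0 (measured for arrays longer than 100k elements).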


2015-12-13 Alexander Vogt

Are you using ifort with any other compile options? – francescalus

@francescalus For this test, no other options. For debugging I use '-warn all -check', and for inspecting the assembly '-S -fcode-asm -fsource-asm'. –

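Some options may come from a site-wide configuration, so have a look at the output of '-dryrun', in particular whether '-assume noprotect_parens' appears. [If I use '-assume protect_parens', I get the answer you expect, but other things may be going on too.] Then again, that is the default, so it may not be listed. – francescalus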

When it comes to floating-point operations, ifort's defaults generally favor performance over strict correctness.

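There are many options that control floating-point behavior. Using ifort 16 with the option -assume protect_parens, I get the expected behavior even at higher optimization levels.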

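In addition, there are -fp-model precise and -fp-model source (the latter implies -assume protect_parens), which may also interest you. The default for -fp-model is fast=1, which

"allows value-unsafe optimizations"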

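Naturally, these may affect performance, so the other options governing floating-point behavior are also worth considering. More details can be found in the Intel publication.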


2015-12-13 18:13:39 francescalus

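Thanks a lot! This solved my problem. For this application, ifort (15.0.0) is still more than two times slower than gfortran (5.2.0) on my machine, even with the fastest settings I found after trying various optimization options. –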

Do you know whether this can be specified as a compiler directive instead of a compiler option? –
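One possible source-level alternative to a compiler option, offered as an untested assumption rather than something confirmed in this thread: give the intermediate results the standard Fortran `VOLATILE` attribute. The compiler must then reload them from memory at each reference, so it cannot replace the stored sum by `x + y` and cancel terms. Both the sum and the difference are made volatile, since otherwise `x - (s - y)` could still be reassociated. The names `volatile_sum_mod` and `fast_two_sum` are hypothetical, and the forced memory round-trips will likely cost some of the speed that motivated the question.

```fortran
! Hypothetical sketch (not from the thread): VOLATILE locals used as an
! optimization barrier for the Fast-Two-Sum step of shewchukSum.
module volatile_sum_mod
  implicit none
  private
  public :: fast_two_sum
contains
  ! Same arithmetic as the inner loop of shewchukSum. Because s and t are
  ! VOLATILE, each reference reloads them, so the compiler cannot assume
  ! s == x + y and simplify lo to zero.
  subroutine fast_two_sum(x, y, hi, lo)
    real, intent(in)  :: x, y
    real, intent(out) :: hi, lo
    real, volatile    :: s, t

    s = x + y
    if (abs(x) < abs(y)) then
      t = s - y
      lo = x - t
    else
      t = s - x
      lo = y - t
    end if
    hi = s
  end subroutine fast_two_sum
end module volatile_sum_mod
```

Whether ifort honors this without any extra flags is not verified here; checking the generated assembly with '-S' as in the question would be the way to confirm it.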

How did you construct the timing test case? I might have a quick play with it later, as it sounds interesting. – francescalus
