Successfully enabling -fno-finite-math-only on a NaN removal method

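While tracking down a bug in which values turned into NaN in the optimized build of my code (compiled with g++ 4.8.2 and 4.9.3), I found that the culprit was the -Ofast option, specifically the -ffinite-math-only flag that it includes.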

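One part of the code reads floats from a FILE* using fscanf and then replaces every NaN with a numeric value. As could be expected, however, -ffinite-math-only kicks in and removes these checks, leaving the NaNs in place.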

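While looking for a fix, I came across another Stack Overflow question that suggested adding -fno-finite-math-only as a function attribute to disable the optimization for one specific function. The following illustrates the problem and the attempted fix (which does not actually fix it):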

#include <cstdio> 
#include <cmath> 

__attribute__((optimize("-fno-finite-math-only"))) 
void replaceNaN(float * arr, int size, float newValue){ 
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue; 
} 

int main(void){ 
    const size_t cnt = 10; 
    float val[cnt]; 
    for(int i = 0; i < cnt; i++) scanf("%f", val + i); 
    replaceNaN(val, cnt, -1.0f); 
    for(int i = 0; i < cnt; i++) printf("%f ", val[i]); 
    return 0; 
}

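When compiled and run with echo 1 2 3 4 5 6 7 8 nan 10 | (g++ -ffinite-math-only test.cpp -o test && ./test), the code does not behave as expected: it prints a nan (which should have been replaced by -1.0f). It behaves correctly if the -ffinite-math-only flag is left out. Shouldn't this work? Am I missing some syntax related to attributes in gcc, or is this one of the cases where "there is some problem with a certain version of GCC" (from the linked SO question)?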

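I am aware of a few workarounds, but would prefer something a bit cleaner and more portable:

Compile the code with -fno-finite-math-only (my interim solution): I suspect this optimization may be quite useful in the rest of my program.
Manually look for the string "nan" in the input stream and replace the value there (the input reader lives in an unrelated part of the library, so putting this test there would be poor design).
Assume a specific floating-point architecture and write my own isNaN: I may end up doing this, but it is a bit hackish and non-portable.
Prefilter the data with a separately compiled program built without the -ffinite-math-only flag, then feed the result into the main program: the added complexity of maintaining two binaries and getting them to talk to each other is not worth it.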
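The second workaround (spotting the text "nan" before it ever becomes a float) could look roughly like the sketch below. It assumes the input is already split into whitespace-separated tokens; the names spellsNaN and parseOrReplace are hypothetical and do not come from the original code.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: true if the token starts with an optional sign
// followed by "nan" (case-insensitive), which is how strtof/scanf spell NaN.
bool spellsNaN(const std::string& token) {
    std::size_t i = 0;
    if (i < token.size() && (token[i] == '+' || token[i] == '-')) ++i;
    if (token.size() - i < 3) return false;
    return std::tolower(static_cast<unsigned char>(token[i]))     == 'n'
        && std::tolower(static_cast<unsigned char>(token[i + 1])) == 'a'
        && std::tolower(static_cast<unsigned char>(token[i + 2])) == 'n';
}

// Hypothetical helper: parse one token, substituting newValue for a textual NaN.
// No floating-point NaN test is involved, so -ffinite-math-only has nothing to remove.
float parseOrReplace(const std::string& token, float newValue) {
    if (spellsNaN(token)) return newValue;
    return std::strtof(token.c_str(), nullptr);
}
```

Because the decision is made on characters rather than on a float comparison, this check does not depend on how the compiler treats NaN, at the cost of coupling the reader to the replacement policy.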

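Edit: As the accepted answer points out, this appears to be a "bug" in older versions of the g++ compiler, such as 4.8.2 and 4.9.3, that has been fixed in newer versions, such as 5.1 and 6.1.1.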

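If for some reason updating the compiler is not a reasonably easy option (e.g. no root access), or if adding the attribute to a single function still does not fully resolve the NaN check problem, and you can be certain that the code will always run in an IEEE 754 floating-point environment, you can instead check the bits of the float manually for a NaN signature.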
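As a minimal sketch of what such a bit check can look like for a single-precision float: the version below assumes an IEEE 754 binary32 float and copies the bits out with std::memcpy instead of reading through a union. The name bitwiseIsNaN is hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 floats");
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: a float is NaN when all exponent bits are set
// and the mantissa is non-zero (a zero mantissa would be infinity).
inline bool bitwiseIsNaN(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);      // copy the raw bit pattern
    const std::uint32_t exponentMask = 0x7F800000u;
    const std::uint32_t mantissaMask = 0x007FFFFFu;
    return (bits & exponentMask) == exponentMask
        && (bits & mantissaMask) != 0;
}
```

The test is done purely on integers, so it does not rely on the compiler honoring NaN semantics for floating-point comparisons.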

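The accepted answer suggests doing this with a bit field. However, the order in which the compiler places the members of a bit field is not standardized, and in fact it differs between older and newer versions of g++: the older versions (4.8.2 and 4.9.3) do not follow the intended layout and always place the mantissa first, regardless of the order in which the members appear in the code.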

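A solution based on bit manipulation, however, is guaranteed to work on all IEEE 754 compliant compilers. Below is my implementation of it, which is what I ultimately used to solve my problem. It checks for IEEE 754 compliance, and I have extended it to support doubles as well as other, more routine floating-point bit operations.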

#include <cstddef> // size_t 
#include <cstdint> // uint32_t, uint64_t 
#include <limits> // IEEE754 compliance test 
#include <type_traits> // enable_if, conditional, is_same 

template< 
    typename T, 
    typename = typename std::enable_if<std::is_floating_point<T>::value>::type, 
    typename = typename std::enable_if<std::numeric_limits<T>::is_iec559>::type, 
    typename u_t = typename std::conditional<std::is_same<T, float>::value, uint32_t, uint64_t>::type 
> 
struct IEEE754 { 

    enum class WIDTH : size_t { 
     SIGN = 1, 
     EXPONENT = std::is_same<T, float>::value ? 8 : 11, 
     MANTISSA = std::is_same<T, float>::value ? 23 : 52 
    }; 
    enum class MASK : u_t { 
     SIGN = (u_t)1 << (sizeof(u_t) * 8 - 1), 
     EXPONENT = ((~(u_t)0) << (size_t)WIDTH::MANTISSA)^(u_t)MASK::SIGN, 
     MANTISSA = (~(u_t)0) >> ((size_t)WIDTH::SIGN + (size_t)WIDTH::EXPONENT) 
    }; 
    union { 
     T f; 
     u_t u; 
    }; 

    IEEE754(T f) : f(f) {} 

    // Mask first, then shift: '>>' binds tighter than '&', so the parentheses are required. 
    inline u_t sign() const { return (u & (u_t)MASK::SIGN) >> ((size_t)WIDTH::EXPONENT + (size_t)WIDTH::MANTISSA); } 
    inline u_t exponent() const { return (u & (u_t)MASK::EXPONENT) >> (size_t)WIDTH::MANTISSA; } 
    inline u_t mantissa() const { return u & (u_t)MASK::MANTISSA; } 

    inline bool isNan() const { 
     return (mantissa() != 0) && ((u & ((u_t)MASK::EXPONENT)) == (u_t)MASK::EXPONENT); 
    } 
}; 
template<typename T> 
inline IEEE754<T> toIEEE754(T val) { return IEEE754<T>(val); }

The replaceNaN function now becomes:

void replaceNaN(float * arr, int size, float newValue){ 
    for(int i = 0; i < size; i++) 
     if (toIEEE754(arr[i]).isNan()) arr[i] = newValue; 
}

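Inspecting the assembly generated for these functions shows that, as expected, all of the masks become compile-time constants, which leads to the following (seemingly) efficient code: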

# In loop of replaceNaN 
movl (%rcx), %eax  # eax = arr[i] 
testl $8388607, %eax  # Check if mantissa is empty 
je .L3     # If it is, it's not a nan (it's inf), continue loop 
andl $2139095040, %eax # Mask leaves only exponent 
cmpl $2139095040, %eax # Test if exponent is all 1s 
jne .L3     # If it isn't, it's not a nan, so continue loop

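This is one instruction fewer than a working bit field solution (there is no shift) and uses the same number of registers. It is tempting to say that this alone makes it more efficient, but other concerns, such as pipelining, may make one solution more or less efficient than the other.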
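The two immediates in the listing, 8388607 and 2139095040, are the single-precision mantissa and exponent masks. As a small cross-check, the sketch below recomputes them with the same shift expressions used in the MASK enum and adds the double-precision equivalents; the constant names are hypothetical.

```cpp
#include <cstdint>

// Single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits.
constexpr std::uint32_t kSignMask32     = std::uint32_t(1) << 31;
constexpr std::uint32_t kExponentMask32 = (~std::uint32_t(0) << 23) ^ kSignMask32;
constexpr std::uint32_t kMantissaMask32 = ~std::uint32_t(0) >> (1 + 8);

static_assert(kSignMask32     == 0x80000000u, "sign mask");
static_assert(kExponentMask32 == 0x7F800000u, "exponent mask (2139095040)");
static_assert(kMantissaMask32 == 0x007FFFFFu, "mantissa mask (8388607)");

// Double precision: 1 sign bit, 11 exponent bits, 52 mantissa bits.
constexpr std::uint64_t kSignMask64     = std::uint64_t(1) << 63;
constexpr std::uint64_t kExponentMask64 = (~std::uint64_t(0) << 52) ^ kSignMask64;
constexpr std::uint64_t kMantissaMask64 = ~std::uint64_t(0) >> (1 + 11);

static_assert(kExponentMask64 == 0x7FF0000000000000ull, "exponent mask");
static_assert(kMantissaMask64 == 0x000FFFFFFFFFFFFFull, "mantissa mask");
```

Since everything here is a constant expression, a mistake in a width or a shift shows up as a compile error rather than as a wrong answer at run time.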

Source

2016-07-09 André Harder

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007boptimize_007d-function-attribute-3278 "This attribute should be used for debugging purposes only. It is not suitable in production code."

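This looks like a compiler bug to me. Through GCC 4.9.2, the attribute is completely ignored. GCC 5.1 and later pay attention to it. Perhaps it is time to upgrade your compiler?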

__attribute__((optimize("-fno-finite-math-only"))) 
void replaceNaN(float * arr, int size, float newValue){ 
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue; 
}

Compiled with -ffinite-math-only on GCC 4.9.2:

replaceNaN(float*, int, float): 
     rep ret

But with exactly the same settings on GCC 5.1:

replaceNaN(float*, int, float): 
     test esi, esi 
     jle  .L26 
     sub  rsp, 8 
     call std::isnan(float) [clone .isra.0] 
     test al, al 
     je  .L2 
     mov  rax, rdi 
     and  eax, 15 
     shr  rax, 2 
     neg  rax 
     and  eax, 3 
     cmp  eax, esi 
     cmova eax, esi 
     cmp  esi, 6 
     jg  .L28 
     mov  eax, esi 
.L5: 
     cmp  eax, 1 
     movss DWORD PTR [rdi], xmm0 
     je  .L16 
     cmp  eax, 2 
     movss DWORD PTR [rdi+4], xmm0 
     je  .L17 
     cmp  eax, 3 
     movss DWORD PTR [rdi+8], xmm0 
     je  .L18 
     cmp  eax, 4 
     movss DWORD PTR [rdi+12], xmm0 
     je  .L19 
     cmp  eax, 5 
     movss DWORD PTR [rdi+16], xmm0 
     je  .L20 
     movss DWORD PTR [rdi+20], xmm0 
     mov  edx, 6 
.L7: 
     cmp  esi, eax 
     je  .L2 
.L6: 
     mov  r9d, esi 
     lea  r8d, [rsi-1] 
     mov  r11d, eax 
     sub  r9d, eax 
     lea  ecx, [r9-4] 
     sub  r8d, eax 
     shr  ecx, 2 
     add  ecx, 1 
     cmp  r8d, 2 
     lea  r10d, [0+rcx*4] 
     jbe  .L9 
     movaps xmm1, xmm0 
     lea  r8, [rdi+r11*4] 
     xor  eax, eax 
     shufps xmm1, xmm1, 0 
.L11: 
     add  eax, 1 
     add  r8, 16 
     movaps XMMWORD PTR [r8-16], xmm1 
     cmp  ecx, eax 
     ja  .L11 
     add  edx, r10d 
     cmp  r9d, r10d 
     je  .L2 
.L9: 
     movsx rax, edx 
     movss DWORD PTR [rdi+rax*4], xmm0 
     lea  eax, [rdx+1] 
     cmp  eax, esi 
     jge  .L2 
     add  edx, 2 
     cdqe 
     cmp  esi, edx 
     movss DWORD PTR [rdi+rax*4], xmm0 
     jle  .L2 
     movsx rdx, edx 
     movss DWORD PTR [rdi+rdx*4], xmm0 
.L2: 
     add  rsp, 8 
.L26: 
     rep ret 
.L28: 
     test eax, eax 
     jne  .L5 
     xor  edx, edx 
     jmp  .L6 
.L20: 
     mov  edx, 5 
     jmp  .L7 
.L19: 
     mov  edx, 4 
     jmp  .L7 
.L18: 
     mov  edx, 3 
     jmp  .L7 
.L17: 
     mov  edx, 2 
     jmp  .L7 
.L16: 
     mov  edx, 1 
     jmp  .L7

The output is similar, although not quite identical, on GCC 6.1.

Consider replacing the attribute with pragmas:

#pragma GCC push_options 
#pragma GCC optimize ("-fno-finite-math-only") 
void replaceNaN(float * arr, int size, float newValue){ 
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue; 
} 
#pragma GCC pop_options

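Replacing the attribute in this way makes absolutely no difference, so the problem is not simply that the attribute is being ignored. These older versions of the compiler evidently do not support controlling floating-point optimization behavior at function-level granularity.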

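Note, however, that the code generated by GCC 5.1 and later is still significantly worse than what you get when the function is compiled without the -ffinite-math-only switch: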

replaceNaN(float*, int, float): 
     test esi, esi 
     jle  .L1 
     lea  eax, [rsi-1] 
     lea  rax, [rdi+4+rax*4] 
.L5: 
     movss xmm1, DWORD PTR [rdi] 
     ucomiss xmm1, xmm1 
     jnp  .L6 
     movss DWORD PTR [rdi], xmm0 
.L6: 
     add  rdi, 4 
     cmp  rdi, rax 
     jne  .L5 
     rep ret 
.L1: 
     rep ret

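I have no idea why there is such a difference. Something is badly throwing the compiler off its game; this is worse code than you get with optimizations completely disabled. If I had to guess, I would speculate that it is the implementation of std::isnan. If this replaceNaN function is not speed-critical, it probably does not matter. If you need to parse values from a file repeatedly, you may prefer a reasonably efficient implementation.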

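Personally, I would write my own non-portable implementation of std::isnan. The IEEE 754 formats are all well documented, and provided that you thoroughly test and comment the code, I cannot see the harm in this unless you absolutely need the code to be portable to every architecture. It will drive purists up the wall, but so should using non-standard options like -ffinite-math-only. For a single-precision float, something like: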

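#include <cstdint> // uint32_t 

bool my_isnan(float value) 
{ 
    union IEEE754_Single 
    { 
    float f; 
    struct 
    { 
    // BIG_ENDIAN is assumed to be defined by the build for big-endian targets. 
    // Bit-field layout is implementation-defined, so verify it on your compiler. 
    #if BIG_ENDIAN 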
     uint32_t sign  : 1; 
     uint32_t exponent : 8; 
     uint32_t mantissa : 23; 
    #else 
     uint32_t mantissa : 23; 
     uint32_t exponent : 8; 
     uint32_t sign  : 1; 
    #endif 
    } bits; 
    } u = { value }; 

    // In the IEEE 754 representation, a float is NaN when 
    // the mantissa is non-zero, and the exponent is all ones 
    // (2^8 - 1 == 255). 
    return (u.bits.mantissa != 0) && (u.bits.exponent == 255); 
}

Now no annotations are needed; just use my_isnan instead of std::isnan. This produces the following object code when compiled with -ffinite-math-only enabled:

replaceNaN(float*, int, float): 
     test esi, esi 
     jle  .L6 
     lea  eax, [rsi-1] 
     lea  rdx, [rdi+4+rax*4] 
.L13: 
     mov  eax, DWORD PTR [rdi]  ; get original floating-point value 
     test eax, 8388607    ; test if mantissa != 0 
     je  .L9 
     shr  eax, 16     ; test if exponent has all bits set 
     and  ax, 32640 
     cmp  ax, 32640 
     jne  .L9 
     movss DWORD PTR [rdi], xmm0 ; set newValue if original was NaN 
.L9: 
     add  rdi, 4 
     cmp  rdx, rdi 
     jne  .L13 
     rep ret 
.L6: 
     rep ret

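The NaN check is slightly more complicated than a simple ucomiss followed by a test of the parity flag, but it is guaranteed to be correct as long as your compiler adheres to the IEEE 754 standard. This works on all versions of GCC and on any other compiler.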

Source

2016-07-09 08:55:51

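Very thorough! I should note, however, that the bit field approach is not guaranteed to work on all IEEE 754 compliant compilers, because the standard does not define a specific order for the bit-field members inside the union. In practice it does not work on, for example, my 'g++ 4.8.2' and '4.9.3' installations, which decide to place 'bits.mantissa' at the start of the field (regardless of the order in the code). The bit field approach does work properly in MSVC and 'g++ 6.1.1'. The same check done with bit masks, though, should work properly on all IEEE 754 compliant compilers. Fix that, and this will be the correct answer!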
