使用標準C數學庫實現sinpi（）和cospi（）

函數sinpi(x)計算sin（πx），函數cospi(x)計算cos（πx），其中與π的乘積隱含在函數內部。這些函數最初作爲Sun Microsystems在late 1980s中的擴展引入到C標準數學庫中。 IEEE Std 754 TM -2008在第9節中規定了等效功能sinPi和cosPi。使用標準C數學庫實現sinpi（）和cospi（）

有許多計算其中sin（πx）和cos（πx）自然出現。一個非常簡單的例子是Box-Muller變換（GEP Box和Mervin E.Muller，「關於隨機正常偏差的產生的註釋」The Annals of Mathematical Statistics，Vol.29，No.2，pp.610 - 611），其中，由於兩個獨立的隨機變量U 1和U 2與均勻分佈，產生獨立的隨機變量Z 1和Z 2與標準正態分佈：

Z₁ = √(-2 ln U₁) cos (2 π U₂) 
Z₂ = √(-2 ln U₁) sin (2 π U₂)

進一步的例子是正弦的用於程度參數計算和餘弦，如使用Haversine公式計算大圓距離所示：

/* This function computes the great-circle distance of two points on earth 
    using the Haversine formula, assuming spherical shape of the planet. A 
    well-known numerical issue with the formula is reduced accuracy in the 
    case of near antipodal points. 

    lat1, lon1 latitude and longitude of first point, in degrees [-90,+90] 
    lat2, lon2 latitude and longitude of second point, in degrees [-180,+180] 
    radius  radius of the earth in user-defined units, e.g. 6378.2 km or 
       3963.2 miles 

    returns: distance of the two points, in the same units as radius 

    Reference: http://en.wikipedia.org/wiki/Great-circle_distance 
*/ 
double haversine (double lat1, double lon1, double lat2, double lon2, double radius) 
{ 
    double dlat, dlon, c1, c2, d1, d2, a, c, t; 

    c1 = cospi (lat1/180.0); 
    c2 = cospi (lat2/180.0); 
    dlat = lat2 - lat1; 
    dlon = lon2 - lon1; 
    d1 = sinpi (dlat/360.0); 
    d2 = sinpi (dlon/360.0); 
    t = d2 * d2 * c1 * c2; 
    a = d1 * d1 + t; 
    c = 2.0 * asin (fmin (1.0, sqrt (a))); 
    return radius * c; 
}

對於C++，Boost庫提供sin_pi和 cos_pi，某些供應商提供sinpi和cospi功能作爲系統庫中的擴展。例如，Apple向iOS 7和OS X 10.9（presentation，幻燈片101）添加了__sinpi,__cospi和相應的單精度版本__sinpif，__cospif。但是對於其他許多平臺，C程序沒有實現任何實現。

與傳統方法相比， sin (M_PI * x)和cos (M_PI * x),sinpi和cospi的使用通過減少經由與π乘以的舍入誤差來提高準確度，並且由於簡化了參數減少而提供了性能優勢。

如何使用標準C數學庫以合理高效且標準的兼容方式實現sinpi()和cospi()功能？

來源

2017-03-14 njuffa

爲了同時獲得最大的準確性和可移植性，在我看來，暫時改變舍入模式（例如['fenv（）']（http://man7.org/linux/man-pages/man3/ fenv.3.html）或'fesetround（）'）截斷/ round-to-zero是必要的。這樣我們可以使用例如卡漢總和/補償總和，並將高精度係數分爲幾個不同的有限精度因子。其他方法似乎都依賴於特定的硬件（如'fma（）'，仿真速度非常慢）或實現細節。 –

@NominalAnimal我沒有瞄準最大的便攜性，因爲這不是我需要的東西。我在我的答案中指出了各種潛在的困難點，以幫助那些想要在自己的實施中解決它們的人。至於FMA，它可以作爲近期（大約過去5年）的x86和ARM處理器的硬件指令，當然也可以作爲20世紀90年代以來的Power [PC]。如果有人想提供一個針對無FMA硬件平臺優化的代碼的答案，我很樂意加入它（如果它真的很好，還可以給它額外的獎勵）。 – njuffa

爲簡單起見，我將重點介紹sincospi()，它同時提供正弦和餘弦結果。然後可以將sinpi和cospi構造爲丟棄不需要的數據的包裝函數。在很多應用程序中，不需要處理浮點標誌（請參閱fenv.h），大多數情況下我們也不需要使用errno錯誤報告，所以我將省略這些。

基本的算法結構非常簡單。由於非常大的參數總是偶數，因此也是2π的倍數，所以它們的正弦和餘弦值是衆所周知的。其他參數在記錄象限信息時被摺疊到範圍[-¼，+¼]中。多項式minimax approximations用於計算主近似間隔上的正弦和餘弦。最後，通過週期性交換結果和符號變化，使用象限數據將初步結果映射到最終結果。

特殊操作數（尤其是-0，infinities和NaNs）的正確處理要求編譯器僅應用符合IEEE-754規則的優化。它可能不會將x*0.0轉換爲0.0（這對於-0，無窮和NaN不正確），也不會將0.0-x優化爲-x，因爲否定是根據IEEE-754的5.5.1節的比特級操作（產生不同的結果對於零和NaN）。大多數編譯器都會提供一個強制使用「安全」轉換的標誌，例如用於英特爾C/C++編譯器的-fp-model=precise。

一個額外的警告適用於減少參數時使用nearbyint函數。與rint類似，此功能根據當前舍入模式指定爲四捨五入。當不使用fenv.h時，舍入模式默認爲舍入爲「最接近或最接近」。使用時，存在定向舍入模式有效的風險。這可以通過使用round來解決，它總是提供與當前舍入模式無關的舍入模式「舍入到最近，從零開始」。但是，由於大多數處理器體系結構不支持等價的機器指令，因此該函數往往會變慢。

關於性能的說明：下面的C99代碼在很大程度上依賴於使用fma()，它實現fused multiply-add操作。在大多數現代硬件體系結構中，這通過相應的硬件指令直接支持。如果情況並非如此，則由於通常FMA仿真速度較慢，代碼可能會經歷顯着的減速。

#include <math.h> 
#include <stdint.h> 

/* Writes result sine result sin(πa) to the location pointed to by sp 
    Writes result cosine result cos(πa) to the location pointed to by cp 

    In extensive testing, no errors > 0.97 ulp were found in either the sine 
    or cosine results, suggesting the results returned are faithfully rounded. 
*/ 
void my_sincospi (double a, double *sp, double *cp) 
{ 
    double c, r, s, t, az; 
    int64_t i; 

    az = a * 0.0; // must be evaluated with IEEE-754 semantics 
    /* for |a| >= 2**53, cospi(a) = 1.0, but cospi(Inf) = NaN */ 
    a = (fabs (a) < 9.0071992547409920e+15) ? a : az; // 0x1.0p53 
    /* reduce argument to primary approximation interval (-0.25, 0.25) */ 
    r = nearbyint (a + a); // must use IEEE-754 "to nearest" rounding 
    i = (int64_t)r; 
    t = fma (-0.5, r, a); 
    /* compute core approximations */ 
    s = t * t; 
    /* Approximate cos(pi*x) for x in [-0.25,0.25] */ 
    r =   -1.0369917389758117e-4; 
    r = fma (r, s, 1.9294935641298806e-3); 
    r = fma (r, s, -2.5806887942825395e-2); 
    r = fma (r, s, 2.3533063028328211e-1); 
    r = fma (r, s, -1.3352627688538006e+0); 
    r = fma (r, s, 4.0587121264167623e+0); 
    r = fma (r, s, -4.9348022005446790e+0); 
    c = fma (r, s, 1.0000000000000000e+0); 
    /* Approximate sin(pi*x) for x in [-0.25,0.25] */ 
    r =    4.6151442520157035e-4; 
    r = fma (r, s, -7.3700183130883555e-3); 
    r = fma (r, s, 8.2145868949323936e-2); 
    r = fma (r, s, -5.9926452893214921e-1); 
    r = fma (r, s, 2.5501640398732688e+0); 
    r = fma (r, s, -5.1677127800499516e+0); 
    s = s * t; 
    r = r * s; 
    s = fma (t, 3.1415926535897931e+0, r); 
    /* map results according to quadrant */ 
    if (i & 2) { 
     s = 0.0 - s; // must be evaluated with IEEE-754 semantics 
     c = 0.0 - c; // must be evaluated with IEEE-754 semantics 
    } 
    if (i & 1) { 
     t = 0.0 - s; // must be evaluated with IEEE-754 semantics 
     s = c; 
     c = t; 
    } 
    /* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */ 
    if (a == floor (a)) s = az; 
    *sp = s; 
    *cp = c; 
}

單精度版本基本上只在核心近似方面有所不同。使用窮舉測試可以精確確定誤差範圍。

#include <math.h> 
#include <stdint.h> 

/* Writes result sine result sin(πa) to the location pointed to by sp 
    Writes result cosine result cos(πa) to the location pointed to by cp 

    In exhaustive testing, the maximum error in sine results was 0.96677 ulp, 
    the maximum error in cosine results was 0.96563 ulp, meaning results are 
    faithfully rounded. 
*/ 
void my_sincospif (float a, float *sp, float *cp) 
{ 
    float az, t, c, r, s; 
    int32_t i; 

    az = a * 0.0f; // must be evaluated with IEEE-754 semantics 
    /* for |a| > 2**24, cospi(a) = 1.0f, but cospi(Inf) = NaN */ 
    a = (fabsf (a) < 0x1.0p24f) ? a : az; 
    r = nearbyintf (a + a); // must use IEEE-754 "to nearest" rounding 
    i = (int32_t)r; 
    t = fmaf (-0.5f, r, a); 
    /* compute core approximations */ 
    s = t * t; 
    /* Approximate cos(pi*x) for x in [-0.25,0.25] */ 
    r =    0x1.d9e000p-3f; 
    r = fmaf (r, s, -0x1.55c400p+0f); 
    r = fmaf (r, s, 0x1.03c1cep+2f); 
    r = fmaf (r, s, -0x1.3bd3ccp+2f); 
    c = fmaf (r, s, 0x1.000000p+0f); 
    /* Approximate sin(pi*x) for x in [-0.25,0.25] */ 
    r =    -0x1.310000p-1f; 
    r = fmaf (r, s, 0x1.46737ep+1f); 
    r = fmaf (r, s, -0x1.4abbfep+2f); 
    r = (t * s) * r; 
    s = fmaf (t, 0x1.921fb6p+1f, r); 
    if (i & 2) { 
     s = 0.0f - s; // must be evaluated with IEEE-754 semantics 
     c = 0.0f - c; // must be evaluated with IEEE-754 semantics 
    } 
    if (i & 1) { 
     t = 0.0f - s; // must be evaluated with IEEE-754 semantics 
     s = c; 
     c = t; 
    } 
    /* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */ 
    if (a == floorf (a)) s = az; 
    *sp = s; 
    *cp = c; 
}

來源

2017-03-14 17:39:39 njuffa

如果您明確依賴IEEE 754語義，那麼您如何解決C標準不要求實現的浮點表示或算術符合IEEE 754（完全）的事實？ –

類似地，就四捨五入而言，round to-nearest-or-even是* IEEE 754 *指定的默認舍入模式，而不是C. C甚至沒有在其標準的預定舍入模式宏之間區分不同的舍入到最近的模式。 –

@JohnBollinger我不知道。 *如果*工具鏈提供了符合IEEE-754規則的浮點格式和轉換的充分控制，*然後*此代碼就IEEE-754正確工作（我最好測試它）。相反，如果一個工具鏈通常不符合IEEE-754標準，那麼這個代碼應該有*無*預期（也沒有必要），以符合IEEE-754的所有要求。 – njuffa

使用標準C數學庫實現sinpi（）和cospi（）

回答

相關問題