爲簡單起見,我將重點介紹sincospi()
,它同時提供正弦和餘弦結果。然後可以將sinpi
和cospi
構造爲丟棄不需要的數據的包裝函數。在很多應用程序中,不需要處理浮點標誌(請參閱fenv.h
),大多數情況下我們也不需要使用errno
錯誤報告,所以我將省略這些。
基本的算法結構非常簡單。由於非常大的參數總是偶數,因此也是2π的倍數,所以它們的正弦和餘弦值是衆所周知的。其他參數在記錄象限信息時被摺疊到範圍[-¼,+¼]中。多項式minimax approximations用於計算主近似間隔上的正弦和餘弦。最後,通過週期性交換結果和符號變化,使用象限數據將初步結果映射到最終結果。
特殊操作數(尤其是-0,infinities和NaNs)的正確處理要求編譯器僅應用符合IEEE-754規則的優化。它可能不會將x*0.0
轉換爲0.0
(這對於-0,無窮和NaN不正確),也不會將0.0-x
優化爲-x
,因爲否定是根據IEEE-754的5.5.1節的比特級操作(產生不同的結果對於零和NaN)。大多數編譯器都會提供一個強制使用「安全」轉換的標誌,例如用於英特爾C/C++編譯器的-fp-model=precise
。
一個額外的警告適用於減少參數時使用nearbyint
函數。與rint
類似,此功能根據當前舍入模式指定爲四捨五入。當不使用fenv.h
時,舍入模式默認爲舍入爲「最接近或最接近」。使用時,存在定向舍入模式有效的風險。這可以通過使用round
來解決,它總是提供與當前舍入模式無關的舍入模式「舍入到最近,從零開始」。但是,由於大多數處理器體系結構不支持等價的機器指令,因此該函數往往會變慢。
關於性能的說明:下面的C99代碼在很大程度上依賴於使用fma()
,它實現fused multiply-add操作。在大多數現代硬件體系結構中,這通過相應的硬件指令直接支持。如果情況並非如此,則由於通常FMA仿真速度較慢,代碼可能會經歷顯着的減速。
#include <math.h>
#include <stdint.h>
/* Writes result sine result sin(πa) to the location pointed to by sp
Writes result cosine result cos(πa) to the location pointed to by cp
In extensive testing, no errors > 0.97 ulp were found in either the sine
or cosine results, suggesting the results returned are faithfully rounded.
*/
void my_sincospi (double a, double *sp, double *cp)
{
double c, r, s, t, az;
int64_t i;
az = a * 0.0; // must be evaluated with IEEE-754 semantics
/* for |a| >= 2**53, cospi(a) = 1.0, but cospi(Inf) = NaN */
a = (fabs (a) < 9.0071992547409920e+15) ? a : az; // 0x1.0p53
/* reduce argument to primary approximation interval (-0.25, 0.25) */
r = nearbyint (a + a); // must use IEEE-754 "to nearest" rounding
i = (int64_t)r;
t = fma (-0.5, r, a);
/* compute core approximations */
s = t * t;
/* Approximate cos(pi*x) for x in [-0.25,0.25] */
r = -1.0369917389758117e-4;
r = fma (r, s, 1.9294935641298806e-3);
r = fma (r, s, -2.5806887942825395e-2);
r = fma (r, s, 2.3533063028328211e-1);
r = fma (r, s, -1.3352627688538006e+0);
r = fma (r, s, 4.0587121264167623e+0);
r = fma (r, s, -4.9348022005446790e+0);
c = fma (r, s, 1.0000000000000000e+0);
/* Approximate sin(pi*x) for x in [-0.25,0.25] */
r = 4.6151442520157035e-4;
r = fma (r, s, -7.3700183130883555e-3);
r = fma (r, s, 8.2145868949323936e-2);
r = fma (r, s, -5.9926452893214921e-1);
r = fma (r, s, 2.5501640398732688e+0);
r = fma (r, s, -5.1677127800499516e+0);
s = s * t;
r = r * s;
s = fma (t, 3.1415926535897931e+0, r);
/* map results according to quadrant */
if (i & 2) {
s = 0.0 - s; // must be evaluated with IEEE-754 semantics
c = 0.0 - c; // must be evaluated with IEEE-754 semantics
}
if (i & 1) {
t = 0.0 - s; // must be evaluated with IEEE-754 semantics
s = c;
c = t;
}
/* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */
if (a == floor (a)) s = az;
*sp = s;
*cp = c;
}
單精度版本基本上只在核心近似方面有所不同。使用窮舉測試可以精確確定誤差範圍。
#include <math.h>
#include <stdint.h>
/* Writes result sine result sin(πa) to the location pointed to by sp
Writes result cosine result cos(πa) to the location pointed to by cp
In exhaustive testing, the maximum error in sine results was 0.96677 ulp,
the maximum error in cosine results was 0.96563 ulp, meaning results are
faithfully rounded.
*/
void my_sincospif (float a, float *sp, float *cp)
{
float az, t, c, r, s;
int32_t i;
az = a * 0.0f; // must be evaluated with IEEE-754 semantics
/* for |a| > 2**24, cospi(a) = 1.0f, but cospi(Inf) = NaN */
a = (fabsf (a) < 0x1.0p24f) ? a : az;
r = nearbyintf (a + a); // must use IEEE-754 "to nearest" rounding
i = (int32_t)r;
t = fmaf (-0.5f, r, a);
/* compute core approximations */
s = t * t;
/* Approximate cos(pi*x) for x in [-0.25,0.25] */
r = 0x1.d9e000p-3f;
r = fmaf (r, s, -0x1.55c400p+0f);
r = fmaf (r, s, 0x1.03c1cep+2f);
r = fmaf (r, s, -0x1.3bd3ccp+2f);
c = fmaf (r, s, 0x1.000000p+0f);
/* Approximate sin(pi*x) for x in [-0.25,0.25] */
r = -0x1.310000p-1f;
r = fmaf (r, s, 0x1.46737ep+1f);
r = fmaf (r, s, -0x1.4abbfep+2f);
r = (t * s) * r;
s = fmaf (t, 0x1.921fb6p+1f, r);
if (i & 2) {
s = 0.0f - s; // must be evaluated with IEEE-754 semantics
c = 0.0f - c; // must be evaluated with IEEE-754 semantics
}
if (i & 1) {
t = 0.0f - s; // must be evaluated with IEEE-754 semantics
s = c;
c = t;
}
/* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */
if (a == floorf (a)) s = az;
*sp = s;
*cp = c;
}
爲了同時獲得最大的準確性和可移植性,在我看來,暫時改變舍入模式(例如['fenv()'](http://man7.org/linux/man-pages/man3/ fenv.3.html)或'fesetround()')截斷/ round-to-zero是必要的。這樣我們可以使用例如卡漢總和/補償總和,並將高精度係數分爲幾個不同的有限精度因子。其他方法似乎都依賴於特定的硬件(如'fma()',仿真速度非常慢)或實現細節。 –
@NominalAnimal我沒有瞄準最大的便攜性,因爲這不是我需要的東西。我在我的答案中指出了各種潛在的困難點,以幫助那些想要在自己的實施中解決它們的人。至於FMA,它可以作爲近期(大約過去5年)的x86和ARM處理器的硬件指令,當然也可以作爲20世紀90年代以來的Power [PC]。如果有人想提供一個針對無FMA硬件平臺優化的代碼的答案,我很樂意加入它(如果它真的很好,還可以給它額外的獎勵)。 – njuffa