2012-02-24 282 views
16

我想實現範圍縮減作爲實現正弦函數的第一步。範圍縮減精度不高的單精度浮點數

我下面的論文中描述的方法"ARGUMENT REDUCTION FOR HUGE ARGUMENTS" by K.C. NG

我使用x的輸入範圍從0到20000。我的錯誤,當收到錯誤一樣大0.002339146顯然不應該是大的,並且我我不知道如何減少它。我注意到誤差幅度與輸入θ幅度與餘弦/正弦相關。

我能夠獲得紙張提到nearpi.c代碼,但我不知道如何使用的單精度浮點代碼。如果有人有興趣,在nearpi.c文件可以在此鏈接中找到:nearpi.c

這裏是我的MATLAB代碼:

x = 0:0.1:20000; 

% Perform range reduction 
% Store constant 2/pi 
twooverpi = single(2/pi); 

% Compute y 
y = (x.*twooverpi); 

% Compute k (round to nearest integer 
k = round(y); 

% Solve for f 
f = single(y-k); 

% Solve for r 
r = single(f*single(pi/2)); 

% Find last two bits of k 
n = bitand(fi(k,1,32,0),fi(3,1,32,0)); 
n = single(n); 

% Preallocate for speed 
z(length(x)) = 0; 
for i = 1:length(x) 

    switch(n(i)) 
     case 0 
      z(i)=sin(r(i)); 
     case 1 
      z(i) = single(cos(r(i))); 
     case 2 
      z(i) = -sin(r(i)); 
     case 3 
      z(i) = single(-cos(r(i))); 
     otherwise 
    end 

end 

maxerror = max(abs(single(z - single(sin(single(x)))))) 
minerror = min(abs(single(z - single(sin(single(x)))))) 

我已編輯的程序nearpi.c所以它編譯。但是我不知道如何解釋輸出。此外,文件需要一個輸入,我不得不手動輸入,我也不確定輸入的重要性。

這裏是工作nearpi.c:

/* 
============================================================================ 
Name  : nearpi.c 
Author  : 
Version  : 
Copyright : Your copyright notice 
Description : Hello World in C, Ansi-style 
============================================================================ 
*/ 

#include <stdio.h> 
#include <stdlib.h> 
#include <math.h> 


/* 
* Global macro definitions. 
*/ 

# define hex(double) *(1 + ((long *) &double)), *((long *) &double) 
# define sgn(a)   (a >= 0 ? 1 : -1) 
# define MAX_k   2500 
# define D    56 
# define MAX_EXP  127 
# define THRESHOLD  2.22e-16 

/* 
* Global Variables 
*/ 

int  CFlength,    /* length of CF including terminator */ 
     binade; 
double e, 
     f;      /* [e,f] range of D-bit unsigned int of f; 
            form 1X...X */ 

// Function Prototypes 
int dbleCF (double i[], double j[]); 
void input (double i[]); 
void nearPiOver2 (double i[]); 


/* 
* This is the start of the main program. 
*/ 

int main (void) 
{ 
    int  k;     /* subscript variable */ 
    double i[MAX_k], 
      j[MAX_k];   /* i and j are continued fractions 
            (coeffs) */ 


    // fp = fopen("/src/cfpi.txt", "r"); 


/* 
* Compute global variables e and f, where 
* 
*  e = 2^(D-1), i.e. the D bit number 10...0 
* and 
*  f = 2^D - 1, i.e. the D bit number 11...1 . 
*/ 

    e = 1; 
    for (k = 2; k <= D; k = k + 1) 
     e = 2 * e; 
    f = 2 * e - 1; 

/* 
    * Compute the continued fraction for (2/e)/(pi/2) , i.e. 
    * q's starting value for the first binade, given the continued 
    * fraction for pi as input; set the global variable CFlength 
    * to the length of the resulting continued fraction (including 
    * its negative valued terminator). One should use as many 
    * partial coefficients of pi as necessary to resolve numbers 
    * of the width of the underflow plus the overflow threshold. 
    * A rule of thumb is 0.97 partial coefficients are generated 
    * for every decimal digit of pi . 
    * 
    * Note: for radix B machines, subroutine input should compute 
    * the continued fraction for (B/e)/(pi/2) where e = B^(D - 1). 
    */ 

    input (i); 

/* 
* Begin main loop over all binades: 
* For each binade, find the nearest multiples of pi/2 in that binade. 
* 
* [ Note: for hexadecimal machines (B = 16), the rest of the main 
* program simplifies(!) to 
* 
*      B_ade = 1; 
*      while (B_ade < MAX_EXP) 
*      { 
*       dbleCF (i, j); 
*       dbleCF (j, i); 
*       dbleCF (i, j); 
*       CFlength = dbleCF (j, i); 
*       B_ade = B_ade + 1; 
*      } 
*     } 
* 
* because the alternation of source & destination are no longer necessary. ] 
*/ 

    binade = 1; 
    while (binade < MAX_EXP) 
    { 

/* 
* For the current (odd) binade, find the nearest multiples of pi/2. 
*/ 

     nearPiOver2 (i); 

/* 
* Double the continued fraction to get to the next (even) binade. 
* To save copying arrays, i and j will alternate as the source 
* and destination for the continued fractions. 
*/ 

     CFlength = dbleCF (i, j); 
     binade = binade + 1; 

/* 
* Check for main loop termination again because of the 
* alternation. 
*/ 

     if (binade >= MAX_EXP) 
      break; 

/* 
* For the current (even) binade, find the nearest multiples of pi/2. 
*/ 

     nearPiOver2 (j); 

/* 
* Double the continued fraction to get to the next (odd) binade. 
*/ 

     CFlength = dbleCF (j, i); 
     binade = binade + 1; 
    } 

    return 0; 
}        /* end of Main Program */ 

/* 
* Subroutine DbleCF doubles a continued fraction whose partial 
* coefficients are i[] into a continued fraction j[], where both 
* arrays are of a type sufficient to do D-bit integer arithmetic. 
* 
* In my case (D = 56) , I am forced to treat integers as double 
* precision reals because my machine does not have integers of 
* sufficient width to handle D-bit integer arithmetic. 
* 
* Adapted from a Basic program written by W. Kahan. 
* 
* Algorithm based on Hurwitz's method of doubling continued 
* fractions (see Knuth Vol. 3, p.360). 
* 
* A negative value terminates the last partial quotient. 
* 
* Note: for the non-C programmers, the statement break 
* exits a loop and the statement continue skips to the next 
* case in the same loop. 
* 
* The call modf (l/2, &l0) assigns the integer portion of 
* half of L to L0. 
*/ 

int dbleCF (double i[], double j[]) 
{ 
     double k, 
        l, 
        l0, 
        j0; 
     int n, 
        m; 
    n = 1; 
    m = 0; 
    j0 = i[0] + i[0]; 
    l = i[n]; 
    while (1) 
    { 
     if (l < 0) 
     { 
      j[m] = j0; 
      break; 
     }; 
     modf (l/2, &l0); 
     l = l - l0 - l0; 
     k = i[n + 1]; 
     if (l0 > 0) 
     { 
      j[m] = j0; 
      j[m + 1] = l0; 
      j0 = 0; 
      m = m + 2; 
     }; 
     if (l == 0) { 
/* 
* Even case. 
*/ 
      if (k < 0) 
      { 
       m = m - 1; 
       break; 
      } 
      else 
      { 
       j0 = j0 + k + k; 
       n = n + 2; 
       l = i[n]; 
       continue; 
      }; 
     } 
/* 
* Odd case. 
*/ 
     if (k < 0) 
     { 
      j[m] = j0 + 2; 
      break; 
     }; 
     if (k == 0) 
     { 
      n = n + 2; 
      l = l + i[n]; 
      continue; 
     }; 
     j[m] = j0 + 1; 
     m = m + 1; 
     j0 = 1; 
     l = k - 1; 
     n = n + 1; 
     continue; 
    }; 
    m = m + 1; 
    j[m] = -99999; 
    return (m); 
} 

/* 
* Subroutine input computes the continued fraction for 
* (2/e)/(pi/2) , where e = 2^(D-1) , given pi 's 
* continued fraction as input. That is, double the continued 
* fraction of pi D-3 times and place a zero at the front. 
* 
* One should use as many partial coefficients of pi as 
* necessary to resolve numbers of the width of the underflow 
* plus the overflow threshold. A rule of thumb is 0.97 
* partial coefficients are generated for every decimal digit 
* of pi . The last coefficient of pi is terminated by a 
* negative number. 
* 
* I'll be happy to supply anyone with the partial coefficients 
* of pi . My ARPA address is [email protected] . 
* 
* I computed the partial coefficients of pi using a method of 
* Bill Gosper's. I need only compute with integers, albeit 
* large ones. After writing the program in bc and Vaxima , 
* Prof. Fateman suggested FranzLisp . To my surprise, FranzLisp 
* ran the fastest! the reason? FranzLisp's Bignum package is 
* hand coded in assembler. Also, FranzLisp can be compiled. 
* 
* 
* Note: for radix B machines, subroutine input should compute 
* the continued fraction for (B/e)/(pi/2) where e = B^(D - 1). 
* In the case of hexadecimal (B = 16), this is done by repeated 
* doubling the appropriate number of times. 
*/ 

void input (double i[]) 
{ 
    int  k; 
    double j[MAX_k]; 

/* 
* Read in the partial coefficients of pi from a precalculated file 
* until a negative value is encountered. 
*/ 

    k = -1; 
    do 
    { 
     k = k + 1; 
     scanf ("%lE", &i[k]); 
     printf("hello\n"); 
     printf("%d", k); 
    } while (i[k] >= 0); 

/* 
* Double the continued fraction for pi D-3 times using 
* i and j alternately as source and destination. On my 
* machine D = 56 so D-3 is odd; hence the following code: 
* 
* Double twice (D-3)/2 times, 
*/ 
    for (k = 1; k <= (D - 3)/2; k = k + 1) 
    { 
     dbleCF (i, j); 
     dbleCF (j, i); 
    }; 
/* 
* then double once more. 
*/ 
    dbleCF (i, j); 

/* 
* Now append a zero on the front (reciprocate the continued 
* fraction) and the return the coefficients in i . 
*/ 

    i[0] = 0; 
    k = -1; 
    do 
    { 
     k = k + 1; 
     i[k + 1] = j[k]; 
    } while (j[k] >= 0); 

/* 
* Return the length of the continued fraction, including its 
* terminator and initial zero, in the global variable CFlength. 
*/ 

    CFlength = k; 
} 

/* 
* Given a continued fraction's coefficients in an array i , 
* subroutine nearPiOver2 finds all machine representable 
* values near a integer multiple of pi/2 in the current binade. 
*/ 

void nearPiOver2 (double i[]) 
{ 
    int  k,     /* subscript for recurrences (see 
            handout) */ 
      K;     /* like k , but used during cancel. elim. 
            */ 
    double p[MAX_k],   /* product of the q's   (see 
            handout) */ 
      q[MAX_k],   /* successive tail evals of CF (see 
            handout) */ 
      j[MAX_k],   /* like convergent numerators (see 
            handout) */ 
      tmp,    /* temporary used during cancellation 
            elim. */ 
      mk0,    /* m[k - 1]      (see 
            handout) */ 
      mk,     /* m[k] is one of the few ints (see 
            handout) */ 
      mkAbs,    /* absolute value of m sub k 
           */ 
      mK0,    /* like mk0 , but used during cancel. 
            elim. */ 
      mK,     /* like mk , but used during cancel. 
            elim. */ 
      z,     /* the object of our quest (the argument) 
           */ 
      m0,     /* the mantissa of z as a D-bit integer 
           */ 
      x,     /* the reduced argument   (see 
            handout) */ 
      ldexp(),   /* sys routine to multiply by a power of 
            two */ 
      fabs(),   /* sys routine to compute FP absolute 
            value */ 
      floor(),   /* sys routine to compute greatest int <= 
            value */ 
      ceil();   /* sys routine to compute least int >= 
            value */ 

/* 
    * Compute the q's by evaluating the continued fraction from 
    * bottom up. 
    * 
    * Start evaluation with a big number in the terminator position. 
    */ 

    q[CFlength] = 1.0 + 30; 

    for (k = CFlength - 1; k >= 0; k = k - 1) 
     q[k] = i[k] + 1/q[k + 1]; 

/* 
* Let THRESHOLD be the biggest | x | that we are interesed in 
* seeing. 
* 
* Compute the p's and j's by the recurrences from the top down. 
* 
* Stop when 
* 
*  1       1 
*  ----- >= THRESHOLD > ------ . 
*  2 |j |      2 |j | 
*   k       k+1 
*/ 

    p[0] = 1; 
    j[0] = 0; 
    j[1] = 1; 
    k = 0; 
    do 
    { 
     p[k + 1] = -q[k + 1] * p[k]; 
     if (k > 0) 
      j[1 + k] = j[k - 1] - i[k] * j[k]; 
     k = k + 1; 
    } while (1/(2 * fabs (j[k])) >= THRESHOLD); 

/* 
* Then mk runs through the integers between 
* 
*     k  +     k  + 
*    (-1) e/p - 1/2  & (-1) f/p - 1/2 . 
*       k       k 
*/ 

    for (mkAbs = floor (e/fabs (p[k])); 
      mkAbs <= ceil (f/fabs (p[k])); mkAbs = mkAbs + 1) 
    { 

     mk = mkAbs * sgn (p[k]); 

/* 
* For each mk , mk0 runs through integers between 
* 
*     + 
*    m q - p THRESHOLD . 
*    k k  k 
*/ 

     for (mk0 = floor (mk * q[k] - fabs (p[k]) * THRESHOLD); 
       mk0 <= ceil (mk * q[k] + fabs (p[k]) * THRESHOLD); 
       mk0 = mk0 + 1) 
     { 

/* 
* For each pair { mk , mk0 } , check that 
* 
*        k 
*    m  = (-1) (j m - j m ) 
*    0     k-1 k k k-1 
*/ 
      m0 = (k & 1 ? -1 : 1) * (j[k - 1] * mk - j[k] * mk0); 

/* 
* lies between e and f . 
*/ 
      if (e <= fabs (m0) && fabs (m0) <= f) 
      { 

/* 
* If so, then we have found an 
* 
*        k 
*    x  = ((-1) m/p - m)/j 
*         0 k k  k 
* 
*      = (m q - m )/p . 
*       k k k-1  k 
* 
* But this later formula can suffer cancellation. Therefore, 
* run the recurrence for the mk 's to get mK with minimal 
* | mK | + | mK0 | in the hope mK is 0 . 
*/ 
       K = k; 
       mK = mk; 
       mK0 = mk0; 
       while (fabs (mK) > 0) 
       { 
        p[K + 1] = -q[K + 1] * p[K]; 
        tmp = mK0 - i[K] * mK; 
        if (fabs (tmp) > fabs (mK0)) 
         break; 
        mK0 = mK; 
        mK = tmp; 
        K = K + 1; 
       }; 

/* 
* Then 
*    x  = (m q - m )/p 
*       K K K-1  K 
* 
* as accurately as one could hope. 
*/ 
       x = (mK * q[K] - mK0)/p[K]; 

/* 
* To return z and m0 as positive numbers, 
* x must take the sign of m0 . 
*/ 
       x = x * sgn (m0); 
       m0 = fabs (m0); 

/*d 
* Set z = m0 * 2^(binade+1-D) . 
*/ 
       z = ldexp (m0, binade + 1 - D); 

/* 
* Print z (hex), z (dec), m0 (dec), binade+1-D, x (hex), x (dec). 
*/ 

       printf ("%08lx %08lx Z=%22.16E M=%17.17G L+1-%d=%3d %08lx %08lx x=%23.16E\n", hex (z), z, m0, D, binade + 1 - D, hex (x), x); 

      } 
     } 
    } 
} 
+0

對於trig函數,IIRC準確的範圍縮小對於pi和其餘部分都需要很高的精度;通常在數學庫中使用數百位。 – janneb 2012-02-24 09:53:17

+3

+1 - 有趣的文章(只是有時間瀏覽,稍後會仔細閱讀) – 2012-02-28 00:55:18

+0

@starbox:我讀過,如果之前,是的。我不熟悉用matlab做高精度算術運算,所以我不會建議你如何去做。 – janneb 2012-02-29 21:11:34

回答

9

理論

首先,讓我們用單精度運算,使注意區別。

  1. [等式8] f的最小值可以更大。作爲雙精度數是一超集中的單精度數的,最接近single到的2/pi倍數只能更遠然後〜2.98e-19,因此前導零的的f固定算術表示的數量必須在大多數61個前導零(但可能會更少)。表示數量爲fdigits

  2. [等式之前9]因此,代替121位,y必須精確到fdigits + 24(非零顯著在單精度位)+ 7(額外保護位)= fdigits + 31,並且在最92.

  3. [等式9]「,因此,與x的指數的寬度一起,2/pi必須包含127(的single最大指數)+ 31 + fdigits,或158 + fdigits和至多219個比特。

  4. [Subsec和灰2.5]的A大小由零的數目在x二進制小數點之前確定(並且不受移動到single),而C大小由公式確定之前9.

    • 對於大xx> = 2^24),x看起來像這樣:[24位,M零。將其乘以A(其大小是2/pi的第一個M位)將導致整數(零的x將只將所有內容都轉換爲整數)。
    • 選擇C將從M+d2/pi位開始將導致產品x*C的尺寸最大爲d-24。以雙精度,d被選擇爲174(而不是24,我們有53),以便產品的尺寸最大爲121.在single中,選擇d使得d-24 <= 92或更確切地說d-24 <= fdigits+31 。也就是說,d可以選擇爲fdigits +55或至多116.
    • 結果,B應該具有至多116位的大小。

我們因此留下了兩個問題:

  1. 計算fdigits。這涉及閱讀鏈接文件中的參考文獻6並理解它。可能不那麼容易。 :)據我所知,這是唯一使用nearpi.c的地方。
  2. 計算B,相關位2/pi。由於M的範圍小於127,所以我們只需計算2/pi的前127 + 116位,並將它們存儲在一個數組中。見Wikipedia
  3. 計算y=x*B。這包括將x乘以116位數字。這是使用第3節的地方。塊的大小被選爲24,因爲2 * 24 + 2(乘以兩個24位數,並且加上3個這樣的數)小於double,53(因爲24除96)的精度。對於single算術,我們可以使用大小爲11位的塊,原因相似。

注 - B的技巧僅適用於指數爲正數(x> = 2^24)的數字。

總結 - 首先,您必須用double精度解決問題。你的Matlab代碼也不能在double的精確度下工作(請嘗試刪除single和計算sin(2^53),因爲你的twooverpi只有53個有效位,而不是175(無論如何,你不能直接在Matlab中乘這樣精確的數字)。方案應該適應與single一起工作,並且關鍵問題是精確地代表2/pi,並支持高精度數字的乘法。最後,當一切正常時,您可以嘗試找出更好的fdigits以減少位你要存儲和繁殖

希望我沒有完全關閉 - 意見和矛盾,歡迎

作爲一個例子,讓我們計算sin(x)其中x = single(2^24-1),其具有顯著位(M = 0)後沒有零點。這簡化了查找B,因爲B2/pi的前116位組成。由於x具有24位和116位B,產品精度

y = x * B 

將具有精度92位,如必需的。

鏈接論文中的第3部分描述瞭如何以足夠的精度執行此產品;在我們的例子中,相同的算法可以用於大小爲11的塊來計算y。作爲苦差事,我希望我可以原諒,因爲沒有明確地這樣做,而是依靠Matlab的符號數學工具箱。這個工具箱爲我們提供了vpa函數,它允許我們用十進制數字來指定一個數字的精度。所以,

vpa('2/pi', ceil(116*log10(2))) 

將產生的精度至少爲116個比特的2/pi的近似。因爲vpa只接受整數作爲其精度參數,所以我們通常不能精確地指定數字的二進制精度,因此我們使用次最佳值。

下面的代碼計算sin(x)根據紙張,在single精度:

x = single(2^24-1); 
y = x * vpa('2/pi', ceil(116*log10(2))); % Precision = 103.075 
k = round(y); 
f = single(y - k); 
r = f * single(pi)/2; 
switch mod(k, 4) 
    case 0 
     s = sin(r); 
    case 1 
     s = cos(r); 
    case 2 
     s = -sin(r); 
    case 3 
     s = -cos(r); 
end 
sin(x) - s         % Expected value: exactly zero. 

(使用Mathematica獲得的y精度,其結果是比Matlab一個更好的數值工具:))

libm

對方回答這個問題(因爲它已被刪除),使我的implem在libm,雖然工作在雙精度數字,但非常徹底地遵循鏈接的文件。

參見文件s_sin.c爲包裝(從鏈接的紙張表2顯示爲在該文件的末尾switch語句),並e_rem_pio2.c對於參數減少代碼(特別感興趣的是包含第一個396六角形陣列數字2/pi,從第69行開始)。

+0

我明白有問題需要解決。但是我仍然沒有根據你寫的內容看到解決方案。你能填好你留下的空白嗎?感謝您的時間。 – Veridian 2012-03-03 21:44:44

+0

增加了一些說明。 – user1071136 2012-03-04 21:26:18

+0

我的輸入可能總是在+/- 20000範圍內(小輸入也會包括在內),但是我的輸入x總是遠遠小於2^24。 – Veridian 2012-03-05 19:06:50