How to improve fixed-point square root for small values

I am using Anthony Williams' fixed-point library, described in the Dr. Dobb's article "Optimizing Math-Intensive Applications with Fixed-Point Arithmetic", to calculate the distance between two geographical points using the Rhumb Line method.

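This works well enough when the distance between the points is significant (greater than a few kilometres), but it is very poor at smaller distances. The worst case is when the two points are equal or nearly equal: the result is a distance of 194 metres, whereas I need a precision of at least 1 metre at distances >= 1 metre.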

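By comparison with a double-precision floating-point implementation, I have traced the problem to the fixed::sqrt() function, which performs poorly for small values: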

x       std::sqrt(x)  fixed::sqrt(x)  error
----------------------------------------------------
0       0             3.05176e-005    3.05176e-005
1e-005  0.00316228    0.00316334      1.06005e-006
2e-005  0.00447214    0.00447226      1.19752e-007
3e-005  0.00547723    0.0054779       6.72248e-007
4e-005  0.00632456    0.00632477      2.12746e-007
5e-005  0.00707107    0.0070715       4.27244e-007
6e-005  0.00774597    0.0077467       7.2978e-007
7e-005  0.0083666     0.00836658      1.54875e-008
8e-005  0.00894427    0.00894427      1.085e-009

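Correcting the result for fixed::sqrt(0) is trivial by treating it as a special case, but that will not solve the problem for small non-zero distances, where the error starts at 194 metres and converges toward zero as the distance increases. I probably need at least an order of magnitude improvement in precision toward zero.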
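One possible reading of where the 194 metres comes from (an assumption, it is not stated in the question): the table shows fixed::sqrt(0) returning 3.05176e-005, which is 2^-15. If that root is an angular distance in radians and is then multiplied by a mean Earth radius of about 6371 km, the result is roughly 194 m. The helper name and the radius constant below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumption: mean Earth radius in metres (not given in the question).
constexpr double kEarthRadiusM = 6371000.0;

// Convert an error in an angular distance (radians) into metres
// along the surface of a sphere of radius kEarthRadiusM.
double angular_error_to_metres(double error_rad)
{
    return error_rad * kEarthRadiusM;
}
```

Calling angular_error_to_metres(std::ldexp(1.0, -15)) gives a value a little above 194, which is consistent with the worst case reported above.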

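The fixed::sqrt() algorithm is briefly explained on page 4 of the article linked above, but I struggle to follow it, let alone determine whether it can be improved. The code for the function is reproduced below: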

fixed fixed::sqrt() const 
{ 
    unsigned const max_shift=62; 
    uint64_t a_squared=1LL<<max_shift; 
    unsigned b_shift=(max_shift+fixed_resolution_shift)/2; 
    uint64_t a=1LL<<b_shift; 

    uint64_t x=m_nVal; 

    while(b_shift && a_squared>x) 
    { 
     a>>=1; 
     a_squared>>=2; 
     --b_shift; 
    } 

    uint64_t remainder=x-a_squared; 
    --b_shift; 

    while(remainder && b_shift) 
    { 
     uint64_t b_squared=1LL<<(2*b_shift-fixed_resolution_shift); 
     int const two_a_b_shift=b_shift+1-fixed_resolution_shift; 
     uint64_t two_a_b=(two_a_b_shift>0)?(a<<two_a_b_shift):(a>>-two_a_b_shift); 

     while(b_shift && remainder<(b_squared+two_a_b)) 
     { 
      b_squared>>=2; 
      two_a_b>>=1; 
      --b_shift; 
     } 
     uint64_t const delta=b_squared+two_a_b; 
     if((2*remainder)>delta) 
     { 
      a+=(1LL<<b_shift); 
      remainder-=delta; 
      if(b_shift) 
      { 
       --b_shift; 
      } 
     } 
    } 
    return fixed(internal(),a); 
}

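Note that m_nVal is the internal fixed-point representation value; it is an int64_t and the representation uses the Q36.28 format (fixed_resolution_shift = 28). The representation itself has enough precision for at least 8 decimal places, and as a fraction of equatorial arc it is good for distances of around 0.14 metres, so the limitation is not the fixed-point representation.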
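To make the Q36.28 numbers concrete, here is a minimal sketch of the conversion and of the resolution it implies. The function names are hypothetical and not part of the fixed library. The equatorial circumference of about 40,075,017 m is also an assumption: reading one step as a fraction of the full circumference is one interpretation that gives a figure close to the 0.14 m quoted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kFracBits = 28;  // matches fixed_resolution_shift = 28

// Convert a double to a raw Q36.28 integer (truncating toward zero).
std::int64_t to_q28(double v)
{
    return static_cast<std::int64_t>(std::ldexp(v, kFracBits));
}

// Convert a raw Q36.28 integer back to a double.
double from_q28(std::int64_t raw)
{
    return std::ldexp(static_cast<double>(raw), -kFracBits);
}

// Smallest representable step: 2^-28, about 3.7253e-9.
double q28_resolution()
{
    return from_q28(1);
}

// Assumption: one step taken as a fraction of the equatorial
// circumference (about 40,075,017 m), giving roughly 0.149 m.
double q28_step_on_equator_m()
{
    return q28_resolution() * 40075017.0;
}
```

For example, to_q28(1e-5) is 2684, so the smallest inputs in the table carry only about 12 significant bits before the square root is even taken.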

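Use of the rhumb line method is a standards body recommendation for this application, so it cannot be changed. In any case, a more accurate square-root function is likely to be needed elsewhere in this application or in future applications.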

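Question: Is it possible to improve the accuracy of the fixed::sqrt() algorithm for small non-zero values while still maintaining its bounded and deterministic convergence?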

Additional information: the test code used to generate the table above:

#include <cmath> 
#include <iostream> 
#include "fixed.hpp" 

int main() 
{ 
    double error = 1.0 ; 
    for(double x = 0.0; error > 1e-8; x += 1e-5) 
    { 
     double fixed_root = sqrt(fixed(x)).as_double() ; 
     double std_root = std::sqrt(x) ; 
     error = std::fabs(fixed_root - std_root) ; 
     std::cout << x << '\t' << std_root << '\t' << fixed_root << '\t' << error << std::endl ; 
    } 
}

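Conclusion: In the light of Justin Peel's solution and analysis, and a comparison with the algorithm in "The Neglected Art of Fixed Point Arithmetic", I have adapted the latter as follows: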

fixed fixed::sqrt() const 
{ 
    uint64_t a = 0 ;   // root accumulator 
    uint64_t remHi = 0 ;  // high part of partial remainder 
    uint64_t remLo = m_nVal ; // low part of partial remainder 
    uint64_t testDiv ; 
    int count = 31 + (fixed_resolution_shift >> 1); // Loop counter 
    do 
    { 
     // get 2 bits of arg 
     remHi = (remHi << 2) | (remLo >> 62); remLo <<= 2 ; 

     // Get ready for the next bit in the root 
     a <<= 1; 

     // Test radical 
     testDiv = (a << 1) + 1;  
     if (remHi >= testDiv) 
     { 
      remHi -= testDiv; 
      a += 1; 
     } 

    } while (count-- != 0); 

    return fixed(internal(),a); 
}

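While this gives far greater precision, the improvement I needed cannot be achieved. The Q36.28 format alone only just provides the precision I need, and it is not possible to perform a sqrt() without losing a few bits of precision. However, some lateral thinking provided a better solution. My application tests the calculated distance against a distance limit. The rather obvious solution, in hindsight, is to test the square of the distance against the square of the limit!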
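A minimal sketch of that idea, under stated assumptions: the two squared terms of the distance are available as raw Q36.28 integers, the limit is non-negative, and the compiler provides unsigned __int128 (a GCC/Clang extension) so the squares cannot overflow. The function name within_limit_q28 is hypothetical; it is not part of the fixed library and is not the code actually used in the application.

```cpp
#include <cassert>
#include <cstdint>

// Assumes a compiler that provides __int128 (GCC/Clang extension).
using u128 = unsigned __int128;

// Square a signed 64-bit raw value into an unsigned 128-bit result.
static u128 square_raw(std::int64_t v)
{
    return static_cast<u128>(static_cast<__int128>(v) * v);
}

// True when dx^2 + dy^2 <= limit^2. All arguments are raw Q36.28
// integers; both sides end up scaled by 2^56, so they compare
// directly and no square root is taken. Assumes limit >= 0.
bool within_limit_q28(std::int64_t dx, std::int64_t dy, std::int64_t limit)
{
    return square_raw(dx) + square_raw(dy) <= square_raw(limit);
}
```

Because the comparison is done on exact integer squares, no precision is lost to the square root at all, which is why the distance near zero is no longer a problem for a threshold test.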

Source

2012-01-04 Clifford

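Clifford - the URL to the article is hosed (it asks for a UBM login, which you are probably logged into and so do not see this). I tried to find the article elsewhere but could not; the Google cache seems to be the best bet. Thanks for referencing the article in the first place. – Dan 2012-01-04 03:36:50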

@Dan, I went to the original link with no problem. I have never used UBM and I don't think I am logged into DDJ, so I don't know why there is a problem. – 2012-01-04 04:36:51

I have been using the code for a few years; I probably downloaded the library before the login requirement. Get it from Anthony's own site (http://www.justsoftwaresolutions.co.uk/news/optimizing-applications-with-fixed-point-arithmetic.html) – Clifford 2012-01-04 16:06:17

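The original implementation obviously has some problems. I became frustrated trying to fix them all within the way the code is currently written, and ended up going at it a different way. I could probably fix the original now, but I like my way better anyway.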

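I treat the input number as being in Q64 to start with, which is the same as shifting by 28 and then shifting back by 14 afterwards (the sqrt halves it). However, if you just do that, the accuracy is limited to 1/2^14 = 6.1035e-5, because the last 14 bits will be 0. To remedy this, I then shift a and remainder correctly and, to keep filling in the digits, I do the loop again. The code can be made more efficient and cleaner, but I'll leave that to someone else. The accuracy shown below is pretty much as good as you can get with Q36.28. If you compare the fixed-point sqrt with the floating-point sqrt of the input number after it has been truncated to fixed point (convert it to fixed point and back), then the errors are around 2e-9 (I didn't do this in the code below, but it requires a one-line change). This is right in line with the best accuracy for Q36.28, which is 1/2^28 = 3.7253e-9.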

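By the way, a big mistake in the original code is that the term where m = 0 is never considered, so that bit can never be set. Anyway, here is the code. Enjoy!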

#include <iostream> 
#include <cmath> 

#include <cstdint> // uint64_t (unsigned long is only 32 bits on some platforms)

uint64_t sqrt(uint64_t in_val) 
{ 
    const uint64_t fixed_resolution_shift = 28; 
    const unsigned max_shift=62; 
    uint64_t a_squared=1ULL<<max_shift; 
    unsigned b_shift=(max_shift>>1) + 1; 
    uint64_t a=1ULL<<(b_shift - 1); 

    uint64_t x=in_val; 

    while(b_shift && a_squared>x) 
    { 
     a>>=1; 
     a_squared>>=2; 
     --b_shift; 
    } 

    uint64_t remainder=x-a_squared; 
    --b_shift; 

    while(remainder && b_shift) 
    { 
     uint64_t b_squared=1ULL<<(2*(b_shift - 1)); 
     uint64_t two_a_b=(a<<b_shift); 

     while(b_shift && remainder<(b_squared+two_a_b)) 
     { 
      b_squared>>=2; 
      two_a_b>>=1; 
      --b_shift; 
     } 
     uint64_t const delta=b_squared+two_a_b; 
     if((remainder)>=delta && b_shift) 
     { 
      a+=(1ULL<<(b_shift - 1)); 
      remainder-=delta; 
      --b_shift; 
     } 
    } 
    a <<= (fixed_resolution_shift/2); 
    b_shift = (fixed_resolution_shift/2) + 1; 
    remainder <<= (fixed_resolution_shift); 

    while(remainder && b_shift) 
    { 
     uint64_t b_squared=1ULL<<(2*(b_shift - 1)); 
     uint64_t two_a_b=(a<<b_shift); 

     while(b_shift && remainder<(b_squared+two_a_b)) 
     { 
      b_squared>>=2; 
      two_a_b>>=1; 
      --b_shift; 
     } 
     uint64_t const delta=b_squared+two_a_b; 
     if((remainder)>=delta && b_shift) 
     { 
      a+=(1ULL<<(b_shift - 1)); 
      remainder-=delta; 
      --b_shift; 
     } 
    } 

    return a; 
} 

double fixed2float(uint64_t x) 
{ 
    return static_cast<double>(x) * pow(2.0, -28.0); 
} 

uint64_t float2fixed(double f) 
{ 
    return static_cast<uint64_t>(f * pow(2, 28.0)); 
} 

void finderror(double num) 
{ 
    double root1 = fixed2float(sqrt(float2fixed(num))); 
    double root2 = pow(num, 0.5); 
    std::cout << "input: " << num << ", fixed sqrt: " << root1 << " " << ", float sqrt: " << root2 << ", finderror: " << root2 - root1 << std::endl; 
} 

int main() 
{ 
    finderror(0); 
    finderror(1e-5); 
    finderror(2e-5); 
    finderror(3e-5); 
    finderror(4e-5); 
    finderror(5e-5); 
    finderror(pow(2.0,1)); 
    finderror(1ULL<<35); 
}

The output of the program is:

input: 0, fixed sqrt: 0 , float sqrt: 0, finderror: 0 
input: 1e-05, fixed sqrt: 0.00316207 , float sqrt: 0.00316228, finderror: 2.10277e-07 
input: 2e-05, fixed sqrt: 0.00447184 , float sqrt: 0.00447214, finderror: 2.97481e-07 
input: 3e-05, fixed sqrt: 0.0054772 , float sqrt: 0.00547723, finderror: 2.43815e-08 
input: 4e-05, fixed sqrt: 0.00632443 , float sqrt: 0.00632456, finderror: 1.26255e-07 
input: 5e-05, fixed sqrt: 0.00707086 , float sqrt: 0.00707107, finderror: 2.06055e-07 
input: 2, fixed sqrt: 1.41421 , float sqrt: 1.41421, finderror: 1.85149e-09 
input: 3.43597e+10, fixed sqrt: 185364 , float sqrt: 185364, finderror: 2.24099e-09

Source

2012-01-05 05:49:58

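This is exactly what I asked for, and it is more or less a drop-in replacement for the existing body of code. Unfortunately, my estimate of the precision required was incorrect, and even with this vast improvement it is not sufficient. It may improve the accuracy of sqrt() where it is used elsewhere, so I will probably keep it. I shall look at it further as a matter of due diligence, but if you say this is the limit of what can be achieved, then in this one instance I shall have to use std::sqrt() and floating point. – Clifford 2012-01-05 14:13:34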

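I compared this result with the algorithm in "The Neglected Art of Fixed Point Arithmetic" (http://jet.ro/2006/08/07/neglected-art-of-fixed-point-arithmetic/), and it gives identical results; it may be the more efficient/cleaner version you mentioned. You have at least made me realise the limit of what can be achieved with Q36.28. Thanks. – Clifford 2012-01-05 16:44:54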

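Many, many years ago I worked on a demo program for a small computer our outfit had built. The computer had a built-in square-root instruction, and we built a simple program to demonstrate the computer doing 16-bit add/subtract/multiply/divide/square-root on a TTY. Alas, it turned out that there was a serious bug in the square-root instruction, but we had promised to demo the function. So we created an array of the squares of the values 1-255, then used a simple lookup to match the value typed in against one of the array values. The index was the square root.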
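The lookup idea can be sketched roughly as follows. This is only an illustration with hypothetical names, not the original demo program: it stores the squares of 0-255 and, unlike the exact match described above, returns the floor of the root for inputs that are not perfect squares.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Build a table where squares[i] == i * i for i in 0..255.
std::array<std::uint32_t, 256> make_squares()
{
    std::array<std::uint32_t, 256> squares{};
    for (std::uint32_t i = 0; i < squares.size(); ++i) {
        squares[i] = i * i;
    }
    return squares;
}

// Return the largest i in 0..255 with i * i <= value.
std::uint8_t lookup_sqrt(std::uint16_t value)
{
    static const std::array<std::uint32_t, 256> squares = make_squares();
    // upper_bound finds the first square greater than value;
    // the entry just before it is the floor of the root.
    auto it = std::upper_bound(squares.begin(), squares.end(),
                               static_cast<std::uint32_t>(value));
    return static_cast<std::uint8_t>((it - squares.begin()) - 1);
}
```

The table grows with the range of roots it must cover, which is what limits this approach once both a wide range and fine precision are required.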

Source

2012-01-04 04:10:33

Unfortunately I need better precision over a wider range than is practical for a lookup. – Clifford 2012-01-04 16:08:53

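Given that sqrt(ab) = sqrt(a)sqrt(b), can't you just trap the case where your number is small, shift it up by a given number of bits, compute the root, and shift that back down by half the number of bits to get the result?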

I.e.

sqrt(n) = sqrt(n.2^k)/sqrt(2^k) 
     = sqrt(n.2^k).2^(-k/2)

For example, choose k = 28 for any n less than 2^8.
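A minimal sketch of the scaling idea with k = 28, expressed on raw Q36.28 integers. The names isqrt64 and sqrt_q28_small are hypothetical, and the integer square root is a generic bit-by-bit routine swapped in for illustration rather than fixed::sqrt(). For a value n with raw representation raw = n * 2^28, the raw root is sqrt(n) * 2^28 = sqrt(raw * 2^28), so scaling up by 28 bits and taking an integer root already includes the shift back down by 14.

```cpp
#include <cassert>
#include <cstdint>

// Bit-by-bit integer square root: the largest r with r * r <= n.
std::uint64_t isqrt64(std::uint64_t n)
{
    std::uint64_t res = 0;
    std::uint64_t bit = std::uint64_t{1} << 62;  // highest power of 4
    while (bit > n) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

// Square root of a raw Q36.28 value below 2^8 (raw < 2^36), so that
// raw << 28 still fits in 64 bits. Returns the raw Q36.28 root.
std::uint64_t sqrt_q28_small(std::uint64_t raw)
{
    assert(raw < (std::uint64_t{1} << 36));
    return isqrt64(raw << 28);
}
```

With raw = 2684 (the truncated representation of 1e-5), sqrt_q28_small returns 848811, which is about 0.00316207 when divided by 2^28. Larger inputs would need a different k or wider arithmetic.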

Source

2012-01-04 04:31:26 Keith

Very clever and effective solution. – 2012-01-04 04:52:25

I don't know how you are getting the numbers shown in the table from fixed::sqrt().

Here is what I do:

#include <stdio.h> 
#include <math.h> 

#define __int64 long long // gcc doesn't know __int64 
typedef __int64 fixed; 

#define FRACT 28 

#define DBL2FIX(x) ((fixed)((double)(x) * (1LL << FRACT))) 
#define FIX2DBL(x) ((double)(x)/(1LL << FRACT)) 

// De-++-ified code from 
// http://www.justsoftwaresolutions.co.uk/news/optimizing-applications-with-fixed-point-arithmetic.html 
fixed sqrtfix0(fixed num) 
{ 
    static unsigned const fixed_resolution_shift=FRACT; 

    unsigned const max_shift=62; 
    unsigned __int64 a_squared=1LL<<max_shift; 
    unsigned b_shift=(max_shift+fixed_resolution_shift)/2; 
    unsigned __int64 a=1LL<<b_shift; 

    unsigned __int64 x=num; 

    unsigned __int64 remainder; 

    while(b_shift && a_squared>x) 
    { 
     a>>=1; 
     a_squared>>=2; 
     --b_shift; 
    } 

    remainder=x-a_squared; 
    --b_shift; 

    while(remainder && b_shift) 
    { 
     unsigned __int64 b_squared=1LL<<(2*b_shift-fixed_resolution_shift); 
     int const two_a_b_shift=b_shift+1-fixed_resolution_shift; 
     unsigned __int64 two_a_b=(two_a_b_shift>0)?(a<<two_a_b_shift):(a>>-two_a_b_shift); 
     unsigned __int64 delta; 

     while(b_shift && remainder<(b_squared+two_a_b)) 
     { 
      b_squared>>=2; 
      two_a_b>>=1; 
      --b_shift; 
     } 
     delta=b_squared+two_a_b; 
     if((2*remainder)>delta) 
     { 
      a+=(1LL<<b_shift); 
      remainder-=delta; 
      if(b_shift) 
      { 
       --b_shift; 
      } 
     } 
    } 
    return (fixed)a; 
} 

// Adapted code from 
// http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Digit-by-digit_calculation 
fixed sqrtfix1(fixed num) 
{ 
    fixed res = 0; 
    fixed bit = (fixed)1 << 62; // The second-to-top bit is set 
    int s = 0; 

    // Scale num up to get more significant digits 

    while (num && num < bit) 
    { 
     num <<= 1; 
     s++; 
    } 

    if (s & 1) 
    { 
     num >>= 1; 
     s--; 
    } 

    s = 14 - (s >> 1); 

    while (bit != 0) 
    { 
     if (num >= res + bit) 
     { 
      num -= res + bit; 
      res = (res >> 1) + bit; 
     } 
     else 
     { 
      res >>= 1; 
     } 

     bit >>= 2; 
    } 

    if (s >= 0) res <<= s; 
    else res >>= -s; 

    return res; 
} 

int main(void) 
{ 
    double testData[] = 
    { 
     0, 
     1e-005, 
     2e-005, 
     3e-005, 
     4e-005, 
     5e-005, 
     6e-005, 
     7e-005, 
     8e-005, 
    }; 
    int i; 

    for (i = 0; i < sizeof(testData)/sizeof(testData[0]); i++) 
    { 
     double x = testData[i]; 
     fixed xf = DBL2FIX(x); 

     fixed sqf0 = sqrtfix0(xf); 
     fixed sqf1 = sqrtfix1(xf); 

     double sq0 = FIX2DBL(sqf0); 
     double sq1 = FIX2DBL(sqf1); 

     printf("%10.8f: " 
       "sqrtfix0()=%10.8f/err=%e " 
       "sqrt()=%10.8f " 
       "sqrtfix1()=%10.8f/err=%e\n", 
       x, 
       sq0, fabs(sq0 - sqrt(x)), 
       sqrt(x), 
       sq1, fabs(sq1 - sqrt(x))); 
    } 

    printf("sizeof(double)=%d\n", (int)sizeof(double)); 

    return 0; 
}

Here is what I get (with gcc and Open Watcom):

0.00000000: sqrtfix0()=0.00003052/err=3.051758e-05 sqrt()=0.00000000 sqrtfix1()=0.00000000/err=0.000000e+00 
0.00001000: sqrtfix0()=0.00311279/err=4.948469e-05 sqrt()=0.00316228 sqrtfix1()=0.00316207/err=2.102766e-07 
0.00002000: sqrtfix0()=0.00445557/err=1.656955e-05 sqrt()=0.00447214 sqrtfix1()=0.00447184/err=2.974807e-07 
0.00003000: sqrtfix0()=0.00543213/err=4.509667e-05 sqrt()=0.00547723 sqrtfix1()=0.00547720/err=2.438148e-08 
0.00004000: sqrtfix0()=0.00628662/err=3.793423e-05 sqrt()=0.00632456 sqrtfix1()=0.00632443/err=1.262553e-07 
0.00005000: sqrtfix0()=0.00701904/err=5.202484e-05 sqrt()=0.00707107 sqrtfix1()=0.00707086/err=2.060551e-07 
0.00006000: sqrtfix0()=0.00772095/err=2.501943e-05 sqrt()=0.00774597 sqrtfix1()=0.00774593/err=3.390476e-08 
0.00007000: sqrtfix0()=0.00836182/err=4.783859e-06 sqrt()=0.00836660 sqrtfix1()=0.00836649/err=1.086198e-07 
0.00008000: sqrtfix0()=0.00894165/err=2.621519e-06 sqrt()=0.00894427 sqrtfix1()=0.00894409/err=1.777289e-07 
sizeof(double)=8

EDIT:

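I had missed the fact that the sqrtfix1() above does not work well with large arguments. It can be fixed by appending 28 zeros to the argument and essentially computing the exact integer square root of that. This comes at the expense of doing the internal calculations in 128-bit arithmetic, but it is pretty straightforward: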

fixed sqrtfix2(fixed num) 
{ 
    unsigned __int64 numl, numh; 
    unsigned __int64 resl = 0, resh = 0; 
    unsigned __int64 bitl = 0, bith = (unsigned __int64)1 << 26; 

    numl = num << 28; 
    numh = num >> (64 - 28); 

    while (bitl | bith) 
    { 
     unsigned __int64 tmpl = resl + bitl; 
     unsigned __int64 tmph = resh + bith + (tmpl < resl); 

     tmph = numh - tmph - (numl < tmpl); 
     tmpl = numl - tmpl; 

     if (tmph & 0x8000000000000000ULL) 
     { 
      resl >>= 1; 
      if (resh & 1) resl |= 0x8000000000000000ULL; 
      resh >>= 1; 
     } 
     else 
     { 
      numl = tmpl; 
      numh = tmph; 

      resl >>= 1; 
      if (resh & 1) resl |= 0x8000000000000000ULL; 
      resh >>= 1; 

      resh += bith + (resl + bitl < resl); 
      resl += bitl; 
     } 

     bitl >>= 2; 
     if (bith & 1) bitl |= 0x4000000000000000ULL; 
     if (bith & 2) bitl |= 0x8000000000000000ULL; 
     bith >>= 2; 
    } 

    return resl; 
}
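For comparison, the same "append 28 zero bits, then take the exact integer square root" idea can be sketched more compactly where the compiler offers a native 128-bit integer. This assumes unsigned __int128, a GCC/Clang extension that the compilers used above may not provide, and the name sqrt_q28_wide is hypothetical. It is a sketch of the technique, not a replacement for sqrtfix2().

```cpp
#include <cassert>
#include <cstdint>

// Assumes a compiler that provides unsigned __int128 (GCC/Clang extension).
using u128 = unsigned __int128;

// Square root of a non-negative raw Q36.28 value (raw < 2^63).
// Returns the raw Q36.28 root, truncated.
std::uint64_t sqrt_q28_wide(std::uint64_t raw)
{
    u128 n = static_cast<u128>(raw) << 28;   // append 28 zero bits
    u128 res = 0;
    u128 bit = static_cast<u128>(1) << 90;   // same start as bith = 1 << 26
    while (bit > n) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return static_cast<std::uint64_t>(res);
}
```

For the raw value of 2.0 (2 << 28) this returns 379625062, about 1.41421356 after dividing by 2^28.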

and it gives pretty much the same results as this answer (slightly better for 3.43597e+10):

0.00000000: sqrtfix0()=0.00003052/err=3.051758e-05 sqrt()=0.00000000 sqrtfix2()=0.00000000/err=0.000000e+00 
0.00001000: sqrtfix0()=0.00311279/err=4.948469e-05 sqrt()=0.00316228 sqrtfix2()=0.00316207/err=2.102766e-07 
0.00002000: sqrtfix0()=0.00445557/err=1.656955e-05 sqrt()=0.00447214 sqrtfix2()=0.00447184/err=2.974807e-07 
0.00003000: sqrtfix0()=0.00543213/err=4.509667e-05 sqrt()=0.00547723 sqrtfix2()=0.00547720/err=2.438148e-08 
0.00004000: sqrtfix0()=0.00628662/err=3.793423e-05 sqrt()=0.00632456 sqrtfix2()=0.00632443/err=1.262553e-07 
0.00005000: sqrtfix0()=0.00701904/err=5.202484e-05 sqrt()=0.00707107 sqrtfix2()=0.00707086/err=2.060551e-07 
0.00006000: sqrtfix0()=0.00772095/err=2.501943e-05 sqrt()=0.00774597 sqrtfix2()=0.00774593/err=3.390476e-08 
0.00007000: sqrtfix0()=0.00836182/err=4.783859e-06 sqrt()=0.00836660 sqrtfix2()=0.00836649/err=1.086198e-07 
0.00008000: sqrtfix0()=0.00894165/err=2.621519e-06 sqrt()=0.00894427 sqrtfix2()=0.00894409/err=1.777289e-07 
2.00000000: sqrtfix0()=1.41419983/err=1.373327e-05 sqrt()=1.41421356 sqrtfix2()=1.41421356/err=1.851493e-09 
34359700000.00000000: sqrtfix0()=185363.69654846/err=5.097361e-06 sqrt()=185363.69655356 sqrtfix2()=185363.69655356/err=1.164153e-09

Source

2012-01-04 21:01:00
