If you are working on x86_64, assembly gives you access to 128-bit integer arithmetic:
#include <stdint.h>

// Computes (a * b * c) / d using a 192-bit intermediate product (GCC/Clang, x86_64).
// Preconditions: d != 0 and the quotient fits in 128 bits, otherwise divq faults.
uint64_t fn(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
    uint64_t lo, mid, hi;
    asm (
        "mulq %[b]\n\t"           // rdx:rax = a * b
        "movq %%rdx, %[hi]\n\t"   // hi = upper 64 bits of a * b
        "mulq %[c]\n\t"           // rdx:rax = (lower 64 bits of a * b) * c
        "movq %%rax, %[lo]\n\t"   // lo = lowest 64 bits of the 192-bit product
        "movq %%rdx, %[mid]\n\t"  // mid = carry into the middle 64 bits
        "movq %[hi], %%rax\n\t"
        "mulq %[c]\n\t"           // rdx:rax = (upper 64 bits of a * b) * c
        "addq %%rax, %[mid]\n\t"  // combine the middle 64 bits
        "adcq $0, %%rdx\n\t"      // propagate the carry into the highest 64 bits
        "movq %[mid], %%rax\n\t"
        "divq %[d]\n\t"           // divide the upper 128 (of 192) bits by d
        "movq %%rax, %[hi]\n\t"   // hi = upper 64 bits of the quotient
        "movq %[lo], %%rax\n\t"
        "divq %[d]\n\t"           // divide remainder:lowest 64 bits by d
        : "+a" (a)                // a lives in rax as in/out
        , [hi] "=&r" (hi), [lo] "=&r" (lo), [mid] "=&r" (mid) // scratch registers
        : [b] "r" (b), [c] "r" (c), [d] "r" (d) // inputs in any free register
        : "rdx", "cc"             // rdx and the flags are modified internally
    );
    // hi now holds the upper 64 bits of the quotient if (a * b * c / d) > UINT64_MAX
    return a;
}
Note that all input integers must have the same width. The working width is twice that of the inputs. This works with unsigned values only.
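The native div and mul instructions on x86 do exactly this: they operate on a double-width value to allow for overflow. Unfortunately, I am not aware of a compiler intrinsic that exposes them.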
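One possible alternative to hand-written assembly, assuming GCC or Clang on a 64-bit target: both compilers provide a non-standard unsigned __int128 type, which lets the double-width product be written in plain C. The sketch below covers only the two-operand case (a * b) / c; the function name muldiv_u128 is made up for illustration and is not part of any library.

```c
#include <stdint.h>

/* Sketch: (a * b) / c with a 128-bit intermediate product.
 * Assumes GCC or Clang on a 64-bit target; unsigned __int128 is a
 * compiler extension, not standard C.
 * The caller must ensure c != 0. If the quotient does not fit in
 * 64 bits, the cast below silently truncates it. */
static uint64_t muldiv_u128(uint64_t a, uint64_t b, uint64_t c) {
    unsigned __int128 product = (unsigned __int128)a * b; /* cannot overflow */
    return (uint64_t)(product / c);
}
```

The multiplication typically compiles to a single mulq. The 128-bit division usually goes through a compiler runtime helper rather than a single divq, so it may be slower than the assembly version.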
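It depends on how fast you need it to be. If you are willing to do some prime factorization, you can reduce the fraction without losing precision, but it will be much slower. – Thomas
Are you working on x86_64? –
@Thomas: You don't have to do a full prime factorization... finding the GCD is enough, and it is much faster. –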
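The GCD idea from the comments can be sketched roughly as follows for the two-operand case (a * b) / c. The helper names gcd_u64 and muldiv_reduced are hypothetical. Cancelling common factors does not change the value of the fraction, so the floored result is the same, but the reduced product can still overflow 64 bits; this sketch reports that case instead of handling it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Euclid's algorithm; gcd_u64(x, 0) == x. */
static uint64_t gcd_u64(uint64_t x, uint64_t y) {
    while (y != 0) {
        uint64_t t = x % y;
        x = y;
        y = t;
    }
    return x;
}

/* Hypothetical helper: stores floor(a * b / c) in *out after cancelling
 * common factors between the numerator and the denominator.
 * Returns false if c == 0 or if the reduced product still overflows. */
static bool muldiv_reduced(uint64_t a, uint64_t b, uint64_t c, uint64_t *out) {
    if (c == 0) return false;

    uint64_t g = gcd_u64(a, c);   /* cancel factors shared by a and c */
    a /= g;
    c /= g;

    g = gcd_u64(b, c);            /* cancel factors shared by b and c */
    b /= g;
    c /= g;

    if (a != 0 && b > UINT64_MAX / a) return false; /* a * b would overflow */

    *out = a * b / c;
    return true;
}
```

For example, with a = 2^62, b = 12 and c = 8, the naive product overflows, but after reduction the computation is 2^59 * 12 / 1, which fits in 64 bits.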