檢查Julia的源代碼:
julia/base/math.jl:
^(x::Float64, y::Integer) =
box(Float64, powi_llvm(unbox(Float64,x), unbox(Int32,Int32(y))))
^(x::Float32, y::Integer) =
box(Float32, powi_llvm(unbox(Float32,x), unbox(Int32,Int32(y))))
julia/base/fastmath.jl:
pow_fast{T<:FloatTypes}(x::T, y::Integer) = pow_fast(x, Int32(y))
pow_fast{T<:FloatTypes}(x::T, y::Int32) =
box(T, Base.powi_llvm(unbox(T,x), unbox(Int32,y)))
我們可以看到,朱莉婭使用powi_llvm
檢查llvm's source code:
define double @powi(double %F, i32 %power) {
; CHECK: powi:
; CHECK: bl __powidf2
%result = call double @llvm.powi.f64(double %F, i32 %power)
ret double %result
}
現在,__powidf2是有趣的函數這裏:
COMPILER_RT_ABI double
__powidf2(double a, si_int b)
{
const int recip = b < 0;
double r = 1;
while (1)
{
if (b & 1)
r *= a;
b /= 2;
if (b == 0)
break;
a *= a;
}
return recip ? 1/r : r;
}
實施例1:給定一個= 2; b = 7:
- r = 1
- iteration 1: r = 1 * 2 = 2; b = (int)(7/2) = 3; a = 2 * 2 = 4
- iteration 2: r = 2 * 4 = 8; b = (int)(3/2) = 1; a = 4 * 4 = 16
- iteration 3: r = 8 * 16 = 128;
例2:給定a = 2; b = 8:
- r = 1
- iteration 1: r = 1; b = (int)(8/2) = 4; a = 2 * 2 = 4
- iteration 2: r = 1; b = (int)(4/2) = 2; a = 4 * 4 = 16
- iteration 3: r = 1; b = (int)(2/2) = 1; a = 16 * 16 = 256
- iteration 4: r = 1 * 256 = 256; b = (int)(1/2) = 0;
整數功率總是作爲序列乘法實現的。這就是爲什麼N^3比N^2慢。
jl_powi_llvm(在fastmath.jl中調用「jl_」由宏擴展連接)另一方面將指數轉換爲浮點並調用pow()。 C源代碼:
JL_DLLEXPORT jl_value_t *jl_powi_llvm(jl_value_t *a, jl_value_t *b)
{
jl_value_t *ty = jl_typeof(a);
if (!jl_is_bitstype(ty))
jl_error("powi_llvm: a is not a bitstype");
if (!jl_is_bitstype(jl_typeof(b)) || jl_datatype_size(jl_typeof(b)) != 4)
jl_error("powi_llvm: b is not a 32-bit bitstype");
jl_value_t *newv = newstruct((jl_datatype_t*)ty);
void *pa = jl_data_ptr(a), *pr = jl_data_ptr(newv);
int sz = jl_datatype_size(ty);
switch (sz) {
/* choose the right size c-type operation */
case 4:
*(float*)pr = powf(*(float*)pa, (float)jl_unbox_int32(b));
break;
case 8:
*(double*)pr = pow(*(double*)pa, (double)jl_unbox_int32(b));
break;
default:
jl_error("powi_llvm: runtime floating point intrinsics are not implemented for bit sizes other than 32 and 64");
}
return newv;
}
感謝您的回覆。關於math.jl和fastmath.jl的信息是我一直在尋找的。 –
'@ fastmath'實際上調用'pow_fast',它是一個通用函數,在正確的類型中,它調用與沒有'@fastmath'相同的'powi_llvm'。從@REL調用'@ fastmath'宏時,會有一些類型推斷混淆。 –
另外,'pow_fast'調用'jl_powi_llvm' - 所以您可以將'jl_'添加到powi_llvm的最後一個鏈接。並感謝爲我澄清了一些東西的偉大答案(我正在使用0.5)。 –