How do I use Fortran with ctypes?
elementwise.F90:
subroutine elementwise(a, b, c, M, N) bind(c, name='elementwise')
  use iso_c_binding, only: c_float, c_int
  implicit none
  ! M and N have no VALUE attribute, so C callers must pass them by reference
  integer(c_int), intent(in) :: M, N
  real(c_float), intent(in)  :: a(M, N), b(M, N)
  real(c_float), intent(out) :: c(M, N)
  integer :: i, j
  ! Elementwise product c = a * b
  forall (i=1:M, j=1:N)
    c(i,j) = a(i,j) * b(i,j)
  end forall
end subroutine
elementwise.py:
from ctypes import CDLL, POINTER, c_int, c_float
import numpy as np
import time
fortran = CDLL('./elementwise.so')
fortran.elementwise.argtypes = [POINTER(c_float),
                                POINTER(c_float),
                                POINTER(c_float),
                                POINTER(c_int),
                                POINTER(c_int)]
# Setup
M=10
N=5000000
a = np.empty((M,N), dtype=c_float)
b = np.empty((M,N), dtype=c_float)
c = np.empty((M,N), dtype=c_float)
a[:] = np.random.rand(M,N)
b[:] = np.random.rand(M,N)
# Fortran call
start = time.time()
fortran.elementwise(a.ctypes.data_as(POINTER(c_float)),
                    b.ctypes.data_as(POINTER(c_float)),
                    c.ctypes.data_as(POINTER(c_float)),
                    c_int(M), c_int(N))  # ctypes passes these by reference
stop = time.time()
print('Fortran took', stop - start, 'seconds')
# Numpy
start = time.time()
c = np.multiply(a,b)
stop = time.time()
print('Numpy took', stop - start, 'seconds')
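As an alternative to the `POINTER(c_float)` declarations and the explicit `data_as` casts in the script, NumPy ships `numpy.ctypeslib.ndpointer`, which lets ctypes check the dtype, dimensionality and contiguity of each array at call time. The sketch below is not part of the original script: the helper names `declare_elementwise` and `accepts` are made up for illustration, and the commented-out call assumes `elementwise.so` has been built from `elementwise.F90`.

```python
from ctypes import CDLL, POINTER, c_int
import numpy as np
from numpy.ctypeslib import ndpointer

# Argument type that accepts only 2-D, C-contiguous float32 arrays.
float32_2d = ndpointer(dtype=np.float32, ndim=2, flags='C_CONTIGUOUS')

def declare_elementwise(lib):
    """Attach argument types to lib.elementwise (hypothetical helper)."""
    lib.elementwise.argtypes = [float32_2d, float32_2d, float32_2d,
                                POINTER(c_int), POINTER(c_int)]
    lib.elementwise.restype = None  # Fortran subroutine: no return value
    return lib.elementwise

def accepts(array):
    """Return True if `array` would pass the ndpointer check."""
    try:
        float32_2d.from_param(array)
        return True
    except TypeError:
        return False

good = np.zeros((3, 4), dtype=np.float32)
print(accepts(good))                     # float32, 2-D, contiguous
print(accepts(good.astype(np.float64)))  # wrong dtype
print(accepts(good[:, ::2]))             # non-contiguous view

# With the library built, the arrays can then be passed directly:
#   elementwise = declare_elementwise(CDLL('./elementwise.so'))
#   elementwise(a, b, c, c_int(M), c_int(N))
```

The benefit of this approach is that a float64 or non-contiguous array is rejected with a `TypeError` instead of being silently reinterpreted by the Fortran routine.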
Compiled with:
gfortran -O3 -funroll-loops -ffast-math -floop-strip-mine -shared -fPIC \
-o elementwise.so elementwise.F90
The output shows that the Fortran version yields a speedup of roughly 10%:
$ python elementwise.py
Fortran took 0.213667869568 seconds
Numpy took 0.230120897293 seconds
$ python elementwise.py
Fortran took 0.209784984589 seconds
Numpy took 0.231616973877 seconds
$ python elementwise.py
Fortran took 0.214708089828 seconds
Numpy took 0.25369310379 seconds
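One thing to keep in mind when reading these numbers: in the script, `np.multiply(a,b)` allocates a fresh result array on every call, whereas the Fortran routine writes into the preallocated `c`. A rough way to separate the two effects is to time NumPy with and without `out=`, taking the best of several runs. The sketch below uses a smaller array than the script so that it finishes quickly; the helper `best_of` and the sizes are illustrative choices, not part of the original benchmark.

```python
import time
import numpy as np

def best_of(func, repeats=5):
    """Return the shortest wall-clock time of `repeats` calls to func."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return min(times)

M, N = 10, 500000  # smaller than the script's N, to keep the run short
a = np.random.rand(M, N).astype(np.float32)
b = np.random.rand(M, N).astype(np.float32)
c = np.empty((M, N), dtype=np.float32)

t_alloc = best_of(lambda: np.multiply(a, b))           # allocates a new result
t_inplace = best_of(lambda: np.multiply(a, b, out=c))  # reuses c

print('np.multiply, new array :', t_alloc, 'seconds')
print('np.multiply, out=c     :', t_inplace, 'seconds')
```

If the `out=c` variant closes most of the gap on your machine, the difference seen above may be due more to memory allocation than to the multiplication itself.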
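`numexpr` can outperform `numpy` on ufunc-like operations such as this one, especially when several are chained together. Also, if you have multiple cores, try calling `ne.set_num_threads(N)`, where `N` is the number of cores in your machine. – askewchan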
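A minimal sketch of the numexpr suggestion, assuming the package is installed and imported as `ne`. In numexpr, the call that sets the number of worker threads is `ne.set_num_threads`. The thread count of 4 and the array sizes are arbitrary example values, and the sketch falls back to plain NumPy when numexpr is unavailable so that it still runs.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:   # numexpr is an optional third-party package
    ne = None

M, N = 10, 100000
a = np.random.rand(M, N).astype(np.float32)
b = np.random.rand(M, N).astype(np.float32)

if ne is not None:
    ne.set_num_threads(4)  # example value; use your own core count
    # Pass the operands explicitly rather than relying on frame lookup.
    c = ne.evaluate('a * b', local_dict={'a': a, 'b': b})
else:
    c = np.multiply(a, b)  # fallback so the sketch still runs

print(c.shape, c.dtype)
```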
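On my machine, a `numexpr`-based function runs about 15% slower than `np.multiply()` on a single core, but about twice as fast once I set the number of cores to 8. Keep in mind that you may have to reset the core affinity of your Python process before it can use multiple cores; [see my answer](http://stackoverflow.com/a/15641148/1461210). –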
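The affinity issue mentioned in this comment can be inspected from Python itself. On Linux, Python 3.3+ exposes `os.sched_getaffinity` and `os.sched_setaffinity`; the sketch below tries to widen the set of CPUs the current process may run on and reports the result. It is Linux-only, the helper name `reset_affinity` is made up for illustration, and whether a reset is needed at all depends on how your NumPy and BLAS were built.

```python
import os

def reset_affinity():
    """Try to let this process run on all CPUs; return the resulting CPU set.

    Returns None on platforms without os.sched_setaffinity (non-Linux).
    """
    if not hasattr(os, 'sched_setaffinity'):
        return None
    all_cpus = range(os.cpu_count() or 1)
    try:
        os.sched_setaffinity(0, all_cpus)  # pid 0 means the current process
    except OSError:
        pass                               # keep the existing affinity
    return os.sched_getaffinity(0)

print('CPUs available to this process:', reset_affinity())
```

If the printed set contains a single CPU even after the reset, multithreaded libraries such as numexpr will not be able to spread work across cores from that process.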
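You could try using your GPU with [Theano](https://github.com/Theano/Theano). I really don't know whether it will help, and the results will depend on your exact hardware, but it might be worth a try. [Here](https://groups.google.com/forum/#!topic/theano-users/fZpCchn4JbI) you will find an example of how to do elementwise matrix multiplication with Theano. –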