2014-02-17 24 views
4

以下是我的cython代碼,目的是做一個bootstrap。Cython沒有速度收益嗎?

def boots(int trial, np.ndarray[double, ndim=2] empirical, np.ndarray[double, ndim=2] expected): 
    cdef int length = len(empirical) 
    cdef np.ndarray[double, ndim=2] ret = np.empty((trial, 100)) 
    cdef np.ndarray[long] choices 
    cdef np.ndarray[double] m 
    cdef np.ndarray[double] n 
    cdef long o 
    cdef int i 
    cdef int j 

    for i in range(trial): 
     choices = np.random.randint(0, length, length) 

     m = np.zeros(100) 
     n = np.zeros(100) 
     for j in range(length): 
      o = choices[j] 
      m.__iadd__(empirical[o]) 
      n.__iadd__(expected[o]) 
     empirical_boot = m/length 
     expected_boot = n/length 

     ret[i] = empirical_boot/expected_boot - 1 
    ret.sort(axis=0) 
    return ret[int(trial * 0.025)].reshape((10,10)), ret[int(trial * 0.975)].reshape((10,10)) 


# test code 
empirical = np.ones((40000, 100)) 
expected = np.ones((40000, 100)) 
%prun -l 10 boots(100, empirical,expected) 

花式索引需要11秒鐘的時間,不管我在cython中調整的方式如何都保持不變。

np.random.randint(0, 40000, 40000)需要1毫秒,所以100x需要0.1秒。

np.sort(np.ones((40000, 100))需要0.2s。

因此,我覺得必須有方法來改進boots

+0

爲什麼'cdef length = len(經驗型)'而不是'cdef int length = len(經驗型)'? – hivert

+0

@hivert錯字,但它不會影響速度。我很驚訝它編譯。 – colinfang

+0

以及爲什麼你使用2 100個元素向量('m'和'n'),如果它們的元素都是相同的? '__iadd__'應該在整個陣列上運行... – goncalopp

回答

3

您所看到的主要問題是,Cython只針對類型化數組優化單項訪問。這意味着代碼中您使用NumPy矢量化的每一行仍涉及到創建並與Python對象進行交互。 你在那裏的代碼並不比純Python版本快,因爲它並不是真的在做任何不同的計算。 你必須通過明確地寫出循​​環操作來避免這種情況。 這是您的代碼的修改版本,運行速度顯着加快。

from numpy cimport ndarray as ar 
from numpy cimport int32_t as int32 
from numpy import empty 
from numpy.random import randint 
cimport cython 
ctypedef int 

# Notice the use of these decorators to tell Cython to turn off 
# some of the checking it does when accessing arrays. 
@cython.boundscheck(False) 
@cython.wraparound(False) 
def boots(int32 trial, ar[double, ndim=2] empirical, ar[double, ndim=2] expected): 
    cdef: 
     int32 length = empirical.shape[0], i, j, k 
     int32 o 
     ar[double, ndim=2] ret = empty((trial, 100)) 
     ar[int32] choices 
     ar[double] m = empty(100), n = empty(100) 
    for i in range(trial): 
     # Still calling Python on this line 
     choices = randint(0, length, length) 
     # It was faster to compute m and n separately. 
     # I suspect that has to do with cache management. 
     # Instead of allocating new arrays, I just filled the old ones with the new values. 
     o = choices[0] 
     for k in range(100): 
      m[k] = empirical[o,k] 
     for j in range(1, length): 
      o = choices[j] 
      for k in range(100): 
       m[k] += empirical[o,k] 
     o = choices[0] 
     for k in range(100): 
      n[k] = expected[o,k] 
     for j in range(1, length): 
      o = choices[j] 
      for k in range(100): 
       n[k] += expected[o,k] 
     # Here I simplified some of the math and got rid of temporary arrays 
     for k in range(100): 
      ret[i,k] = m[k]/n[k] - 1. 
    ret.sort(axis=0) 
    return ret[int(trial * 0.025)].reshape((10,10)), ret[int(trial * 0.975)].reshape((10,10)) 

如果你想看看你的代碼的行涉及Python的來電,用Cython編譯器可以生成一個HTML文件顯示哪些行調用Python。 該選項稱爲註釋。 你使用它的方式取決於你如何編譯你的cython代碼。 如果您使用IPython筆記本,只需將--annotate標誌添加到Cython單元格魔術中即可。

您可能也能從打開C編譯器優化標誌中受益。

+0

我從來沒有料到過,與C版相比,「m .__ iadd __(經驗[o])」 – colinfang