How can I improve very inefficient numpy code for computing correlations?

I wrote the function below to compute the correlation of every row of a matrix against one selected row (specified by the index parameter):

import numpy

# returns a 1D array of correlation coefficients whose length matches
# the row count of the given np_arr_2d
def ma_correlate_vs_index(np_arr_2d, index):
    def corr_upper(x, y):
        # just take the upper right corner of the correlation matrix
        return numpy.ma.corrcoef(x, y)[0, 1]

    return numpy.ma.apply_along_axis(corr_upper, 1, np_arr_2d, np_arr_2d[index, :])

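The problem is that the code is very, very slow, and I'm not sure how to improve its performance. I believe that both the use of apply_along_axis and the fact that corrcoef builds a 2D array on every call contribute to the poor performance. Is there a more direct way to compute this that would perform better?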

In case it matters, I'm using the ma versions of the functions to mask out some nan values found in the data. Also, the shape of my data np_arr_2d is (623065, 72).
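For readers following along, here is a minimal sketch of one way the nan values could be masked before calling the ma functions. The small `raw` array is a made-up stand-in for the real data and is not taken from the question.

```python
import numpy as np

# Hypothetical stand-in for the real (623065, 72) array.
raw = np.array([[1.0, 2.0, np.nan, 4.0],
                [2.0, 1.0, 0.5, np.nan]])

# masked_invalid masks every NaN (and inf) entry.
np_arr_2d = np.ma.masked_invalid(raw)

print(np_arr_2d.mask)          # True where the input was NaN
print(np_arr_2d.mean(axis=1))  # row means over the unmasked entries only
```

Masked-array reductions such as mean skip the masked entries, which is why the ma variants are used throughout this thread.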

Source

2014-01-29 Keith

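I think you're right that there is a lot of overhead in corrcoef. Basically, all you need is the dot product of each row with the index row, normalized so that the maximum is 1.0.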

Something like this will work and should be a lot faster:

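import numpy as np

# np_arr_2d and index are as in the question; np_arr_2d is assumed
# to be a masked array with the NaN entries masked

# Demean
demeaned = np_arr_2d - np_arr_2d.mean(axis=1)[:, None]

# Dot product of each row with index
res = np.ma.dot(demeaned, demeaned[index])

# Norm of each row
row_norms = np.ma.sqrt((demeaned ** 2).sum(axis=1))

# Normalize
res = res / row_norms / row_norms[index]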

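This will run much faster than the original code. I used the masked-array methods throughout, so I think it will work with data that contains NaNs.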
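As a rough sanity check, the steps above can be wrapped in a function and compared with a reference result. This is only a sketch: the name `correlate_vs_index`, the random test data, and the row number 3 are made up for illustration, and the data deliberately has no missing values so that plain `np.corrcoef` can serve as the reference.

```python
import numpy as np

def correlate_vs_index(np_arr_2d, index):
    """Correlate every row of np_arr_2d with row `index` (hypothetical helper)."""
    arr = np.ma.asarray(np_arr_2d, dtype=float)
    # Subtract each row's mean.
    demeaned = arr - arr.mean(axis=1)[:, None]
    # Dot product of each row with the selected row.
    res = np.ma.dot(demeaned, demeaned[index])
    # Euclidean norm of each demeaned row.
    row_norms = np.ma.sqrt((demeaned ** 2).sum(axis=1))
    return res / row_norms / row_norms[index]

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 72))   # made-up data without missing values

fast = correlate_vs_index(data, 3)
expected = np.corrcoef(data)[3]    # row 3 of the full correlation matrix
print(np.allclose(np.ma.getdata(fast), expected))
```

The function makes one vectorized pass over the array instead of calling corrcoef once per row, which is where the speedup comes from.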

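There may be a subtle difference in the normalization, which is controlled by ddof in corrcoef; in that case you can compute row_norms with np.ma.std and specify the ddof you want.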
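A related subtlety appears once rows are masked in different places. As far as I can tell, np.ma.corrcoef(x, y) with two 1-D rows only uses the positions where both rows are unmasked, whereas the vectorized recipe demeans and normalizes each row over its own unmasked entries, so the two can disagree slightly on masked data. The sketch below is one possible pairwise variant; the helper name and the test data are hypothetical, and it is checked against np.corrcoef applied to the shared valid columns.

```python
import numpy as np

def pairwise_correlate_vs_index(np_arr_2d, index):
    """Correlate each row with row `index`, using only the columns where
    both rows are unmasked (hypothetical helper, not from the answer)."""
    arr = np.ma.asarray(np_arr_2d, dtype=float)
    valid = ~np.ma.getmaskarray(arr)
    both = valid & valid[index]              # usable columns for each pair
    filled = arr.filled(0.0)
    x = np.where(both, filled, 0.0)          # each row, zeroed where unusable
    y = np.where(both, filled[index], 0.0)   # index row, one copy per pair
    n = both.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        dx = np.where(both, x - (x.sum(axis=1) / n)[:, None], 0.0)
        dy = np.where(both, y - (y.sum(axis=1) / n)[:, None], 0.0)
        denom = np.sqrt((dx ** 2).sum(axis=1) * (dy ** 2).sum(axis=1))
        corr = (dx * dy).sum(axis=1) / denom
    # Undefined correlations (too few shared columns, zero variance) are masked.
    return np.ma.masked_invalid(corr)

rng = np.random.default_rng(1)
raw = rng.normal(size=(6, 20))               # made-up data
raw[0, 2] = np.nan
raw[4, [5, 7]] = np.nan
result = pairwise_correlate_vs_index(np.ma.masked_invalid(raw), 0)

# Reference for row 4: plain corrcoef on the columns valid in both rows.
ok = ~np.isnan(raw[4]) & ~np.isnan(raw[0])
expected = np.corrcoef(raw[4][ok], raw[0][ok])[0, 1]
print(np.isclose(result[4], expected))
```

This variant allocates several temporaries with the same shape as the input, so for an array as large as the one in the question it would probably need to be applied in chunks of rows.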

Source

2014-02-02 17:08:10 cxrodgers
