爲什麼X.dot（X.T）在numpy中需要這麼多內存？

X是一個n×p矩陣，其中p比n大得多。比方說，N = 1000，P = 500000當我運行：爲什麼X.dot（X.T）在numpy中需要這麼多內存？

X = np.random.randn(1000,500000) 
S = X.dot(X.T)

執行此操作結束了拍攝的內存很大，儘管尺寸的結果是1000×1000的內存使用落回原位，一旦操作完成。有沒有辦法解決？

來源

2014-01-18 bluecat

有趣。除了'X'和'S'的空間外，這應該在很小的恆定空間內執行：以'X'換位視圖，計算併爲'S'分配空間，並計算'S'的每個元素直。從我對NumPy的瞭解來看，類似的東西會自動從這個操作組合中脫落出來...... – delnan

我不確定細節，但是隻要有可能，numpy會嘗試使用BLAS例程，我認爲它會['sgemm用於矩陣乘法的']（http://www.math.utah.edu/software/lapack/lapack-blas/sgemm.html）。我敢打賭，這些高度優化的例程需要連續的數據（可能採用Fortran命令），因此必須對您的情況進行復制。 – Jaime

使用numpy> = 1.8，這將有助於許多此類情況。 – seberg

的問題不在於X和X.T是示相同的內存空間的本身，而是X.T是F-連續而不是C連續的。當然，在的情況下，您必須對至少一個輸入數組使用，在這種情況下，您將數組與其轉置的視圖相乘。

在numpy的< 1.8，np.dot將創建任何 F-有序輸入陣列，而不只是那些碰巧意見到的相同內存塊的C-有序的副本。

例如：

X = np.random.randn(1000,50000) 
Y = np.random.randn(50000, 100) 

# X and Y are both C-order, no copy 
%memit np.dot(X, Y) 
# maximum of 1: 485.554688 MB per loop 

# make X Fortran order and Y C-order, now the larger array (X) gets 
# copied 
X = np.asfortranarray(X) 
%memit np.dot(X, Y) 
# maximum of 1: 867.070312 MB per loop 

# make X C-order and Y Fortran order, now the smaller array (Y) gets 
# copied 
X = np.ascontiguousarray(X) 
Y = np.asfortranarray(Y) 
%memit np.dot(X, Y) 
# maximum of 1: 523.792969 MB per loop 

# make both of them F-ordered, both get copied! 
X = np.asfortranarray(X) 
%memit np.dot(X, Y) 
# maximum of 1: 905.093750 MB per loop

如果複製是一個問題（例如，當X是非常大的），你能做些什麼呢？

最好的選擇可能是升級到更新版本的numpy--就像@perimosocordiae指出的那樣，這個性能問題在this pull request中解決。

如果因任何原因，你無法升級numpy的，還有一招，可以讓你直接通過scipy.linalg.blas調用相關功能BLAS（從this answer無恥被盜後不強制副本進行快速，基於BLAS點產品）：

from scipy.linalg import blas 
X = np.random.randn(1000,50000) 

%memit res1 = np.dot(X, X.T) 
# maximum of 1: 845.367188 MB per loop 

%memit res2 = blas.dgemm(alpha=1., a=X.T, b=X.T, trans_a=True) 
# maximum of 1: 471.656250 MB per loop 

print np.all(res1 == res2) 
# True

來源

2014-01-19 03:01:35

這是上面提到的，但numpy> = 1.8爲你解決了這個問題：https://github.com/numpy/numpy/pull/2730 – perimosocordiae

@perimosocordiae是的，但有些人可能會覺得無論什麼原因升級尷尬（破依賴關係等），所以我想我會使用1.7.1給出解決方案。我會編輯我的答案，使其更清晰。 –

爲什麼X.dot（X.T）在numpy中需要這麼多內存？

回答

相關問題