2013-01-07 29 views
4

scipy.sparse dot in Python: the following code is extremely slow, or never finishes at all, on my system:

import numpy as np 
from scipy import sparse 
p = 100 
n = 50 
X = np.random.randn(p,n) 
L = sparse.eye(p,p, format='csc') 
X.T.dot(L).dot(X) 

Is there any explanation for why this matrix multiplication hangs?

+0

What is 'sparse'? If I use 'np.eye(p,p)' for that line instead, the line below it is practically instantaneous. – jozzas

+1

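Sorry, I did not show the explicit import from scipy: from scipy import sparse. I am using small matrix sizes as a proof of concept; in practice p is around 10,000. – bluecat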

Answers

8

X.T.dot(L) is not, as you might expect, a 50×100 matrix of numbers, but a 50×100 array in which every element is a 100×100 sparse matrix:

>>> X.T.dot(L).shape 
(50, 100) 
>>> X.T.dot(L)[0,0] 
<100x100 sparse matrix of type '<type 'numpy.float64'>' 
    with 100 stored elements in Compressed Sparse Column format> 

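It seems the problem is that X's dot method, X being an array, does not know about sparse matrices. So you have to convert the sparse matrix to a dense one using its todense or toarray method. The former returns a matrix object, the latter an array: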

>>> X.T.dot(L.todense()).dot(X) 
matrix([[ 81.85399873, 3.75640482, 1.62443625, ..., 6.47522251, 
      3.42719396, 2.78630873], 
     [ 3.75640482, 109.45428475, -2.62737229, ..., -0.31310651, 
      2.87871548, 8.27537382], 
     [ 1.62443625, -2.62737229, 101.58919604, ..., 3.95235372, 
      1.080478 , -0.16478654], 
     ..., 
     [ 6.47522251, -0.31310651, 3.95235372, ..., 95.72988689, 
      -18.99209596, 17.31774553], 
     [ 3.42719396, 2.87871548, 1.080478 , ..., -18.99209596, 
      108.90045569, -16.20312682], 
     [ 2.78630873, 8.27537382, -0.16478654, ..., 17.31774553, 
      -16.20312682, 105.37102461]]) 
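
The difference between the two conversions can be seen in a minimal sketch like the one below. It assumes L is built with sparse.eye as in the question; the seeded random generator and the variable names dense_array and dense_matrix are assumptions added for illustration only.

```python
import numpy as np
from scipy import sparse

p, n = 100, 50
rng = np.random.default_rng(0)      # seed chosen arbitrarily for this sketch
X = rng.standard_normal((p, n))
L = sparse.eye(p, p, format='csc')

dense_array = L.toarray()           # plain numpy.ndarray
dense_matrix = L.todense()          # dense object; a numpy matrix for the sparse matrix built here

# With an ndarray in the middle, both dot calls are ordinary dense products.
result = X.T.dot(dense_array).dot(X)
print(type(dense_array).__name__, type(dense_matrix).__name__, result.shape)
```

Note that the dense copy holds p*p values, so for p around 10,000 it needs roughly 800 MB in float64. This route trades memory for simplicity.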

Alternatively, sparse matrices have a dot method of their own, which does know about arrays:

>>> X.T.dot(L.dot(X)) 
array([[ 81.85399873, 3.75640482, 1.62443625, ..., 6.47522251, 
      3.42719396, 2.78630873], 
     [ 3.75640482, 109.45428475, -2.62737229, ..., -0.31310651, 
      2.87871548, 8.27537382], 
     [ 1.62443625, -2.62737229, 101.58919604, ..., 3.95235372, 
      1.080478 , -0.16478654], 
     ..., 
     [ 6.47522251, -0.31310651, 3.95235372, ..., 95.72988689, 
     -18.99209596, 17.31774553], 
     [ 3.42719396, 2.87871548, 1.080478 , ..., -18.99209596, 
     108.90045569, -16.20312682], 
     [ 2.78630873, 8.27537382, -0.16478654, ..., 17.31774553, 
     -16.20312682, 105.37102461]]) 
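
As a rough check that the sparse-first ordering gives the same numbers as the dense route, something like the following could be run. It is only a sketch: the names LX, result and expected are illustrative, and the seeded generator is an assumption for reproducibility.

```python
import numpy as np
from scipy import sparse

p, n = 100, 50
rng = np.random.default_rng(0)      # seed chosen arbitrarily for this sketch
X = rng.standard_normal((p, n))
L = sparse.eye(p, p, format='csc')

# Let the sparse matrix drive the first product:
# sparse (p x p) times dense (p x n) comes back as a dense array.
LX = L.dot(X)

# The remaining product is an ordinary dense one, (n x p) times (p x n).
result = X.T.dot(LX)

# Reference value computed through an explicit dense copy of L.
expected = X.T.dot(L.toarray()).dot(X)
print(np.allclose(result, expected))
```

The point of this ordering is that L never has to be converted to dense, which matters once p grows to the sizes mentioned in the question.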
+1

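The second option, "X.T.dot(L.dot(X))", is about twice as fast at p = 1000 and n = 500! Nice catch on the array creation, @Jaime. I had assumed that multiplying a sparse matrix by a non-sparse one was simply a poor match (since the result becomes non-sparse), not that it was wrecking the operation entirely. – dhj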

+1

Great catch. It is unfortunate that numpy does not know about sparse matrices and gives no warning about this unexpected behaviour. – bluecat
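
Since the array's dot method gives no warning, one way to guard against the problem is a small wrapper that always routes the product through the sparse matrix's own dot method. The helper below, dense_times_sparse, is hypothetical and not part of numpy or scipy. It relies only on sparse.issparse and on the identity (A S) = (Sᵀ Aᵀ)ᵀ.

```python
import numpy as np
from scipy import sparse

def dense_times_sparse(a, s):
    """Hypothetical helper: compute the product a * s for dense a and sparse s."""
    if not sparse.issparse(s):
        raise TypeError("expected a scipy.sparse matrix as the second argument")
    # (a s) = (s^T a^T)^T, so the sparse matrix's dot method does the work
    # and the result is a dense numeric array rather than an object array.
    return s.T.dot(a.T).T

p, n = 100, 50
rng = np.random.default_rng(0)      # seed chosen arbitrarily for this sketch
X = rng.standard_normal((p, n))
L = sparse.eye(p, p, format='csc')

XtL = dense_times_sparse(X.T, L)    # dense (n x p) array
result = XtL.dot(X)                 # dense (n x n) array
```

This keeps the left-to-right order of the original expression while avoiding the array-of-sparse-matrices result described in the answer.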