Efficiently creating a scipy.lil_matrix from a Python generator

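I have a generator that yields one-dimensional numpy.array rows, all of the same length. I would like a sparse matrix containing that data. The rows are generated in the order I want them to have in the final matrix. A csr matrix is preferable to a lil matrix, but I assume the latter is easier to build in the scenario I describe.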

Assuming row_gen is a generator that yields numpy.array rows, the following code works as expected.

import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(list(row_gen()))

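Because building the list essentially defeats any benefit of the generator, I would like the following to produce the same result. However, it raises an exception at runtime: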

def row_gen(): 
    yield numpy.array([1, 2, 3]) 
    yield numpy.array([1, 0, 1]) 
    yield numpy.array([1, 0, 0]) 

matrix = scipy.sparse.lil_matrix(row_gen())

More specifically, I cannot hold the whole dense matrix (or a list of all of its rows) in memory. The exception raised is:

TypeError: no supported conversion for types: (dtype('O'),)

I also noticed that the traceback includes the following:

File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__ 
    A = csr_matrix(A, dtype=dtype).tolil()

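This makes me think that scipy.sparse.lil_matrix ends up building a csr matrix first and only then converts it to a lil matrix. In that case I would rather create a csr matrix to begin with.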

To recap, my question is: what is the most efficient way to create a scipy.sparse matrix from a Python generator of one-dimensional numpy arrays?

2016-11-01 NirIzr

Let's look at the code of sparse.lil_matrix. It checks the first argument:

if isspmatrix(arg1): # is it already a sparse matrix
    ...
elif isinstance(arg1,tuple): # is it the shape tuple
    if isshape(arg1):
        if shape is not None:
            raise ValueError('invalid use of shape parameter')
        M, N = arg1
        self.shape = (M,N)
        self.rows = np.empty((M,), dtype=object)
        self.data = np.empty((M,), dtype=object)
        for i in range(M):
            self.rows[i] = []
            self.data[i] = []
    else:
        raise TypeError('unrecognized lil_matrix constructor usage')
else:
    # assume A is dense
    try:
        A = np.asmatrix(arg1)
    except TypeError:
        raise TypeError('unsupported matrix type')
    else:
        from .csr import csr_matrix
        A = csr_matrix(A, dtype=dtype).tolil()

        self.shape = A.shape
        self.dtype = A.dtype
        self.rows = A.rows
        self.data = A.data

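As the documentation says, it can be constructed from another sparse matrix, from a shape, or from a dense array. The dense-array constructor first builds a csr matrix and then converts it to lil.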

The shape version constructs an empty lil whose data looks like this:

In [161]: M=sparse.lil_matrix((3,5),dtype=int) 
In [163]: M.data 
Out[163]: array([[], [], []], dtype=object) 
In [164]: M.rows 
Out[164]: array([[], [], []], dtype=object)

It should be obvious that passing a generator will not work: a generator is not a dense array.
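To make the failure concrete, here is a small sketch of what NumPy does with a generator. It uses np.asarray rather than the np.asmatrix call quoted above, on the assumption that both treat a generator the same way: the generator object is wrapped as a single element instead of being iterated, which is where the dtype('O') in the error message comes from.

```python
import numpy as np

def row_gen():
    # Same toy generator as in the question
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

# NumPy does not consume the generator; it stores the generator object
# itself in a zero-dimensional array of dtype object.
wrapped = np.asarray(row_gen())
print(wrapped.dtype, wrapped.shape)
```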

Once you have created a lil matrix, though, you can fill in its elements with regular array assignment:

In [167]: M[0,:]=[1,0,2,0,0] 
In [168]: M[1,:]=[0,0,2,0,0] 
In [169]: M[2,3:]=[1,1] 
In [170]: M.data 
Out[170]: array([[1, 2], [2], [1, 1]], dtype=object) 
In [171]: M.rows 
Out[171]: array([[0, 2], [2], [3, 4]], dtype=object) 
In [172]: M.A 
Out[172]: 
array([[1, 0, 2, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 1, 1]])
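Applied to the generator from the question, row assignment might look like the sketch below. It assumes the final shape is known up front; n_rows and n_cols are illustrative names, not something the question provides.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same toy generator as in the question
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

n_rows, n_cols = 3, 3  # assumption: the shape is known in advance

M = sparse.lil_matrix((n_rows, n_cols), dtype=int)
for i, row in enumerate(row_gen()):
    # Only one dense row is alive at a time
    M[i, :] = row

dense = M.toarray()
```

Only a single dense row exists at any moment, so the full dense matrix is never materialized.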

You can also assign directly to the sublists (I think this is faster, but somewhat more dangerous):

In [173]: M.data[1]=[1,2,3] 
In [174]: M.rows[1]=[0,2,4] 
In [176]: M.A 
Out[176]: 
array([[1, 0, 2, 0, 0],
       [1, 0, 2, 0, 3],
       [0, 0, 0, 1, 1]])
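A sketch of the same direct-assignment idea driven by the generator from the question, again assuming the shape is known ahead of time (n_rows and n_cols are illustrative names). Nothing validates the lists here, so it is up to the caller to keep column indices sorted and the two lists the same length; np.flatnonzero returns sorted indices, which is why it is used.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same toy generator as in the question
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

n_rows, n_cols = 3, 3  # assumption: the shape is known in advance

M = sparse.lil_matrix((n_rows, n_cols), dtype=int)
for i, row in enumerate(row_gen()):
    cols = np.flatnonzero(row)       # sorted column indices of nonzeros
    M.rows[i] = cols.tolist()        # column indices for row i
    M.data[i] = row[cols].tolist()   # matching nonzero values

dense = M.toarray()
```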

Another incremental approach is to build the three arrays or lists of the coo format, and then make a coo or csr matrix from them.
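A minimal sketch of that coo approach, which does not need the number of rows in advance. The helper name csr_from_rows is made up for illustration, and the sketch assumes the generator yields at least one row and that all rows have the same length.

```python
import numpy as np
from scipy import sparse

def row_gen():
    # Same toy generator as in the question
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_rows(rows):
    """Collect coo triplets row by row, then convert to csr.

    Hypothetical helper; assumes at least one row is produced.
    """
    data, row_idx, col_idx = [], [], []
    n_rows = n_cols = 0
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)             # columns of the nonzeros
        data.append(row[cols])                 # nonzero values
        col_idx.append(cols)
        row_idx.append(np.full(cols.size, i))  # row index for each value
        n_rows, n_cols = i + 1, row.size
    coo = sparse.coo_matrix(
        (np.concatenate(data),
         (np.concatenate(row_idx), np.concatenate(col_idx))),
        shape=(n_rows, n_cols),
    )
    return coo.tocsr()

matrix = csr_from_rows(row_gen())
```

Memory use grows with the number of nonzeros rather than with the size of the dense matrix, and the shape is only fixed once the generator is exhausted.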

sparse.bmat is another option, and its code is a good example of building coo inputs. I'll let you look at that yourself.
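In the same block-assembly spirit, one possible variation (not bmat itself) is to convert the generator's output in small chunks and stack the resulting sparse blocks with sparse.vstack. The helper name csr_from_chunks and the chunk_size parameter are made up for illustration, and the sketch assumes the generator yields at least one row.

```python
import itertools

import numpy as np
from scipy import sparse

def row_gen():
    # Same toy generator as in the question
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_chunks(rows, chunk_size=2):
    """Stack sparse blocks built from chunk_size dense rows at a time.

    Hypothetical helper; assumes at least one row is produced.
    """
    rows = iter(rows)
    blocks = []
    while True:
        chunk = list(itertools.islice(rows, chunk_size))
        if not chunk:
            break
        # Only chunk_size rows are ever dense at the same time
        blocks.append(sparse.csr_matrix(np.vstack(chunk)))
    return sparse.vstack(blocks, format="csr")

stacked = csr_from_chunks(row_gen())
```

A larger chunk_size means fewer blocks to stack at the cost of a bigger temporary dense array.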

2016-11-01 01:13:40 hpaulj

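Thanks, but both of the approaches you suggest assume I already know the shape of the matrix. That is not the case; since lil is built specifically for adding rows, I am looking for a way to grow it efficiently while it is being constructed. – NirIzr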

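What is unknown: the number of rows, the number of columns, or both? Collecting the 'coo' trio of arrays does not require knowing the final size, but 'sparse' matrices are not designed for incremental growth (nor are sparse arrays). – hpaulj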
