2011-12-01 211 views
5

我在Python/Scipy中處理相當大的矩陣。我需要從大矩陣(它被加載到coo_matrix)中提取行並將它們用作對角元素。目前,我這樣做,以下列方式:從稀疏矩陣的行創建一個稀疏對角矩陣

import numpy as np 
from scipy import sparse 

def computation(A): 
    for i in range(A.shape[0]): 
    diag_elems = np.array(A[i,:].todense()) 
    ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc") 
    #... 

#create some random matrix 
A = (sparse.rand(1000,100000,0.02,format="csc")*5).astype(np.ubyte) 
#get timings 
profile.run('computation(A)') 

我從profile輸出看到的是,大部分的時間由get_csr_submatrix功能而提取diag_elems消耗。這使我認爲我使用初始數據的低效稀疏表示或從稀疏矩陣中提取行的錯誤方法。你能否提出一種更好的方法從稀疏矩陣中提取一行並以對角線形式表示它?

EDIT

以下變體從行提取去除瓶頸(注意,簡單改變'csc'csr不充分,A[i,:]必須A.getrow(i)被替換以及)。然而,主要問題是如何省略實現(.todense())並根據行的稀疏表示創建對角矩陣。

import numpy as np 
from scipy import sparse 

def computation(A): 
    for i in range(A.shape[0]): 
    diag_elems = np.array(A.getrow(i).todense()) 
    ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc") 
    #... 

#create some random matrix 
A = (sparse.rand(1000,100000,0.02,format="csr")*5).astype(np.ubyte) 
#get timings 
profile.run('computation(A)') 

如果創建從直接1-行CSR矩陣對角矩陣,如下所示:

diag_elems = A.getrow(i) 
ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1]) 

然後我既不能指定format="csc"參數,也不轉換ith_diags到CSC格式:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/local/lib/python2.6/profile.py", line 70, in run 
    prof = prof.run(statement) 
    File "/usr/local/lib/python2.6/profile.py", line 456, in run 
    return self.runctx(cmd, dict, dict) 
    File "/usr/local/lib/python2.6/profile.py", line 462, in runctx 
    exec cmd in globals, locals 
    File "<string>", line 1, in <module> 
    File "<stdin>", line 4, in computation 
    File "/usr/local/lib/python2.6/site-packages/scipy/sparse/construct.py", line 56, in spdiags 
    return dia_matrix((data, diags), shape=(m,n)).asformat(format) 
    File "/usr/local/lib/python2.6/site-packages/scipy/sparse/base.py", line 211, in asformat 
    return getattr(self,'to' + format)() 
    File "/usr/local/lib/python2.6/site-packages/scipy/sparse/dia.py", line 173, in tocsc 
    return self.tocoo().tocsc() 
    File "/usr/local/lib/python2.6/site-packages/scipy/sparse/coo.py", line 263, in tocsc 
    data = np.empty(self.nnz, dtype=upcast(self.dtype)) 
    File "/usr/local/lib/python2.6/site-packages/scipy/sparse/sputils.py", line 47, in upcast 
    raise TypeError,'no supported conversion for types: %s' % args 
TypeError: no supported conversion for types: object` 
+1

你試過'format =「csr」'而不是? – cyborg

+0

用'csr'作爲初始數據,'A [i,:]'替換爲'.getrow(i)'我實現了顯着的加速。但是我正在尋找的是省略實現對角矩陣的行生成。有任何想法嗎? – savenkov

回答

3

這是我想出的:

def computation(A): 
    for i in range(A.shape[0]): 
     idx_begin = A.indptr[i] 
     idx_end = A.indptr[i+1] 
     row_nnz = idx_end - idx_begin 
     diag_elems = A.data[idx_begin:idx_end] 
     diag_indices = A.indices[idx_begin:idx_end] 
     ith_diag = sparse.csc_matrix((diag_elems, (diag_indices, diag_indices)),shape=(A.shape[1], A.shape[1])) 
     ith_diag.eliminate_zeros() 

Python分析器說1.464秒比5.574秒前。它利用了定義稀疏矩陣的底層密集數組(indptr,indices,data)。這是我的速成教程:A.indptr [i]:A.indptr [i + 1]定義密集陣列中的哪些元素對應於第i行中的非零值。 A.data是非零密集的一維數組,A和A.indptr的值是這些值的列。

我會做一些更多的測試,以便非常確定這和以前一樣。我只檢查了幾個案例。

+0

凱文,太棒了! – savenkov

+0

順便說一句,row_nnz未使用 – savenkov