Unexpected behavior when modifying a sparse matrix

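I have a sparse matrix M and an array a of positions in M where I want to add 1. The array a may contain duplicates, and whenever a position appears n times in a, I want to add n to the corresponding entry of M. This is how I do it: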

from scipy import sparse as sp 
M = sp.csr_matrix((3, 4), dtype=float) 
M[[0,0,0,0,0], [0,1,0,1,0]] += 1

But when I run this, M[0,0] is only increased by one. Is there a simple way to get the behavior I want?


2017-06-20 HolyMonk

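Consider reading the numpy and scipy docs to understand what is happening here. The basic pipeline for a vectorized approach could be: A: sort your positions (lexicographically); B: create a 1-d vector of ones, then merge the duplicates in A while summing the matching entries of B (B may shrink, and its entries may grow from 1 to N); C: add these B values while indexing with A. A simpler, loop-based approach: just take each position in a loop and increment it one at a time. – sascha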

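OK, thanks. That is how I am doing it now, but I expected there to be a faster way. I come from MATLAB, so I always expect matrix operations to be faster than loops. – HolyMonk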

In most cases they are. Then try my first approach (or wait for some expert to come up with a better one). – sascha
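The pipeline sascha outlines can be sketched roughly as follows. This is only an illustration under some assumptions: `np.unique` with `axis=0` stands in for the "sort and merge duplicates" steps, a `lil_matrix` is used because it is cheaper to modify entry by entry than a `csr_matrix`, and the final step is a short loop over the distinct positions rather than a single indexed assignment.

```python
import numpy as np
from scipy import sparse as sp

# Positions to increment; (0, 0) appears three times, (0, 1) twice.
rows = np.array([0, 0, 0, 0, 0])
cols = np.array([0, 1, 0, 1, 0])

# LIL format is intended for incremental changes to the sparsity structure.
M = sp.lil_matrix((3, 4), dtype=float)

# Steps A and B: np.unique sorts the (row, col) pairs lexicographically,
# merges duplicates, and reports how often each pair occurred.
positions = np.stack([rows, cols], axis=1)
uniq, counts = np.unique(positions, axis=0, return_counts=True)

# Step C: add each count at its position. The loop now runs once per
# distinct position, not once per entry of the original index arrays.
for (r, c), n in zip(uniq, counts):
    M[r, c] += n

M = M.tocsr()  # convert back to CSR for arithmetic
```

The loop here touches two entries instead of five; the saving grows with the number of repeated positions.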

How does MATLAB handle this?

numpy has a specific function for handling this case of repeated indices, add.at:

Using ufunc.at on matrix

This has not yet been implemented for scipy.sparse.

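Since sparse sums duplicate coordinates when converting from coo format to csr format, I suspect the problem can be cast in a form that takes advantage of that. In fact, csr matrices have an M.sum_duplicates method. I had to play around to work out the details.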
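As a small illustration of that duplicate-summing behavior (a sketch, not part of the original session; the variable names are made up for this example), a `coo_matrix` keeps repeated coordinates as separate stored entries until it is converted or `sum_duplicates` is called:

```python
import numpy as np
from scipy import sparse

rows = [0, 0, 0, 0, 0]
cols = [0, 1, 0, 1, 0]

# COO stores every (row, col, value) triple as given, duplicates included.
coo = sparse.coo_matrix((np.ones(5), (rows, cols)), shape=(3, 4))
stored_in_coo = coo.nnz

# Converting to CSR sums entries that share the same coordinates.
csr = coo.tocsr()
stored_in_csr = csr.nnz
dense = csr.toarray()

# The same merge can be done in place on the COO matrix.
coo.sum_duplicates()
stored_after_merge = coo.nnz
```

With these inputs the COO matrix starts with five stored entries, and both the CSR conversion and the in-place merge reduce that to two, holding the sums 3 and 2.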

In [876]: M = sparse.csr_matrix((3, 4), dtype=float) 
In [877]: M 
Out[877]: 
<3x4 sparse matrix of type '<class 'numpy.float64'>' 
    with 0 stored elements in Compressed Sparse Row format>

To demonstrate np.add.at in action, first the buffered += on a dense copy, then add.at:

In [878]: arr = M.A 
In [879]: arr[[0,0,0,0,0],[0,1,0,1,0]] += 1 
In [880]: arr 
Out[880]: 
array([[ 1., 1., 0., 0.], 
     [ 0., 0., 0., 0.], 
     [ 0., 0., 0., 0.]]) 

In [883]: arr = M.A 
In [884]: np.add.at(arr,[[0,0,0,0,0],[0,1,0,1,0]],1) 
In [885]: arr 
Out[885]: 
array([[ 3., 2., 0., 0.], 
     [ 0., 0., 0., 0.], 
     [ 0., 0., 0., 0.]])

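Adding to M directly produces the same buffered behavior, along with a warning. Changing the sparsity of a matrix is relatively expensive.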

In [886]: M[[0,0,0,0,0],[0,1,0,1,0]] += 1 
.... 
    SparseEfficiencyWarning) 
In [887]: M 
Out[887]: 
<3x4 sparse matrix of type '<class 'numpy.float64'>' 
    with 2 stored elements in Compressed Sparse Row format> 
In [888]: M.A 
Out[888]: 
array([[ 1., 1., 0., 0.], 
     [ 0., 0., 0., 0.], 
     [ 0., 0., 0., 0.]])

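The right way to do this addition is to make a new sparse matrix holding the values that need to be added. We can take advantage of the fact that coo-style inputs have their duplicates summed when converted to csr: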

In [895]: m = sparse.csr_matrix((np.ones(5,int),([0,0,0,0,0],[0,1,0,1,0])), shape=M.shape) 
In [896]: m 
Out[896]: 
<3x4 sparse matrix of type '<class 'numpy.int32'>' 
    with 2 stored elements in Compressed Sparse Row format> 
In [897]: m.A 
Out[897]: 
array([[3, 2, 0, 0], 
     [0, 0, 0, 0], 
     [0, 0, 0, 0]], dtype=int32)

Now we can add the original and the new matrix:

In [898]: M = sparse.csr_matrix((3, 4), dtype=float) 
In [899]: M+m 
Out[899]: 
<3x4 sparse matrix of type '<class 'numpy.float64'>' 
    with 2 stored elements in Compressed Sparse Row format>
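For use outside an interactive session, the same idea can be wrapped in a small helper. This is a minimal sketch; the name `add_at_sparse` and its `value` parameter are hypothetical and not part of scipy.

```python
import numpy as np
from scipy import sparse


def add_at_sparse(M, rows, cols, value=1):
    """Return M with `value` added once per occurrence of each (row, col).

    Hypothetical helper: it builds a second CSR matrix from coo-style
    input, which sums repeated coordinates, and adds it to M.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    data = np.full(rows.shape, value, dtype=M.dtype)
    update = sparse.csr_matrix((data, (rows, cols)), shape=M.shape)
    return M + update


M = sparse.csr_matrix((3, 4), dtype=float)
M = add_at_sparse(M, [0, 0, 0, 0, 0], [0, 1, 0, 1, 0])
result = M.toarray()
```

Because the helper returns a new matrix instead of modifying M in place, it avoids the SparseEfficiencyWarning seen with indexed assignment.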


2017-06-20 16:25:01 hpaulj
