Sparse matrix slicing memory error

I have a sparse matrix csr:

<681881x58216 sparse matrix of type '<class 'numpy.int64'>' 
    with 2867209 stored elements in Compressed Sparse Row format>

and I want to create a new sparse matrix from a slice of csr: csr_2 = csr[1::2,:].

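Problem: while I only have the csr matrix loaded, 40 GB of my server's memory is in use. When I run csr_2 = csr[1::2,:], the server's RAM fills up completely (all 128 GB) and the process dies with a "MemoryError".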


2017-09-04 Ladenkov Vladislav

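The matrix in your example is only 22 MB of values plus some auxiliary arrays, probably under 80 MB in memory. So are you sure this is the source of your problem (could other things on the server be using the other 39 GB)? (And, by the way, slicing a sparse matrix produces a copy.) – sascha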
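To put numbers on that estimate, here is a small sketch that adds up the three arrays backing a CSR matrix (data, indices, indptr). The helper names `csr_nbytes` and `estimate_csr_nbytes` are hypothetical, and the estimate assumes int64 values and int32 index arrays (scipy normally uses int32 indices when they fit); it ignores the small Python object overhead.

```python
import numpy as np
from scipy import sparse

def csr_nbytes(m):
    """Bytes held by the three arrays that back a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

def estimate_csr_nbytes(n_rows, nnz, value_itemsize=8, index_itemsize=4):
    """Estimate CSR storage from the row count and number of stored elements.

    Assumes value_itemsize bytes per value (8 for int64) and
    index_itemsize bytes per entry of indices and indptr (4 for int32).
    """
    values = nnz * value_itemsize
    indices = nnz * index_itemsize
    indptr = (n_rows + 1) * index_itemsize
    return values + indices + indptr

# Figures from the question: 681881 rows, 2867209 stored int64 elements.
print(estimate_csr_nbytes(681881, 2867209))  # 37134036 bytes, about 37 MB

# On a real matrix the two functions should agree.
small = sparse.csr_matrix(np.eye(5))
print(csr_nbytes(small))
```

So the slice itself, at roughly half the rows, should need on the order of tens of MB, not tens of GB.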

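(1) This slice takes every other row, starting with the second one (the odd-indexed rows). (2) The server runs many Docker containers together with other maintenance processes, which take 40 GB in total. –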
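A quick sketch of what `[1::2, :]` selects, using a hypothetical 6x2 demo matrix, and how many rows the slice of the 681881-row matrix from the question should have:

```python
import numpy as np
from scipy import sparse

# 0-based rows 1, 3, 5, ... are kept: every other row, starting with the second.
rows_kept = range(681881)[1::2]
print(len(rows_kept))  # 340940

# Small demo: rows [2 3], [6 7] and [10 11] survive the slice.
demo = sparse.csr_matrix(np.arange(12).reshape(6, 2))
print(demo[1::2, :].toarray())
```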

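sparse uses matrix multiplication to select rows like this. I worked out the details of the extractor matrix in another SO question, but roughly: to get a (p,n) matrix from an (m,n) one, it multiplies by a (p,m) matrix with p nonzero values, one per row.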

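The matrix multiplication itself is a two-pass process. The first pass determines the size of the result matrix, and the second pass fills in its values.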
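To illustrate the two-pass idea, here is a pure-Python sketch of CSR times CSR. It is not scipy's code (scipy does this in compiled routines); it only shows the general scheme: a symbolic pass that counts the output entries per row and so fixes `indptr` and the total allocation, then a numeric pass that fills preallocated arrays. The function name `two_pass_csr_matmul` is hypothetical, and unlike scipy it keeps entries whose sums cancel to zero.

```python
import numpy as np
from scipy import sparse

def two_pass_csr_matmul(A, B):
    """Sketch of C = A @ B for CSR inputs using a size pass and a fill pass."""
    A, B = sparse.csr_matrix(A), sparse.csr_matrix(B)
    if A.shape[1] != B.shape[0]:
        raise ValueError("inner dimensions do not match")
    m, n = A.shape[0], B.shape[1]

    # Pass 1 (symbolic): count the distinct output columns in each row of C.
    # This fixes indptr, and therefore how much memory the result needs.
    indptr = np.zeros(m + 1, dtype=np.int64)
    for i in range(m):
        cols = set()
        for k in A.indices[A.indptr[i]:A.indptr[i + 1]]:
            cols.update(B.indices[B.indptr[k]:B.indptr[k + 1]].tolist())
        indptr[i + 1] = indptr[i] + len(cols)

    # Allocate the output once, at its final size.
    nnz = int(indptr[-1])
    indices = np.empty(nnz, dtype=np.int64)
    data = np.empty(nnz, dtype=np.result_type(A.dtype, B.dtype))

    # Pass 2 (numeric): accumulate the values of each output row.
    for i in range(m):
        acc = {}
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k, a = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[k], B.indptr[k + 1]):
                col = B.indices[kk]
                acc[col] = acc.get(col, 0) + a * B.data[kk]
        for offset, col in enumerate(sorted(acc)):
            indices[indptr[i] + offset] = col
            data[indptr[i] + offset] = acc[col]

    return sparse.csr_matrix((data, indices, indptr), shape=(m, n))
```

When A is an extractor matrix, pass 1 simply finds that output row i has as many entries as the selected row of B, so the result is sized like the slice.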

Unlike slicing a dense numpy array, slicing a sparse matrix never returns a view.
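A small sketch of the difference, using `np.shares_memory` to check whether the slice reuses the original buffer (the arrays here are made up for illustration):

```python
import numpy as np
from scipy import sparse

dense = np.arange(12).reshape(4, 3)
dense_slice = dense[1::2, :]                        # basic slicing: a view
print(np.shares_memory(dense, dense_slice))         # True

csr = sparse.csr_matrix(dense)
csr_slice = csr[1::2, :]                            # a new matrix with its own arrays
print(np.shares_memory(csr.data, csr_slice.data))   # False

csr_slice.data[:] = -1                              # modifying the slice...
print(csr.toarray()[1])                             # [3 4 5]  ...leaves csr untouched
```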

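See the SO question "Sparse matrix slicing using list of int" for details of the extractor matrix. I'd also suggest testing csr.sum(axis=1), since it also uses matrix multiplication.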
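The connection is that a row sum is mathematically a product with a vector of ones. The sketch below only checks that equivalence on a random matrix; whether a given scipy version actually computes `sum(axis=1)` by multiplication is version-dependent, so treat this as an illustration rather than a description of scipy internals.

```python
import numpy as np
from scipy import sparse

M = sparse.random(100, 80, density=0.1, format='csr', random_state=0)

# csr_matrix.sum(axis=1) returns a (100, 1) matrix; flatten it for comparison.
row_sums = np.asarray(M.sum(axis=1)).ravel()

# The same numbers as a sparse matrix-vector product with ones.
via_matvec = M @ np.ones(M.shape[1])

print(np.allclose(row_sums, via_matvec))  # True
```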

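import numpy as np
from scipy import sparse

def extractor(indices, N):
    # One output row per selected index; row i holds a single 1 in column indices[i].
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)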

So, to index every other row:

In [99]: M = sparse.random(100,80,.1, 'csr') 
In [100]: M 
Out[100]: 
<100x80 sparse matrix of type '<class 'numpy.float64'>' 
    with 800 stored elements in Compressed Sparse Row format> 
In [101]: E = extractor(np.r_[1:100:2],100) 
In [102]: E 
Out[102]: 
<50x100 sparse matrix of type '<class 'numpy.float64'>' 
    with 50 stored elements in Compressed Sparse Row format> 
In [103]: M1 = E*M 
In [104]: M1 
Out[104]: 
<50x80 sparse matrix of type '<class 'numpy.float64'>' 
    with 407 stored elements in Compressed Sparse Row format>
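To check that the extractor product really reproduces the slice, and that E has the (p,m) shape with p stored elements described above, here is a self-contained sketch. It repeats the answer's `extractor` and uses a seeded random matrix in place of the session's unseeded one, so stored-element counts will differ from the transcript.

```python
import numpy as np
from scipy import sparse

def extractor(indices, N):
    # Same construction as in the answer: row i has a single 1.0 in column indices[i].
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    return sparse.csr_matrix((data, indices, indptr), shape=(len(indices), N))

M = sparse.random(100, 80, density=0.1, format='csr', random_state=0)
idx = np.arange(1, M.shape[0], 2)      # rows 1, 3, ..., 99
E = extractor(idx, M.shape[0])         # (p, m) with p = 50, m = 100
M1 = E @ M                             # (p, m) @ (m, n) -> (p, n)

print(E.shape, E.nnz)                  # (50, 100) 50
print(M1.shape)                        # (50, 80)
print(np.array_equal(M1.toarray(), M[1::2, :].toarray()))  # True
```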


2017-09-04 16:10:14 hpaulj

Thanks, I'll study your answer later! –

So the solution you're proposing is to use the extractor function? –

No, I'm just suggesting a possible reason for the memory error. Without your data, your memory setup and so on, I can't prove it. – hpaulj
