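Load sparse array from npy file

I want to load a sparse array that I saved earlier. Saving the sparse array was easy enough. Reading it back, though, is a pain: scipy.load returns a 0d array wrapped around my sparse array.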

import scipy as sp 
A = sp.load("my_array"); A 
array(<325729x325729 sparse matrix of type '<type 'numpy.int8'>' 
with 1497134 stored elements in Compressed Sparse Row format>, dtype=object)

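To get at the sparse matrix I have to flatten the 0d array, or use sp.asarray(A). This seems like a really hard way to do things. Is Scipy smart enough to understand that it has loaded a sparse array? Is there a better way to load a sparse array?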

2011-06-08 iform

The mmwrite/mmread functions in scipy.io can save/load sparse matrices in the Matrix Market format.

scipy.io.mmwrite('/tmp/my_array',x) 
scipy.io.mmread('/tmp/my_array').tolil()

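mmwrite and mmread may be all you need. They are well tested and use a well-known format.

However, the following may be a bit faster:

We can save the row and column coordinates and the data as 1-d arrays in NPZ format.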

import random
import scipy.sparse as sparse
import scipy.io
import numpy as np

def save_sparse_matrix(filename, x):
    # Convert to COO so the matrix is described by three 1-d arrays plus a shape.
    x_coo = x.tocoo()
    row = x_coo.row
    col = x_coo.col
    data = x_coo.data
    shape = x_coo.shape
    # np.savez appends '.npz' to the filename if it is missing.
    np.savez(filename, row=row, col=col, data=data, shape=shape)

def load_sparse_matrix(filename):
    y = np.load(filename)
    # shape comes back as an array; pass it on as a tuple.
    z = sparse.coo_matrix((y['data'], (y['row'], y['col'])), shape=tuple(y['shape']))
    return z

N = 20000
x = sparse.lil_matrix((N, N))
for i in range(N):  # range rather than Python 2's xrange
    x[random.randint(0, N-1), random.randint(0, N-1)] = random.randint(1, 100)

save_sparse_matrix('/tmp/my_array', x)
load_sparse_matrix('/tmp/my_array.npz').tolil()

Here is some code which suggests that saving a sparse matrix in an NPZ file may be quicker than using mmwrite/mmread:

def using_np_savez():  
    save_sparse_matrix('/tmp/my_array',x) 
    return load_sparse_matrix('/tmp/my_array.npz').tolil() 

def using_mm(): 
    scipy.io.mmwrite('/tmp/my_array',x) 
    return scipy.io.mmread('/tmp/my_array').tolil()  

if __name__=='__main__':
    for func in (using_np_savez,using_mm):
        y=func()
        print(repr(y))
        assert(x.shape==y.shape)
        assert(x.dtype==y.dtype)
        assert(x.__class__==y.__class__)
        assert(np.allclose(x.todense(),y.todense()))

yields

% python -mtimeit -s'import test' 'test.using_mm()' 
10 loops, best of 3: 380 msec per loop 

% python -mtimeit -s'import test' 'test.using_np_savez()' 
10 loops, best of 3: 116 msec per loop
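
For reference, SciPy 0.19 and later also provide scipy.sparse.save_npz and scipy.sparse.load_npz, which follow the same idea of storing the component arrays in an NPZ file. A minimal sketch, using a small stand-in matrix and a hypothetical temporary path (neither comes from the benchmark above):

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sparse

# Small CSR matrix standing in for the large one in the question.
x = sparse.csr_matrix(np.arange(10).reshape(2, 5))

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_array.npz")

sparse.save_npz(path, x)   # stores the format, shape and index/data arrays
y = sparse.load_npz(path)  # returns a sparse matrix directly, no 0-d wrapper

print(repr(y))
```

The loaded object keeps the sparse format it was saved in, so no extraction step is needed.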

2011-06-08 18:53:12 unutbu

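+1, 'scipy.io' is the proper solution. I'd add that if you want to go down the optimization road, you might consider 'numpy.load(mmap_mode='r'/'c')'. Memory-mapping the files from disk gives instant loading **and** can save memory, because the same memory-mapped array can be shared across multiple processes. – Radim 2011-07-19 21:07:57

scipy.io.savemat is probably the best – mathtick 2013-03-27 15:11:10

Using np_savez instead of mm reduced the load time of a big sparse matrix from 8min47 to 3s! Thanks! I also tried savez_compressed, but the size was the same and the load time longer. – MatthieuBizien 2014-03-01 02:38:03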
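
A minimal sketch of the memory-mapping idea from the comment above, assuming the data lives in a plain numeric .npy file (the file name and array contents are made up for illustration). As far as I know, mmap_mode only applies to .npy files holding non-object dtypes, so it would not help with the pickled 0-d wrapper from the question:

```python
import os
import tempfile

import numpy as np

# Hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data.npy")

# A plain numeric array, e.g. the data array of a sparse matrix.
np.save(path, np.arange(1000, dtype=np.int32))

# mmap_mode='r' maps the file read-only instead of reading it into memory;
# 'c' would give a copy-on-write mapping.
data = np.load(path, mmap_mode="r")

print(type(data))
print(int(data[10]))
```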

One can extract the object hidden away in the 0d array by using () as the index:

A = sp.load("my_array")[()]

This looks weird, but it seems to work anyway, and it is a very short workaround.

2015-03-25 16:48:44 user4713166

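I'm pretty sure you can also use .item(), but don't quote me :) – David 2017-05-04 16:44:31

With all the up-votes for the mmwrite answer, I'm surprised no one has tried to answer the actual question. But since it has been reactivated, I'll give it a try.

This reproduces the OP's case: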
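
A minimal sketch comparing the two extraction styles mentioned above, with a small stand-in matrix and a hypothetical temporary path. One assumption worth flagging: recent NumPy versions refuse to unpickle object arrays unless allow_pickle=True is passed to np.load, which the 2011-era code in this thread did not need.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sparse

x = sparse.csr_matrix(np.arange(10).reshape(2, 5))
path = os.path.join(tempfile.mkdtemp(), "save_sparse.npy")  # hypothetical path
np.save(path, x)  # pickles the sparse matrix inside a 0-d object array

# Recent NumPy versions need allow_pickle=True to load object arrays.
X = np.load(path, allow_pickle=True)

a = X[()]     # index with an empty tuple
b = X.item()  # extract the single element of the 0-d array

print(X.shape, type(a).__name__, type(b).__name__)
```

Both forms hand back the sparse matrix itself; only load pickled files that come from a source you trust.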

In [90]: x=sparse.csr_matrix(np.arange(10).reshape(2,5)) 
In [91]: np.save('save_sparse.npy',x) 
In [92]: X=np.load('save_sparse.npy') 
In [95]: X 
Out[95]: 
array(<2x5 sparse matrix of type '<type 'numpy.int32'>' 
    with 9 stored elements in Compressed Sparse Row format>, dtype=object) 
In [96]: X[()].A 
Out[96]: 
array([[0, 1, 2, 3, 4], 
     [5, 6, 7, 8, 9]]) 

In [93]: X[()].A 
Out[93]: 
array([[0, 1, 2, 3, 4], 
     [5, 6, 7, 8, 9]]) 
In [94]: x 
Out[94]: 
<2x5 sparse matrix of type '<type 'numpy.int32'>'
    with 9 stored elements in Compressed Sparse Row format>

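The [()] that user4713166 gave us is not a "hard way" to extract the sparse array.

np.save and np.load are designed to operate on ndarrays. But a sparse matrix is not such an array, nor is it a subclass (as np.matrix is). It appears that np.save wraps the non-array object in an object dtype array, and saves it along with a pickled form of the object.

When I try to save a different kind of object, one that can't be pickled, I get an error message at: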
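
The wrapping can be seen without touching the disk. A minimal sketch, assuming (as far as I can tell) that np.save runs its argument through a conversion like np.asanyarray before writing:

```python
import numpy as np
import scipy.sparse as sparse

x = sparse.csr_matrix(np.arange(10).reshape(2, 5))

# Converting a sparse matrix with NumPy does not densify it; the result is a
# 0-d array of dtype object whose single element is the sparse matrix itself.
wrapped = np.asanyarray(x)

print(wrapped.shape, wrapped.dtype)
print(wrapped[()] is x)
```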

    403   # We contain Python objects so we cannot write out the data directly.
    404   # Instead, we will pickle it out with version 2 of the pickle protocol.
--> 405   pickle.dump(array, fp, protocol=2)

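So in answer to "Is Scipy smart enough to understand that it has loaded a sparse array?": no. np.load does not know about sparse arrays. But np.save is smart enough to punt when given something that isn't an array, and np.load does what it can with what it finds in the file.

As for alternative methods of saving and loading sparse arrays, io.savemat, the MATLAB-compatible method, has been mentioned. It would be my first choice. But this example also shows that you can use regular Python pickling. That might be better if you need to preserve a particular sparse format. And np.save isn't bad if you can live with the [()] extraction step. :)

https://github.com/scipy/scipy/blob/master/scipy/io/matlab/mio5.py write_sparse - sparse matrices are saved in csc format. Along with the headers it saves A.indices.astype('i4'), A.indptr.astype('i4'), A.data.real, and optionally A.data.imag.

In quick tests I find that np.save/load handles all sparse formats except dok, where load complains about a missing shape. Otherwise I can't find any special pickling code in the sparse files.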
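
The plain-pickle route mentioned above can be sketched as follows. The file path is hypothetical and the lil matrix is just a small stand-in; the point is that the object comes back in the same sparse format, with no 0-d wrapper:

```python
import os
import pickle
import tempfile

import numpy as np
import scipy.sparse as sparse

x = sparse.lil_matrix((4, 4), dtype=np.int8)
x[1, 2] = 7
path = os.path.join(tempfile.mkdtemp(), "my_array.pkl")  # hypothetical path

with open(path, "wb") as fh:
    pickle.dump(x, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Only unpickle files from a source you trust.
with open(path, "rb") as fh:
    y = pickle.load(fh)

print(repr(y))
```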
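
A minimal savemat/loadmat round trip, for comparison. The variable name "A" and the temporary path are chosen only for this sketch, and the format of the returned sparse object may differ from the one that was saved:

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sparse

x = sparse.csr_matrix(np.arange(10).reshape(2, 5))
path = os.path.join(tempfile.mkdtemp(), "my_array.mat")  # hypothetical path

# The dictionary key becomes the variable name inside the .mat file.
scipy.io.savemat(path, {"A": x})

# loadmat returns a dict of variables; the sparse one comes back sparse,
# though not necessarily in the format it was saved from.
y = scipy.io.loadmat(path)["A"]

print(repr(y))
```

If a specific format is needed afterwards, a conversion such as y.tocsr() restores it.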

2015-03-25 22:22:52 hpaulj
