How to index one array with another using SciPy CSR sparse arrays?

I have two arrays, A and B. In NumPy, you can use A as an index into B. How can I index one array with another when both are SciPy CSR sparse arrays?

import numpy as np

A = np.array([[1,2,3,1,7,3,1,2,3],[4,5,6,4,5,6,4,5,6],[7,8,9,7,8,9,7,8,9]])
B = np.array([1,2,3,4,5,6,7,8,9,0])
c = B[A]

This produces:

[[2 3 4 2 8 4 2 3 4]
 [5 6 7 5 6 7 5 6 7]
 [8 9 0 8 9 0 8 9 0]]

However, in my case A and B are SciPy CSR sparse arrays, and they don't seem to support this kind of indexing.

from scipy import sparse

A_sparse = sparse.csr_matrix(A)
B_sparse = sparse.csr_matrix(B)
c = B_sparse[A_sparse]

This results in:

IndexError: Indexing with sparse matrices is not supported except boolean indexing where matrix and index are equal shapes.

I came up with the following function to replicate NumPy's behavior with sparse matrices:

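import numpy as np
from scipy import sparse

def index_sparse(A, B):
    # COO exposes the (row, col, value) triplets of A's stored entries
    A_sparse = sparse.coo_matrix(A)
    B_sparse = sparse.csr_matrix(B)
    # Start from a copy of A so res has the same sparsity pattern
    res = sparse.csr_matrix(A_sparse)
    for i, j, v in zip(A_sparse.row, A_sparse.col, A_sparse.data):
        res[i, j] = B_sparse[0, v]
    return res

res = index_sparse(A, B)
print(res.todense())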

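Looping over the arrays in Python and assigning element by element to build a new array is not ideal. Is there a better way using SciPy/NumPy built-in functions?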


2016-08-04 Mark

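Sparse indexing is less developed. For example, the coo format does not implement it at all.

I haven't tried to implement this myself, but I have answered other questions that involve working with the attributes of the sparse formats, so I'll just make some general observations.

B_sparse is a matrix, so its shape is (1,10). The equivalent of B[A] is therefore: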

In [294]: B_sparse[0,A] 
Out[294]: 
<3x9 sparse matrix of type '<class 'numpy.int32'>' 
    with 24 stored elements in Compressed Sparse Row format> 
In [295]: _.A 
Out[295]: 
array([[2, 3, 4, 2, 8, 4, 2, 3, 4], 
     [5, 6, 7, 5, 6, 7, 5, 6, 7], 
     [8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
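The session above passes a 2-D integer array as the column index. A minimal sketch of the same lookup, assuming only that your SciPy version accepts a 1-D integer array as a column index (the same form this answer uses further down with `Ac.data`): flatten A, index row 0, then reshape the densified result back to A's shape.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

# A 1-D array becomes a (1, 10) sparse matrix, so "B[A]" turns into
# "row 0, columns given by A".
B_sparse = sparse.csr_matrix(B)

# Index with a flat copy of A (one column index per element of A),
# then restore A's shape after densifying the (1, A.size) result.
row = B_sparse[0, A.ravel()]
c = row.toarray().reshape(A.shape)

print(B_sparse.shape)            # (1, 10)
print(np.array_equal(c, B[A]))   # True
```

Flattening in C order and reshaping with the same shape keeps every element in its original position, so the result lines up with NumPy's `B[A]`.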

B_sparse[A,:] or B_sparse[:,A] complains about a 3-D result, because it would be trying to create a matrix version of:

In [298]: B[None,:][:,A] 
Out[298]: 
array([[[2, 3, 4, 2, 8, 4, 2, 3, 4], 
     [5, 6, 7, 5, 6, 7, 5, 6, 7], 
     [8, 9, 0, 8, 9, 0, 8, 9, 0]]])
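The dense analogue makes the shape problem visible. This sketch uses plain NumPy with the question's A and B: a slice on the row axis plus a 2-D column index yields a 3-D array, which a sparse matrix cannot represent, while an integer row index yields the 2-D result.

```python
import numpy as np

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

row_vec = B[None, :]        # shape (1, 10): dense stand-in for B_sparse
out3d = row_vec[:, A]       # slice + 2-D index array -> 3-D result
out2d = row_vec[0, A]       # integer row + 2-D index array -> 2-D result

print(out3d.shape)                       # (1, 3, 9)
print(out2d.shape)                       # (3, 9)
print(np.array_equal(out3d[0], B[A]))    # True
```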

As for your function:

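A_sparse.nonzero() does A_sparse.tocoo() and returns its row and col (filtered to entries whose data is nonzero). So it is effectively the same as what you are doing.

Here is something that should be faster, though I haven't tested it enough to be sure it is robust: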
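A quick check of that equivalence, as a sketch on the question's A. The two index pairs match here only because A contains no zeros and no explicitly stored zeros; if explicit zeros were present, nonzero() would drop them while tocoo() would keep them.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
A_sparse = sparse.csr_matrix(A)

Ac = A_sparse.tocoo()             # explicit (row, col, data) triplets
rows, cols = A_sparse.nonzero()   # row/col of entries whose value != 0

print(np.array_equal(rows, Ac.row) and np.array_equal(cols, Ac.col))  # True
```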

In [342]: Ac=A_sparse.tocoo() 
In [343]: res=Ac.copy() 
In [344]: res.data[:]=B_sparse[0, Ac.data].A[0] 
In [345]: res 
Out[345]: 
<3x9 sparse matrix of type '<class 'numpy.int32'>' 
    with 27 stored elements in COOrdinate format> 
In [346]: res.A 
Out[346]: 
array([[2, 3, 4, 2, 8, 4, 2, 3, 4], 
     [5, 6, 7, 5, 6, 7, 5, 6, 7], 
     [8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
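The session above can be wrapped into a reusable function. This is a sketch, not a tested library routine: `index_sparse_fast` is a hypothetical name, and it assumes B is a one-row sparse matrix whose column count exceeds every value stored in A. Like the question's loop, it only maps A's stored entries, so positions where A is an implicit zero stay zero instead of becoming B[0].

```python
import numpy as np
from scipy import sparse

def index_sparse_fast(A_sparse, B_sparse):
    """Hypothetical helper: map every stored value v of A_sparse to
    B_sparse[0, v], keeping A_sparse's sparsity pattern."""
    Ac = A_sparse.tocoo()
    # One vectorized lookup instead of a Python loop: B_sparse is (1, n),
    # so row 0 indexed by all of A's stored values gives a (1, nnz) matrix.
    vals = B_sparse[0, Ac.data].toarray().ravel()
    # Reuse A's row/col coordinates. Building a new matrix (rather than
    # overwriting Ac.copy().data) lets the result take B's dtype.
    return sparse.coo_matrix((vals, (Ac.row, Ac.col)), shape=Ac.shape)


A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

res = index_sparse_fast(sparse.csr_matrix(A), sparse.csr_matrix(B))
print(res.toarray())
```

Constructing the COO matrix from `(data, (row, col))` is a small deviation from the copy-and-assign in the session; it avoids silently casting B's values to A's dtype when, for example, B holds floats and A holds integer indices.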

There are 3 zeros in this example that could be cleaned up as well (look at res.nonzero()).
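A sketch of that cleanup, rebuilding res the same way as the session above: the result stores 27 entries, but only 24 of them are nonzero. Converting to CSR and calling `eliminate_zeros()` (an in-place CSR method) drops the explicitly stored zeros.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

Ac = sparse.csr_matrix(A).tocoo()
B_sparse = sparse.csr_matrix(B)

# Same copy-and-update step as in the session above
res = Ac.copy()
res.data[:] = B_sparse[0, Ac.data].toarray()[0]

print(res.nnz)                  # 27 stored entries ...
print(len(res.nonzero()[0]))    # ... but only 24 are actually nonzero

# Drop the explicitly stored zeros
res_csr = res.tocsr()
res_csr.eliminate_zeros()
print(res_csr.nnz)              # 24
```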

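Because you set each res[i,j] at the positions given by Ac.row and Ac.col, res has the same row,col values as Ac, so I initialize it as a copy. Then it is just a matter of updating the res.data attribute. Indexing Bc.data directly would be faster, but that does not take B's sparsity into account.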
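To see why indexing the stored data of B directly does not work, here is a sketch that reads `Bc` as the sparse version of B (an assumption, since the answer does not define it). B's trailing 0 is not stored, so its `.data` has only 9 entries and the value 9 in A falls out of range. Because B is a single row, one alternative (a swapped-in technique, not from the answer) is to densify just that row and use plain NumPy fancy indexing on it, which assumes the dense row fits comfortably in memory.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

A_sparse = sparse.csr_matrix(A)
B_sparse = sparse.csr_matrix(B)
Ac = A_sparse.tocoo()

# Only B's nonzeros are stored, so .data no longer lines up with B's positions
print(B_sparse.data.size)       # 9

try:
    B_sparse.data[Ac.data]      # value 9 in A is out of range for 9 entries
except IndexError as exc:
    print("IndexError:", exc)

# Alternative: densify the single row of B, then index it with A's values
B_dense = B_sparse.toarray().ravel()      # length-10 dense vector
res = sparse.coo_matrix((B_dense[Ac.data], (Ac.row, Ac.col)), shape=Ac.shape)
print(np.array_equal(res.toarray(), B[A]))   # True
```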


2016-08-04 20:53:45 hpaulj
