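Cumulative summation of a numpy array by index

Assume you have an array of values that need to be summed together

d = [1,1,1,1,1]

and a second array specifying which elements need to be summed together

i = [0,0,1,2,2]

The result will be stored in a new array of size max(i)+1. So, for example, i=[0,0,0,0,0] would be equivalent to summing all the elements of d and storing the result at position 0 of a new array of size 1.

I tried to implement this using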

c = zeros(max(i)+1) 
c[i] += d

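However, the += operation adds each element only once, giving the unexpected result

[1,1,1]

instead of

[2,1,2]

How would one correctly implement this kind of summation?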

2010-08-31 dzhelil
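
The behaviour described in the question follows from how NumPy evaluates `c[i] += d`: it is roughly `c[i] = c[i] + d`, so when an index repeats, only one write per index survives. The following is a minimal sketch of the difference, assuming a NumPy version that provides `np.add.at` (the unbuffered form of the same operation, which the original question does not mention):

```python
import numpy as np

i = np.array([0, 0, 1, 2, 2])
d = np.array([1, 1, 1, 1, 1])

# Buffered fancy-index assignment: repeated indices are written only once.
c_buffered = np.zeros(i.max() + 1)
c_buffered[i] += d

# Unbuffered accumulation: every occurrence of an index contributes.
c_accumulated = np.zeros(i.max() + 1)
np.add.at(c_accumulated, i, d)

print(c_buffered)     # [1. 1. 1.]
print(c_accumulated)  # [2. 1. 2.]
```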

This would be clearer if the values of 'd' were unique. For example, if 'd = [0,1,2,3,4]', I guess that for 'i = [0,0,0,0,0]' you want 'c = [10]', while for 'i = [0,0,1,2,2]' you want 'c = [1,2,7]'? – mtrw 2010-08-31 08:20:21

That is correct. Thanks for the clarification. – dzhelil 2010-08-31 18:01:58

In that case, juxstapose's solution, together with the change I suggested in the comments, should do the job. – mtrw 2010-08-31 19:53:17

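This solution should be more efficient for large arrays (it iterates over the possible index values rather than over the individual entries of i):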

import numpy as np 

i = np.array([0,0,1,2,2]) 
d = np.array([0,1,2,3,4]) 

i_max = i.max() 
c = np.empty(i_max+1) 
for j in range(i_max+1): 
    c[j] = d[i==j].sum() 

print(c)
[1. 2. 7.]

來源

2010-09-02 15:42:08 pberkes

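def zeros(ilen):
    # Build a plain Python list of ilen zeros.
    r = []
    for i in range(0, ilen):
        r.append(0)
    return r

i_list = [0,0,1,2,2]
d = [1,1,1,1,1]
result = zeros(max(i_list)+1)

# Note: d[index] reads d at the label value, not at the entry's position;
# this gives the expected result here only because every element of d is 1.
for index in i_list:
    result[index] += d[index]

print(result)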

2010-08-31 04:53:55

Close, but I think the OP wants 'for didx, ridx in enumerate(i_list): result[ridx] += d[didx]'. Also, since the tags include [numpy], you can use 'numpy.zeros'. – mtrw 2010-08-31 08:18:23
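
The correction suggested in this comment can be written out as a short pure-Python sketch. The values of `d` are taken from mtrw's clarifying example under the question, so that a wrong index would show up in the result; the names `didx` and `ridx` follow the comment.

```python
i_list = [0, 0, 1, 2, 2]
d = [0, 1, 2, 3, 4]

result = [0] * (max(i_list) + 1)

# didx is the position in d, ridx is the output slot given by i_list.
for didx, ridx in enumerate(i_list):
    result[ridx] += d[didx]

print(result)  # [1, 2, 7]
```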

If I understand the question correctly, there is a fast function for this (as long as the data array is 1-D):

>>> i = np.array([0,0,1,2,2]) 
>>> d = np.array([0,1,2,3,4]) 
>>> np.bincount(i, weights=d) 
array([ 1., 2., 7.])

np.bincount returns an array covering all integers in range(max(i) + 1), even if some of the counts are zero.

2010-09-11 01:00:49 user333700
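
To illustrate the remark about zero counts, here is a small sketch with labels that skip some values. The input data is made up for illustration; `minlength` is an existing `np.bincount` parameter that pads the result to a fixed number of bins.

```python
import numpy as np

i = np.array([0, 0, 3])          # labels 1 and 2 never occur
d = np.array([1.0, 2.0, 5.0])

# Bins for the missing labels are still present and hold 0.
sums = np.bincount(i, weights=d)
print(sums)    # [3. 0. 0. 5.]

# minlength pads the result when a fixed number of bins is wanted.
padded = np.bincount(i, weights=d, minlength=6)
print(padded)  # [3. 0. 0. 5. 0. 0.]
```

Note that `np.bincount` only accepts non-negative integer labels.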

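This is the best solution for the case described here. For a general sum over a label array, you can use scipy.ndimage.sum. That module also has other useful functions such as maximum, minimum, mean, variance, ... – 2013-03-05 15:49:50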

Juh_'s comment is the most efficient solution. Here is working code:

import numpy as np 
import scipy.ndimage as ni 

i = np.array([0,0,1,2,2]) 
d = np.array([0,1,2,3,4]) 

n_indices = i.max() + 1 
print(ni.sum(d, i, np.arange(n_indices)))

2014-06-17 10:36:15 Noam

In the general case, when you want to sum sub-arrays by label, you can use the following code:

import numpy as np 
from scipy.sparse import coo_matrix 

def labeled_sum1(x, labels): 
    P = coo_matrix((np.ones(x.shape[0]), (labels, np.arange(len(labels))))) 
    res = P.dot(x.reshape((x.shape[0], np.prod(x.shape[1:])))) 
    return res.reshape((res.shape[0],) + x.shape[1:]) 

def labeled_sum2(x, labels): 
    res = np.empty((np.max(labels) + 1,) + x.shape[1:], x.dtype) 
    for i in np.ndindex(x.shape[1:]): 
        res[(...,)+i] = np.bincount(labels, x[(...,)+i])
    return res

The first method uses sparse matrix multiplication. The second is a generalization of user333700's answer. Both methods have comparable speed:

x = np.random.randn(100000, 10, 10) 
labels = np.random.randint(0, 1000, 100000) 
%time res1 = labeled_sum1(x, labels) 
%time res2 = labeled_sum2(x, labels) 
np.all(res1 == res2)

Output:

Wall time: 73.2 ms 
Wall time: 68.9 ms 
True

2015-06-02 10:40:32 ybeltukov
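
As a cross-check of the N-dimensional case, the per-column `np.bincount` approach can be compared against unbuffered accumulation with `np.add.at`. This is only a sketch: `labeled_sum_at` and `labeled_sum_bincount` are hypothetical helper names, the test data is made up, and no claim is made about relative speed.

```python
import numpy as np

def labeled_sum_at(x, labels):
    # Hypothetical helper: one output row per label, trailing axes unchanged.
    res = np.zeros((labels.max() + 1,) + x.shape[1:], dtype=x.dtype)
    np.add.at(res, labels, x)  # unbuffered: repeated labels accumulate
    return res

def labeled_sum_bincount(x, labels):
    # Same per-column bincount idea as labeled_sum2 above.
    res = np.empty((labels.max() + 1,) + x.shape[1:], dtype=x.dtype)
    for idx in np.ndindex(x.shape[1:]):
        res[(...,) + idx] = np.bincount(labels, x[(...,) + idx])
    return res

rng = np.random.default_rng(0)
# Small integer values stored as floats, so the sums are exact.
x = rng.integers(0, 10, size=(50, 3, 2)).astype(float)
labels = rng.integers(0, 5, size=50)

res_at = labeled_sum_at(x, labels)
res_bc = labeled_sum_bincount(x, labels)
print(np.array_equal(res_at, res_bc))  # True
```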
