Vectorizing a NumPy slicing operation

Say I have a NumPy vector,

import numpy as np
A = np.zeros(100)

and I divide it into subvectors using a list of breakpoints that index into A, for instance,

breaks = np.linspace(0, 100, 11, dtype=int)

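so that the i-th subvector lies between the indices breaks[i] (inclusive) and breaks[i+1] (exclusive). The breaks are not necessarily equispaced; this is only an example. However, they will always be strictly increasing.

Now I want to operate on these subvectors. For instance, if I want to set all elements of the i-th subvector to i, I might do: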

for i in range(len(breaks) - 1): 
    A[breaks[i] : breaks[i+1]] = i

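Or I might want to compute the subvector means:

b = np.empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    # store the mean of the i-th subvector in slot i
    b[i] = A[breaks[i] : breaks[i+1]].mean()

and so on.

How can I avoid the for loops and vectorize these operations instead?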


2015-04-27 cfh

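Is 'breaks' pre-sorted? – Divakar

@Divakar: Yes, they are strictly increasing. – cfh

Also, do the limits of the breaks cover the whole of 'A', i.e. will some elements of A be left unchanged after this operation? – Divakar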

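There really isn't a single answer to your question, but there are several techniques you can use as building blocks. Here is another one that may be helpful to you:

All NumPy ufuncs have a .reduceat method, which you can use to your advantage for some of your calculations: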

>>> a = np.arange(100) 
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp) 
>>> counts = np.diff(breaks) 
>>> counts 
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10]) 
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=float)
>>> sums 
array([ 45., 145., 245., 345., 445., 545., 645., 745., 845., 945.]) 
>>> sums/counts # i.e. the mean 
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
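The same idea carries over to breaks that are not equally spaced, and to ufuncs other than np.add. Below is a minimal sketch under two assumptions of mine: the breaks are strictly increasing (reduceat behaves differently for repeated indices), and the array is trimmed to breaks[-1], because reduceat lets the last segment run to the end of its input. The names seg_sums, seg_means and seg_max are only illustrative.

```python
import numpy as np

a = np.arange(20, dtype=float)
breaks = np.array([0, 4, 9, 11, 18, 20])   # strictly increasing, unequal spacing

starts = breaks[:-1]          # reduceat takes the start index of each segment
counts = np.diff(breaks)      # length of each segment

# Trim to breaks[-1] so the last segment stops where the breaks say it should.
trimmed = a[:breaks[-1]]

seg_sums = np.add.reduceat(trimmed, starts)     # per-segment sums
seg_means = seg_sums / counts                   # per-segment means
seg_max = np.maximum.reduceat(trimmed, starts)  # per-segment maxima
```

If breaks[0] is not 0, the elements before it are simply ignored by reduceat, which matches the slicing in the question.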


2015-04-27 13:35:50 Jaime

You can use np.repeat:

In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks)) 
Out[35]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 
     2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 
     4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
     6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 
     9, 9, 9, 9, 9, 9, 9, 9])
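To actually write those labels into A, the repeated array can be assigned to the slice spanned by the breaks. This is a small sketch rather than a definitive recipe; the loop at the end is there only to check the result against the version in the question, and the names labels and expected are my own.

```python
import numpy as np

A = np.zeros(20)
breaks = np.array([0, 4, 9, 11, 18, 20])

# One label per segment, repeated as many times as the segment is long.
labels = np.repeat(np.arange(len(breaks) - 1), np.diff(breaks))

# Only the range covered by the breaks is overwritten.
A[breaks[0]:breaks[-1]] = labels

# Reference result from the original for loop, for comparison.
expected = np.zeros(20)
for i in range(len(breaks) - 1):
    expected[breaks[i]:breaks[i + 1]] = i
```

Replacing np.arange(len(breaks) - 1) with any array holding one value per segment fills each subvector with that value instead.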

To compute statistics over arbitrary bins, you can use scipy.stats.binned_statistic:

import numpy as np 
import scipy.stats as stats 

breaks = np.linspace(0, 100, 11, dtype=int) 
A = np.random.random(100) 

means, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)

stats.binned_statistic can compute means, medians, counts, and sums; or, to compute an arbitrary statistic for each bin, you can pass a callable as the statistic parameter:

def func(values): 
    return values.mean() 

funcmeans, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic=func, bins=breaks) 

assert np.allclose(means, funcmeans)
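For the mean specifically, the same index-binning idea can be sketched with np.histogram alone, which may be handy when SciPy is not available. This is my own substitution, not part of the answer above. One caveat applies: the last histogram bin is closed on the right, so the result matches the half-open slices of the question only when breaks[-1] equals len(A), as assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(20)
breaks = np.array([0, 4, 9, 11, 18, 20])   # assumed: breaks[-1] == len(A)

idx = np.arange(len(A))                    # bin the positions, not the values

counts, _ = np.histogram(idx, bins=breaks)             # elements per bin
sums, _ = np.histogram(idx, bins=breaks, weights=A)    # sum of A per bin
hist_means = sums / counts

# Loop version from the question, used only as a cross-check.
loop_means = np.array([A[lo:hi].mean()
                       for lo, hi in zip(breaks[:-1], breaks[1:])])
```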


2015-04-27 11:32:15 unutbu

But how do I now set the 'i'-th part to 'i' while avoiding the for loop? – cfh

You can use a simple np.cumsum:

import numpy as np 

# Form zeros array of same size as input array and 
# place ones at positions where intervals change 
A1 = np.zeros_like(A) 
A1[breaks[1:-1]] = 1 

# Perform cumsum along it to create a staircase like array, as the final output 
out = A1.cumsum()

Sample run:

In [115]: A 
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6]) 

In [116]: breaks 
Out[116]: array([ 0, 4, 9, 11, 18, 20]) 

In [142]: out 
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])

If you want the mean values of those subvectors of A, you can use np.bincount:

mean_vals = np.bincount(out, weights=A)/np.bincount(out)
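Putting the cumsum and bincount steps together for a float-valued A, here is a hedged end-to-end sketch. np.bincount expects non-negative integer labels, so the staircase array is built with an explicit integer dtype here instead of np.zeros_like(A); that change is mine. It also assumes breaks[0] == 0 and breaks[-1] == len(A).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(20)                          # float data
breaks = np.array([0, 4, 9, 11, 18, 20])    # assumed to cover all of A

# Ones at the interior breakpoints, integer dtype so bincount accepts the labels.
steps = np.zeros(len(A), dtype=np.intp)
steps[breaks[1:-1]] = 1
labels = steps.cumsum()                     # 0,0,0,0,1,1,... staircase of segment ids

# Weighted counts give per-segment sums; plain counts give segment lengths.
mean_vals = np.bincount(labels, weights=A) / np.bincount(labels)

# Loop version from the question, used only as a cross-check.
loop_means = np.array([A[breaks[i]:breaks[i + 1]].mean()
                       for i in range(len(breaks) - 1)])
```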

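If you are looking to extend this functionality and use a custom function instead, you might want to look into a NumPy equivalent of MATLAB's accumarray: accum, whose source code is available here.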
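As a simpler stand-in for custom functions, not the accum recipe mentioned above, one could split the array at the interior breakpoints with np.split and apply the function to each piece. The helper apply_per_segment below is hypothetical. It still loops in Python, but only once per segment rather than once per element.

```python
import numpy as np

def apply_per_segment(A, breaks, func):
    """Hypothetical helper: apply func to each subvector A[breaks[i]:breaks[i+1]]."""
    breaks = np.asarray(breaks)
    # Restrict to the covered range, then split at the interior breakpoints.
    covered = A[breaks[0]:breaks[-1]]
    pieces = np.split(covered, breaks[1:-1] - breaks[0])
    return np.array([func(p) for p in pieces])

A = np.arange(20, dtype=float)
breaks = [0, 4, 9, 11, 18, 20]

medians = apply_per_segment(A, breaks, np.median)
```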


2015-04-27 11:41:26 Divakar

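I like your approach; it's faster than mine. You could also use 'A1 = np.zeros(breaks[-1])'. – unutbu

@unutbu Ah, thanks! Good to know that tip! – Divakar

This solves the simple use case of setting each subvector to a constant (which was only meant as an example). What if, for example, I want to compute the mean of each subvector? – cfh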
