2011-04-23
3

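Python/Numpy - Get index into main array from subset

Suppose I have a numpy array of 100 elements. I perform some calculations on a subset of this array, perhaps 20 elements that satisfy some condition. Then I select an index in this subset. How can I (efficiently) recover the corresponding index in the first array? I don't want to perform the calculation on all values in a because it is expensive, so I only want to perform it where it's needed (where the condition is met).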

Here is some pseudocode showing what I mean (the "condition" here is the list comprehension):

a = np.arange(100)         # size = 100 
b = some_function(a[[i for i in range(0,100,5)]]) # size = 20 
Index = np.argmax(b) 

# Index gives the index of the maximum value in b, 
# but what I really want is the index of the element 
# in a 

Edit:

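I wasn't very clear, so I've provided a more complete example. I hope this makes my goal clearer. I feel there is some clever and efficient way to do this without loops or lookups.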

CODE:

import numpy as np 

def some_function(arr): 
    return arr*2.0 

a = np.arange(100)*2.        # size = 100 
b = some_function(a[[i for i in range(0,100,5)]]) # size = 20 
Index = np.argmax(b) 

print(Index)
# Index gives the index of the maximum value in b, but what I really want is 
# the index of the element in a 

# In this specific case, Index will be 19. So b[19] is the largest value 
# in b. Now, what I REALLY want is the index in a. In this case, that would
# be 95 because some_function(a[95]) is what made the largest value in b.
print(b[Index])
print(some_function(a[95]))

# It is important to note that I do NOT want to change a. I will perform 
# several calculations on SOME values of a, then return the indices of 'a' where 
# all calculations meet some condition. 
+0

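Not sure I understand your question, but do you want to find the 20 indices computed in the second step? What does the final "Index" have to do with it? – ejel 2011-04-23 02:23:43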

+0

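@ejel: To try to explain it, let's suppose some_function ignores its input array and just returns an array of random integers with the same length as the input. Then 'Index' will contain the index in b of the largest (random) number. That index in b corresponds to some index in a, and the index in a is what I want. – 2011-04-23 03:47:13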

Answers

2

I'm not sure I understand your question, so please correct me if I'm wrong.

Let's say you have something like

import numpy as np

a = np.arange(100)
condition = (a % 5 == 0) & (a % 7 == 0) 
b = a[condition] 
index = np.argmax(b) 
# The following should do what you want 
a[condition][index] 

Or, if you don't want to work with masks:

import numpy as np

a = np.arange(100)
b_indices = np.where(a % 5 == 0) 
b = a[b_indices] 
index = np.argmax(b) 
# Get the value of 'a' corresponding to 'index' 
a[b_indices][index] 

Is this what you want?
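A minimal sketch taking the np.where variant one step further: because np.where returns a tuple of index arrays into a, indexing that array with the position found in b gives the position in a directly. The name a_index is illustrative only.

```python
import numpy as np

a = np.arange(100)
b_indices = np.where(a % 5 == 0)   # tuple holding one index array (a is 1-D)
b = a[b_indices]
index = np.argmax(b)               # position of the maximum within b
a_index = b_indices[0][index]      # the same element's position within a
assert a[a_index] == b[index]
print(index, a_index)              # 19 95
```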

+0

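I updated my question to make it clearer. In your code, `a[condition][index]` returns the value in a, but I want the INDEX in a, such that `a[INDEX] = a[condition][index]`. Is there a simple way to get INDEX from condition and index? I think there is, but it isn't obvious to me. – 2011-04-23 04:02:53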

+2

`np.arange(len(a))[condition][index]` maybe? – 2011-04-23 04:31:14

+0

That works, thanks. – 2011-04-26 04:02:44
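A small sketch combining the comment's suggestion with the question's setup, assuming some_function is the cheap stand-in from the question. The expensive call only sees a[condition]. np.flatnonzero(condition) yields the same integer positions as np.arange(len(a))[condition] without building the full arange first.

```python
import numpy as np

def some_function(arr):
    # Stand-in for the expensive calculation from the question.
    return arr * 2.0

a = np.arange(100) * 2.0
condition = (a % 5 == 0)                      # cheap boolean mask over a
b = some_function(a[condition])               # expensive work on the subset only
index = np.argmax(b)                          # position within b

INDEX = np.arange(len(a))[condition][index]   # the trick from the comment
INDEX2 = np.flatnonzero(condition)[index]     # same result, no full arange
assert INDEX == INDEX2
assert b[index] == some_function(a[INDEX])
print(INDEX)                                  # 95
```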

0

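Normally you would store the indices based on the condition before making any changes to the array. You can then use those indices to make the changes.

If a is your array: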

>>> import numpy as np
>>> a = np.random.random((10,5))
>>> a 
array([[ 0.22481885, 0.80522855, 0.1081426 , 0.42528799, 0.64471832], 
     [ 0.28044374, 0.16202575, 0.4023426 , 0.25480368, 0.87047212], 
     [ 0.84764143, 0.30580141, 0.16324907, 0.20751965, 0.15903343], 
     [ 0.55861168, 0.64368466, 0.67676172, 0.67871825, 0.01849056], 
     [ 0.90980614, 0.95897292, 0.15649259, 0.39134528, 0.96317126], 
     [ 0.20172827, 0.9815932 , 0.85661944, 0.23273944, 0.86819205], 
     [ 0.98363954, 0.00219531, 0.91348196, 0.38197302, 0.16002007], 
     [ 0.48069675, 0.46057327, 0.67085243, 0.05212357, 0.44870942], 
     [ 0.7031601 , 0.50889065, 0.30199446, 0.8022497 , 0.82347358], 
     [ 0.57058441, 0.38748261, 0.76947605, 0.48145936, 0.26650583]]) 

And b is your subarray:

>>> b = a[2:4,2:7] 
>>> b 
array([[ 0.16324907, 0.20751965, 0.15903343], 
     [ 0.67676172, 0.67871825, 0.01849056]]) 

You can verify that a still holds b's data:

>>> b.base 
array([[ 0.22481885, 0.80522855, 0.1081426 , 0.42528799, 0.64471832], 
     [ 0.28044374, 0.16202575, 0.4023426 , 0.25480368, 0.87047212], 
     [ 0.84764143, 0.30580141, 0.16324907, 0.20751965, 0.15903343], 
     [ 0.55861168, 0.64368466, 0.67676172, 0.67871825, 0.01849056], 
     [ 0.90980614, 0.95897292, 0.15649259, 0.39134528, 0.96317126], 
     [ 0.20172827, 0.9815932 , 0.85661944, 0.23273944, 0.86819205], 
     [ 0.98363954, 0.00219531, 0.91348196, 0.38197302, 0.16002007], 
     [ 0.48069675, 0.46057327, 0.67085243, 0.05212357, 0.44870942], 
     [ 0.7031601 , 0.50889065, 0.30199446, 0.8022497 , 0.82347358], 
     [ 0.57058441, 0.38748261, 0.76947605, 0.48145936, 0.26650583]]) 

You can make changes to b in two ways:

>>> b+=1 
>>> b 
array([[ 1.16324907, 1.20751965, 1.15903343], 
     [ 1.67676172, 1.67871825, 1.01849056]]) 
>>> a 
array([[ 0.22481885, 0.80522855, 0.1081426 , 0.42528799, 0.64471832], 
     [ 0.28044374, 0.16202575, 0.4023426 , 0.25480368, 0.87047212], 
     [ 0.84764143, 0.30580141, 1.16324907, 1.20751965, 1.15903343], 
     [ 0.55861168, 0.64368466, 1.67676172, 1.67871825, 1.01849056], 
     [ 0.90980614, 0.95897292, 0.15649259, 0.39134528, 0.96317126], 
     [ 0.20172827, 0.9815932 , 0.85661944, 0.23273944, 0.86819205], 
     [ 0.98363954, 0.00219531, 0.91348196, 0.38197302, 0.16002007], 
     [ 0.48069675, 0.46057327, 0.67085243, 0.05212357, 0.44870942], 
     [ 0.7031601 , 0.50889065, 0.30199446, 0.8022497 , 0.82347358], 
     [ 0.57058441, 0.38748261, 0.76947605, 0.48145936, 0.26650583]]) 

Or:

>>> a[2:4,2:7]+=1 
>>> a 
array([[ 0.22481885, 0.80522855, 0.1081426 , 0.42528799, 0.64471832], 
     [ 0.28044374, 0.16202575, 0.4023426 , 0.25480368, 0.87047212], 
     [ 0.84764143, 0.30580141, 1.16324907, 1.20751965, 1.15903343], 
     [ 0.55861168, 0.64368466, 1.67676172, 1.67871825, 1.01849056], 
     [ 0.90980614, 0.95897292, 0.15649259, 0.39134528, 0.96317126], 
     [ 0.20172827, 0.9815932 , 0.85661944, 0.23273944, 0.86819205], 
     [ 0.98363954, 0.00219531, 0.91348196, 0.38197302, 0.16002007], 
     [ 0.48069675, 0.46057327, 0.67085243, 0.05212357, 0.44870942], 
     [ 0.7031601 , 0.50889065, 0.30199446, 0.8022497 , 0.82347358], 
     [ 0.57058441, 0.38748261, 0.76947605, 0.48145936, 0.26650583]]) 
>>> b 
array([[ 1.16324907, 1.20751965, 1.15903343], 
     [ 1.67676172, 1.67871825, 1.01849056]]) 

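The two are equivalent, and neither is more expensive than the other. So as long as you keep the indices you used to create b from a, you can always see the changed data in the base array. When operating on slices, you don't even need to create a subarray.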
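To check the view-versus-copy behaviour this answer relies on, the sketch below (variable names are illustrative) contrasts basic slicing with integer-array indexing. Only basic slicing produces a view; as soon as fancy or boolean indexing is involved you get a copy, which is why storing the index arrays is the more reliable route.

```python
import numpy as np

a = np.random.random((10, 5))

# Basic slicing returns a view: b shares a's memory and b.base is a.
b = a[2:4, 2:5]
print(b.base is a, np.shares_memory(a, b))   # True True

# Integer-array (fancy) indexing returns a copy instead.
c = a[[2, 3], 2:5]
print(np.shares_memory(a, c))                # False

# So store the index arrays up front and use them to write back into a.
rows, cols = np.nonzero(a > 0.5)
a[rows, cols] += 1                           # modifies a in place
```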

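Edit

This assumes some_func returns the indices in the subarray where some condition is true.

I think that when a function returns indices and you only want to feed a subarray to that function, you still need to store the indices of the subarray and use them to get the base-array indices. For example: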

>>> import numpy as np
>>> def some_func(a):
...  return np.where(a>.8) 
>>> a = np.random.random((10,4)) 
>>> a 
array([[ 0.94495378, 0.55532342, 0.70112911, 0.4385163 ], 
     [ 0.12006191, 0.93091941, 0.85617421, 0.50429453], 
     [ 0.46246102, 0.89810859, 0.31841396, 0.56627419], 
     [ 0.79524739, 0.20768512, 0.39718061, 0.51593312], 
     [ 0.08526902, 0.56109783, 0.00560285, 0.18993636], 
     [ 0.77943988, 0.96168229, 0.10491335, 0.39681643], 
     [ 0.15817781, 0.17227806, 0.17493879, 0.93961027], 
     [ 0.05003535, 0.61873245, 0.55165992, 0.85543841], 
     [ 0.93542227, 0.68104872, 0.84750821, 0.34979704], 
     [ 0.06888627, 0.97947905, 0.08523711, 0.06184216]]) 
>>> i_off, j_off = 3,2 
>>> b = a[i_off:,j_off:]  # b is a view on a
>>> i = some_func(b)  # indices in b
>>> i 
(array([3, 4, 5]), array([1, 1, 0])) 
>>> [idx + off for idx, off in zip(i, (i_off, j_off))]  # indices in a
[array([6, 7, 8]), array([3, 3, 2])] 
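The offset idea generalises to slices with a step. A minimal sketch, assuming you keep the slice objects used to build the subarray; sub_to_base is a hypothetical helper name, and it relies on the built-in slice.indices() to resolve None and negative bounds. It covers basic slicing only, not fancy indexing.

```python
import numpy as np

def sub_to_base(slices, base_shape, sub_idx):
    """Hypothetical helper: translate indices into a[slices] back into a."""
    out = []
    for sl, n, idx in zip(slices, base_shape, sub_idx):
        start, _stop, step = sl.indices(n)   # resolves None/negative bounds
        out.append(start + step * np.asarray(idx))
    return tuple(out)

def some_func(arr):
    return np.where(arr > .8)

a = np.random.random((10, 4))
slices = (slice(3, None), slice(2, None))    # equivalent to a[3:, 2:]
b = a[slices]
i = some_func(b)                             # indices in b
i_a = sub_to_base(slices, a.shape, i)        # indices in a
assert np.array_equal(a[i_a], b[i])
```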

Edit 2

This assumes some_func returns a modified copy of the subarray b.

Your example would look something like this:

import numpy as np 

def some_function(arr): 
    return arr*2.0 

a = np.arange(100)*2.        # size = 100 
idx = np.arange(0, 100, 5)  # indices into a of the subset
b = some_function(a[idx]) # size = 20 
b_idx = np.argmax(b) 
a_idx = idx[b_idx] # indices in a translated from indices in b 

print(b_idx, a_idx)
print(b[b_idx], a[a_idx])

assert b[b_idx] == 2* a[a_idx] #true! 
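The question also mentions running several calculations on only some values of a and returning the a-indices where all of them pass. One way to sketch that, assuming hypothetical calc1/calc2 functions and arbitrary thresholds, is to keep a single index array into a and shrink it after each stage, so each calculation runs only on the survivors and a itself is never modified.

```python
import numpy as np

def calc1(x):
    # Hypothetical first (expensive) calculation.
    return x * 2.0

def calc2(x):
    # Hypothetical second calculation, run only on values that passed calc1.
    return np.sin(x)

a = np.arange(100) * 2.0

idx = np.flatnonzero(a % 5 == 0)   # stage 0: cheap pre-filter, indices into a
idx = idx[calc1(a[idx]) > 50]      # stage 1: keep only the survivors
idx = idx[calc2(a[idx]) > 0]       # stage 2: evaluated on fewer elements
print(idx)                         # indices into a meeting every condition
```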
+0

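Yes, I understand how to use indices like this, but for my particular application it's not what I need. Maybe my example wasn't the best. The code goes into a function, and the function needs to return the indices into the array that satisfy certain conditions. So the function might be something like `def some_function(arr)`, which returns the indices in arr that meet a series of conditions. I don't intend to change the values of the array. – 2011-04-23 03:43:52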

+0

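See my edit. I don't see any way to get the indices that locate a subarray within its base array; that would be nice. I think you just need to store the indices (into the base array) used to create the subarray, and then apply them as offsets to the returned (subarray) indices. – Paul 2011-04-23 04:06:37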

+0

Another edit, after seeing your improved example code. – Paul 2011-04-23 04:30:36

0

Use a helper array a_index that simply holds the element indices of a, so that a_index[3,5] = (3,5). Then you can get the original index as a_index[condition == True][Index].
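A sketch of the helper-array idea for a 2-D a, using a small deterministic array and an arbitrary condition chosen only for illustration. np.indices builds the coordinate grid, and moving its first axis to the end gives a_index[r, c] == [r, c]. np.argwhere(condition) returns the same coordinates in the same order, so the helper array is optional.

```python
import numpy as np

a = np.arange(40).reshape(10, 4)

# Helper array of coordinates: a_index[r, c] == [r, c]
a_index = np.moveaxis(np.indices(a.shape), 0, -1)

condition = (a % 3 == 0)
b = a[condition]                  # 1-D array of the selected values
Index = np.argmax(b)              # position within b

rc = a_index[condition][Index]    # (row, col) within a
assert a[tuple(rc)] == b[Index]
print(rc)                         # [9 3]

# np.argwhere gives the same coordinate list without the helper array.
assert np.array_equal(np.argwhere(condition), a_index[condition])
```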

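If you can guarantee that b is a view on a, you can use the information in both arrays to find the mapping between the indices of b and those of a.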
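One way to do that, sketched under strong assumptions: a is C-contiguous, and b was created from a by basic slicing with positive steps only (no transposes, no fancy indexing, same number of dimensions). view_index_to_base is a hypothetical helper; it reads the data pointers from NumPy's standard __array_interface__ attribute and compares strides. For anything more general, storing the indices as in the other answers is safer.

```python
import numpy as np

def view_index_to_base(a, b, b_index):
    """Hypothetical helper: map an index into view b to the index in a."""
    if not a.flags['C_CONTIGUOUS'] or b.ndim != a.ndim:
        raise ValueError("assumes a C-contiguous a and a same-ndim view b")
    if not np.shares_memory(a, b):
        raise ValueError("b is not a view on a")
    # Byte distance from a's first element to b's first element.
    offset = b.__array_interface__['data'][0] - a.__array_interface__['data'][0]
    # Split that distance into a per-axis start position using a's strides.
    start = []
    for stride in a.strides:
        start.append(offset // stride)
        offset %= stride
    # How many elements of a one step along each axis of b skips.
    steps = [bs // s for bs, s in zip(b.strides, a.strides)]
    return tuple(st + step * i for st, step, i in zip(start, steps, b_index))

a = np.random.random((10, 4))
b = a[3:, 2:]
print(view_index_to_base(a, b, (3, 1)))  # (6, 3)
```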