NumPy indexing: return the rest of a NumPy array

A simple example:

In: a = numpy.arange(10) 
In: sel_id = numpy.arange(5) 
In: a[sel_id] 
Out: array([0,1,2,3,4])

How do I return the rest of the array, i.e. the elements not indexed by sel_id? All I can think of is:

In: numpy.array([x for x in a if x not in a[sel_id]]) 
Out: array([5,6,7,8,9])

Is there a simpler way?


2012-09-20 CJLam

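Is this a one-time operation, or will you reuse 'sel_id' (or rather its complement) down the road? Also, are you interested in the multidimensional case, or just the 1-D case? – mgilson

In my application it will run on massive multidimensional arrays, and yes, I will reuse sel_id. – CJLam

Just realized my solution above is wrong. If the array is ten 1s, the given code returns an empty array instead of an array of five 1s. – CJLam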
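To make the bug concrete, here is a small sketch contrasting the question's approach (filtering by value) with filtering by position. The names by_value, keep and by_position are just illustrative.

```python
import numpy as np

a = np.ones(10, dtype=int)  # ten 1s, as in the comment above
sel_id = np.arange(5)

# Filtering by value: drops every element whose *value* appears in a[sel_id]
by_value = np.array([x for x in a if x not in a[sel_id]])

# Filtering by position: drops only the *positions* listed in sel_id
keep = np.ones(a.shape[0], dtype=bool)
keep[sel_id] = False
by_position = a[keep]

print(by_value)     # []
print(by_position)  # [1 1 1 1 1]
```

Because every value is 1, the value-based test removes everything, while the positional mask keeps the five elements at indices 5 to 9.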

For this simple 1-D case, I would actually use a boolean mask:

import numpy
a = numpy.arange(10) 
include_index = numpy.arange(4) 
include_idx = set(include_index)  # A set gives O(1) lookups; the mask follows a's order, not include_index's
mask = numpy.array([(i in include_idx) for i in range(len(a))])

Now you can get your values:

included = a[mask] # array([0, 1, 2, 3]) 
excluded = a[~mask] # array([4, 5, 6, 7, 8, 9])

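Note that a[mask] doesn't necessarily give the same thing as a[include_index], since the order of include_index matters for the output in that case (a[mask] should be roughly equivalent to a[sorted(include_index)]). However, since the order of the excluded items isn't well defined anyway, this should work fine.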
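Here is a quick sketch of that ordering difference, using a deliberately unsorted include_index (a hypothetical example) and values that differ from their indices so the effect is visible:

```python
import numpy as np

a = np.arange(10) * 10                # values 0, 10, ..., 90
include_index = np.array([3, 0, 2])  # deliberately unsorted

mask = np.zeros(a.shape[0], dtype=bool)
mask[include_index] = True

by_index = a[include_index]           # values 30, 0, 20: follows include_index order
by_mask = a[mask]                     # values 0, 20, 30: follows the array's own order
by_sorted = a[np.sort(include_index)] # same as by_mask
```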

Edit

A better way to create the mask is:

mask = numpy.zeros(a.shape, dtype=bool) 
mask[include_index] = True  # index with the array; a set is not a valid NumPy index

(Thanks to seberg.)
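Since the asker mentioned reusing sel_id on large multidimensional arrays, here is a sketch of the same mask applied along the first axis only. The 2-D array is a made-up example, and selecting along a different axis would need a different indexing expression.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # hypothetical 2-D array, 4 rows
include_index = np.array([0, 2])  # rows to select

# One boolean flag per row (axis 0); build it once, reuse it as often as needed
mask = np.zeros(a.shape[0], dtype=bool)
mask[include_index] = True

included = a[mask]   # rows 0 and 2
excluded = a[~mask]  # rows 1 and 3: "the rest"
```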


2012-09-20 18:17:28 mgilson

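I'm happy to fix this if whoever downvoted leaves a comment explaining what might be wrong with it. – mgilson

@BiRico, that's wrong. I converted 'include_index' into a 'set' (called 'include_idx'), whose '__contains__' method is O(1). The solution has 'O(N)' complexity. – mgilson

+1, this is almost exactly what I was going to suggest, but I had to step away from the computer. Boolean masks are nice for operations like this because you don't need to do any extra work to compute the relative complement. Just FYI, in my tests, using 'fromiter' on a generator instead of 'array' gave a small speedup. – senderle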
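A sketch of senderle's 'fromiter' variant, assuming the same a and include_idx as in the answer. Passing count is optional, but it lets NumPy allocate the output once; no timing claims are made here.

```python
import numpy as np

a = np.arange(10)
include_idx = set(range(4))

# np.fromiter consumes the generator directly instead of building a list first
mask = np.fromiter((i in include_idx for i in range(len(a))), dtype=bool, count=len(a))

included = a[mask]
excluded = a[~mask]
```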


numpy.setdiff1d(a, a[sel_id]) should do it. I'm not sure there's anything simpler than that.


2012-09-20 17:55:30 Harel

This won't work if there are duplicate values in the array. – reptilicus

Something more like this:

import numpy
a = numpy.array([1, 2, 3, 4, 5, 6, 7, 4]) 
exclude_index = numpy.arange(5) 
include_index = numpy.setdiff1d(numpy.arange(len(a)), exclude_index) 
a[include_index] 
# array([6, 7, 4]) 

# Notice this is a little different from 
numpy.setdiff1d(a, a[exclude_index]) 
# array([6, 7])


2012-09-20 18:07:12

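Also, if the indices are contiguous, use the [N:] syntax to select the rest. For example, arr[5:] selects every element from index 5 to the end of the array.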
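A minimal sketch for the contiguous case, assuming sel_id covers indices 0 to 4 as in the question. Basic slices are views of a, so no data is copied:

```python
import numpy as np

a = np.arange(10)
sel_id = np.arange(5)  # contiguous indices 0..4

selected = a[:5]  # same elements as a[sel_id]
rest = a[5:]      # everything from index 5 to the end
```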


2012-09-20 18:17:15 reptilicus

What if they aren't? –

You can do this nicely with a boolean mask:

import numpy as np

a = np.arange(10) 

mask = np.ones(len(a), dtype=bool)  # all elements included/True
mask[[7, 2, 8]] = False             # set unwanted elements to False

print(a[mask]) 
# Gives (removing entries 7, 2 and 8): 
# [0 1 3 4 5 6 9]

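Addition (taken from @mgilson): the mask created this way can also be used to get the removed elements back with a[~mask], but they come out in sorted index order, so this matches the original index list only if that list was sorted.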
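A short sketch of that caveat, using the same unsorted index list [7, 2, 8] as above:

```python
import numpy as np

a = np.arange(10)
mask = np.ones(len(a), dtype=bool)
mask[[7, 2, 8]] = False

removed = a[~mask]          # 2, 7, 8: array order
direct = a[[7, 2, 8]]       # 7, 2, 8: index-list order
```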

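EDIT: Moved down, because I have to admit I would consider np.delete buggy at this time (Sept. 2012).

You could also use np.delete, although masks are more powerful (and in the future I think np.delete should be a good option). At the moment, however, it is slower than the above and gives unexpected results with negative indices (or with steps when given a slice).

print(np.delete(a, [7, 2, 8]))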
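A sketch checking that np.delete and the boolean mask agree for plain non-negative indices. It makes no claim about the 2012-era behaviour with negative indices or slices described above.

```python
import numpy as np

a = np.arange(10)
drop = [7, 2, 8]

mask = np.ones(len(a), dtype=bool)
mask[drop] = False

via_mask = a[mask]
via_delete = np.delete(a, drop)  # returns a new array; a itself is unchanged
```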


2012-09-20 20:33:19 seberg

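Yes, the second method is by far the best, and the only purely linear one... it seems obvious in retrospect! (Note that under the hood, 'numpy.delete' just uses 'setdiff1d' (https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L3380), which in turn uses 'in1d' (https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py#L384). So it is also n log n.) +1, but you already have mine! – senderle

@senderle Seriously! That's interesting; maybe 'np.delete' could use a change in that execution path... – seberg

-1 for using 'np.delete'. **Never** a good idea. –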
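For comparison with the linear mask assignment, here is a sketch of building the same mask by membership testing, in the spirit of the in1d path senderle describes. It uses np.isin, which recent NumPy recommends in place of np.in1d; the n log n figure above is the commenter's and is not verified here.

```python
import numpy as np

a = np.arange(10)
exclude_index = np.array([7, 2, 8])

# Mark every position that appears in exclude_index, then invert the mask
mask = np.isin(np.arange(len(a)), exclude_index)
rest = a[~mask]
```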


Assuming a is a 1-D array, you can just drop the items you don't want from the list of indices:

accept = [i for i in range(a.size) if i not in avoid_list] 
a[accept]

You could also try something like

accept = sorted(set(range(a.size)) - set(indices_to_discard)) 
a[accept]

The idea is to use fancy indexing on the complement of the set of indices you don't want.
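A self-contained version of the set-complement idea; indices_to_discard holds example values, and a uses values that differ from its indices so the result is easy to check. sorted() restores the original order, since sets are unordered.

```python
import numpy as np

a = np.arange(10) * 10
indices_to_discard = [7, 2, 8]  # example indices

# Complement of the discarded positions, sorted back into array order
accept = sorted(set(range(a.size)) - set(indices_to_discard))
rest = a[accept]
```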


2012-09-20 21:21:32

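I'd use a boolean mask, but a bit differently. This has the advantage of working in N dimensions, with contiguous or non-contiguous indices. Memory usage will depend on whether a view or a copy is created for the masked array, but I'm not sure which.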

import numpy 
a = numpy.arange(10) 
sel_id = numpy.arange(5) 
mask = numpy.ma.make_mask_none(a.shape) 
mask[sel_id] = True 
answer = numpy.ma.masked_array(a, mask).compressed() 
print(answer) 
# [5 6 7 8 9]
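One caveat on "working in N dimensions": compressed() always returns a 1-D array of the unmasked values, so row structure is lost. A sketch with a hypothetical 2-D input, plus one way to keep whole rows instead:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # hypothetical 2-D input
sel_id = np.arange(2)            # mask rows 0 and 1

mask = np.ma.make_mask_none(a.shape)
mask[sel_id] = True               # masks every element in those rows

masked = np.ma.masked_array(a, mask)
flat_rest = masked.compressed()   # 1-D: unmasked values, flattened
row_rest = a[~mask.any(axis=1)]   # 2-D: keeps rows 2 and 3 intact
```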


2012-09-20 22:24:32

Masked arrays might be a very good option, although '.compressed()' somewhat defeats the purpose of masked arrays IMO, since it creates a copy as a normal array. – seberg

Agreed, but the question was how to get the rest of the array... –
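To check seberg's point about copying, here is a small sketch using np.shares_memory (available since NumPy 1.11). With the default copy=False, the masked array wraps a's data, while compressed() returns a fresh ndarray.

```python
import numpy as np

a = np.arange(10)
mask = np.ma.make_mask_none(a.shape)
mask[:5] = True

masked = np.ma.masked_array(a, mask)   # wraps a's data by default
answer = masked.compressed()           # a new, ordinary ndarray

print(np.shares_memory(masked.data, a))  # True
print(np.shares_memory(answer, a))       # False
```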
