2010-10-06 48 views
0

NumPy - Why a ValueError for NaN when trying to delete rows

I have a NumPy array:

A = array([['id1', '1', '2', 'NaN'], 
      ['id2', '2', '0', 'NaN']]) 

I also have a list:

li = ['id1', 'id3', 'id6'] 

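I want to iterate over the array and the list, and wherever the first element of a row in the array is not in the list, delete that entire row from the array.

My code so far: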

from numpy import * 

for row in A: 
    if row[0] not in li: 
     delete(A, row, axis = 0) 

This returns the following error:

ValueError: invalid literal for int() with base 10: 'NaN' 

All of the elements in each row are of type str(), so I don't understand why the error mentions int().

Any suggestions?

Thanks, S ;-)

Answers

5

Is simply generating a new array not an option?

numpy.array([x for x in A if x[0] in li]) 
+0

Yes, much simpler than my solution! – eumiro 2010-10-06 14:19:13

+0

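I think the original poster wants to keep the rows where `row[0]` is in `li`, so the `not` needs to be removed from the condition in the list comprehension. – dtlussier 2010-10-06 15:29:33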

+0

@dtlussier: Thanks for pointing out my mistake. :) – atomocopter 2010-10-06 21:24:01

2

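It looks like you want to delete rows from the array in place. However, that is not possible with the np.delete function, because such an operation goes against the way Python and NumPy manage memory.

I found an interesting post on the NumPy mailing list (Travis Oliphant, [Numpy-discussion] Deleting a row from a matrix) where the np.delete function was first discussed: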

So, "in-place" deletion of array objects would not be particularly useful, because it would only work for arrays with no additional reference counts (i.e. simple b=a assignment would increase the reference count and make it impossible to say del a[obj]).

....

But, the problem with both of those approaches is that once you start removing arbitrary rows (or n-1 dimensional sub-spaces) from an array you very likely will no longer have a chunk of memory that can be described using the n-dimensional array memory model.
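As a rough illustration of the memory-model point in the quote (a minimal sketch, not part of the mailing-list post; the array `a` and the variable names are made up for the example): a regular slice can be described by strides and so can be a view, whereas an arbitrary selection of rows has to be copied.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# A regular slice (every second row) can be described by strides,
# so NumPy returns a view that shares memory with `a`.
every_other = a[::2]

# An arbitrary row selection (rows 0, 2 and 3) has no single stride,
# so NumPy copies the data into a new array.
arbitrary = a[[0, 2, 3]]

print(np.shares_memory(a, every_other))  # the slice is a view
print(np.shares_memory(a, arbitrary))    # the fancy-indexed result is a copy
```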

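If you take a look at the documentation for np.delete (http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html), you can see that the function returns a new array with the desired parts (not necessarily rows) removed.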

Definition:  np.delete(arr, obj, axis=None) 
Docstring: 
Return a new array with sub-arrays along an axis deleted. 

Parameters 
---------- 
arr : array_like 
    Input array. 
obj : slice, int or array of ints 
    Indicate which sub-arrays to remove. 
axis : int, optional 
    The axis along which to delete the subarray defined by `obj`. 
    If `axis` is None, `obj` is applied to the flattened array. 

Returns 
------- 
out : ndarray 
    A copy of `arr` with the elements specified by `obj` removed. Note 
    that `delete` does not occur in-place. If `axis` is None, `out` is 
    a flattened array. 
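A small sketch of what the docstring describes, using the array from the question: `obj` is an integer position along `axis`, and the result is a copy. Passing the row itself (an array of strings) as `obj`, as in the question, is presumably what makes NumPy try to interpret those strings as integers. That is my reading rather than something stated in the thread, and the exact error message may differ between NumPy versions.

```python
import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

# `obj` is an integer position along `axis`, not the row's contents.
B = np.delete(A, 1, axis=0)

print(A.shape)  # the original array is unchanged
print(B.shape)  # the returned copy has one row fewer
```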

So, in your case I think you would want to do something like:

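import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

li = ['id1', 'id3', 'id6']

# Collect the row indices first and delete once, so indices do not shift mid-loop
to_delete = [i for i, row in enumerate(A) if row[0] not in li]
A = np.delete(A, to_delete, axis=0)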

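A is now trimmed down to what you want, but keep in mind that it is a new block of memory. Every call to np.delete allocates new memory, which the name A then points to.

I'm sure there is a better vectorized way (maybe using masked arrays?) to figure out which rows to delete, but I couldn't put it together. If anyone has one, please comment!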
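One possible vectorized approach, offered as a sketch rather than as the answer the thread settled on: build a boolean mask with `np.isin` (available in newer NumPy releases, so it did not exist when this was written) and index with it. The names `keep` and `filtered` are made up for the example.

```python
import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

li = ['id1', 'id3', 'id6']

# True for every row whose first column appears in `li`.
keep = np.isin(A[:, 0], li)

# Boolean indexing returns a new array containing only the kept rows.
filtered = A[keep]
print(filtered)
```

As with np.delete, `filtered` is a new array; `A` itself is left untouched.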
