Array of strings into numpy.amax

With Python's standard max function I can also pass a key argument:

import numpy as np
s = np.array(['one', 'two', 'three'])
max(s)           # 'two' (lexicographically last)
max(s, key=len)  # 'three' (longest string)

With a larger (multi-dimensional) array I can no longer use max, so I tried numpy.amax, but I can't seem to use amax with strings...

t = np.array([['one','two','three'],['four','five','six']]) 
t.dtype # dtype('|S5') 
np.amax(t, axis=1)  # Error! Hoping for: ['two', 'six']

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 1833, in amax 
     return amax(axis, out) 
TypeError: cannot perform reduce with flexible type

Is it possible to use amax here (am I using it incorrectly?), or is there some other numpy tool that can do this?


2012-09-29 Andy Hayden

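Instead of storing your strings in the numpy array as NumPy's flexible string type (the `|S5` dtype), you could try storing them as Python objects. NumPy will treat these as references to the original Python string objects, and you can then handle them the way you would expect: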

t = np.array([['one','two','three'],['four','five','six']], dtype=object) 
np.min(t) 
# gives 'five' 
np.max(t) 
# gives 'two'
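
For the row-wise result the question was hoping for, the same object-dtype array can also be reduced along an axis. This is a minimal sketch, assuming the comparison you want is plain lexicographic ordering; the names `row_max` and `col_max` are only illustrative:

```python
import numpy as np

# Same data as above, stored as references to Python str objects.
t = np.array([['one', 'two', 'three'],
              ['four', 'five', 'six']], dtype=object)

row_max = np.amax(t, axis=1)  # lexicographic maximum of each row
col_max = np.amax(t, axis=0)  # lexicographic maximum of each column

print(row_max)  # ['two' 'six']
print(col_max)  # ['one' 'two' 'three']
```

The comparisons still happen between Python string objects, so this is likely to be slower than a reduction over a numeric array.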

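Mind that the np.min and np.max calls here order the strings lexicographically, so 'two' does indeed come after 'five'. To change the comparison so that it looks at the length of each string, you can try creating a new numpy array of the same shape that contains each string's length instead of a reference to it. You can then make a single numpy.argmin call on that array (which returns the index of the minimum) and look up the string value in the original array.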

Example code:

# np.vectorize takes a Python function and converts it into a NumPy
# function that operates elementwise on arrays
np_len = np.vectorize(len)

np_len(t)
# gives array([[3, 3, 5], [4, 4, 3]])

idx = np_len(t).argmin(0)  # row index of the shortest string in each column
# gives array([0, 0, 1])

# pair each row index with its column index to look up the strings in t
result = t[idx, np.arange(t.shape[1])]
print(result)
# prints ['one' 'two' 'six'], the shortest string in each column
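
The length-array idea can be wrapped into a small helper that behaves roughly like `max(..., key=...)` along one axis. This is only a sketch: `amax_by_key` is a hypothetical name, not a NumPy function, and it assumes the key returns a number. Ties go to the first element along the axis, because that is what `argmax` returns.

```python
import numpy as np

def amax_by_key(arr, key, axis):
    """Hypothetical helper: like max(..., key=key), applied along one axis."""
    # Evaluate the key on every element; otypes avoids guessing the dtype.
    scores = np.vectorize(key, otypes=[float])(arr)
    # Position of the best score along the axis, kept as a length-1 dimension
    # so that it lines up with arr for take_along_axis.
    best = np.expand_dims(scores.argmax(axis=axis), axis=axis)
    # Pull the matching elements out of the original array.
    return np.take_along_axis(arr, best, axis=axis).squeeze(axis=axis)

t = np.array([['one', 'two', 'three'],
              ['four', 'five', 'six']], dtype=object)

longest_per_row = amax_by_key(t, len, axis=1)
longest_per_col = amax_by_key(t, len, axis=0)

print(longest_per_row)  # ['three' 'four']
print(longest_per_col)  # ['four' 'five' 'three']
```

The key is still called once per element in the interpreter, so this saves typing rather than time.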


2012-09-29 15:48:27

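Is there a reason `dtype='|S5'` is the default rather than `object`? (I had thought that was the problem :) ). Copying `t` for every `key` seems to create a lot of extra arrays, and especially when those arrays are large this seems like a roundabout solution...

When you create a `numpy` array of strings, each string is treated as a literal `numpy` string, which is just a run of contiguous bytes. Since `numpy` arrays must (or strive to) use the same size for every element, the default in this case is `|S5`, a string of length 5, because that is the longest string in your input.

If the input array is large, then... yes, it is a roundabout solution. Keep in mind that running `np_len(t).argmin(0)` does not keep the intermediate array around, although it still requires Python to loop over every element in the interpreter.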
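
The fixed-width default described in the comments can be checked directly. The sketch below uses byte strings so that the dtype comes out as `S5` on Python 3 as well; with ordinary Python 3 text strings NumPy picks `<U5` instead, but the fixed-width behaviour is the same. The variable names are illustrative only.

```python
import numpy as np

fixed = np.array([b'one', b'two', b'three'])
print(fixed.dtype)           # S5
print(fixed.dtype.itemsize)  # 5 bytes reserved for every element

# A longer value is silently truncated to the fixed width.
fixed[0] = b'seventeen'
print(fixed[0])              # b'seven'

# An object array stores references, so the full string survives.
flexible = np.array(['one', 'two', 'three'], dtype=object)
flexible[0] = 'seventeen'
print(flexible[0])           # seventeen
```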
