2012-06-28

How do I apply a function one column at a time?

I have a numpy data structure like this:

[[['diaad'], 
    ['iaadf'], 
    ['aadfe'], 
    ['hedbb'], 
    ['edbbb'], 
    ['dbbbb']], 

[['gegec'], 
    ['ehecf'], 
    ['gecfc'], 
    ['gadff'], 
    ['adfef'], 
    ['dffgc']], 

[['ddddj'], 
    ['dddjd'], 
    ['ddjdd'], 
    ['jfffd'], 
    ['fgfdb'], 
    ['ggdbb']]] 

It is created like this:

>>> import numpy as np
>>> a = np.array([[['diaad'], ['iaadf'], ['aadfe'], ['hedbb'], ['edbbb'], ['dbbbb']], [['gegec'], ['ehecf'], ['gecfc'], ['gadff'], ['adfef'], ['dffgc']], [['ddddj'], ['dddjd'], ['ddjdd'], ['jfffd'], ['fgfdb'], ['ggdbb']]])

Is there a straightforward numpy way to apply a custom function to pairs of elements?

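For example, say my custom function is called processPair(a, b). It should compute the result for every pair of elements along the columns, i.e. for ('diaad', 'gegec'), ('gegec', 'ddddj') and ('diaad', 'ddddj'). Any suggestions on how to do this? I think the map function could do it, but I'm not entirely sure.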
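A minimal sketch of the map idea, assuming processPair takes two strings and returns a value; the concatenation body below is only a hypothetical placeholder for the real function. The array is reshaped to (3, 6) so that each row holds one "column" of the original, and map walks two such rows in lockstep for every pair of columns produced by itertools.combinations:

```python
from itertools import combinations

import numpy as np


def processPair(x, y):
    # Hypothetical placeholder for the custom function: it just concatenates.
    return x + y


a = np.array([[['diaad'], ['iaadf'], ['aadfe'], ['hedbb'], ['edbbb'], ['dbbbb']],
              [['gegec'], ['ehecf'], ['gecfc'], ['gadff'], ['adfef'], ['dffgc']],
              [['ddddj'], ['dddjd'], ['ddjdd'], ['jfffd'], ['fgfdb'], ['ggdbb']]])

# Drop the trailing length-1 axis: cols[k] is the k-th "column" (6 strings).
cols = a.reshape(a.shape[0], -1)

# For each pair of columns (0, 1), (0, 2), (1, 2), map() calls processPair on
# corresponding elements, e.g. ('diaad', 'gegec'), ('iaadf', 'ehecf'), ...
# .tolist() turns the numpy strings into plain Python strings first.
results = {
    (i, j): list(map(processPair, cols[i].tolist(), cols[j].tolist()))
    for i, j in combinations(range(cols.shape[0]), 2)
}

print(results[(0, 1)][:2])  # ['diaadgegec', 'iaadfehecf']
```

Keying the results by the column pair keeps track of which two columns each list of outputs came from.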


I would suggest using a Pandas DataFrame, which makes applying custom functions easy. – root

Answer


Here's my solution. I'm not completely happy with it (I feel it could be done more elegantly), but it works:

import numpy as np
from itertools import combinations

def apply_pairwise(func, a):
    """For every pair of top-level slices a[i], a[j] (i < j),
    call func on each pair of corresponding elements."""
    stack = []
    for col_a, col_b in combinations(range(a.shape[0]), 2):
        # Put the two "columns" side by side: shape (rows, 2)
        stack.append(np.hstack([a[col_a], a[col_b]]))

    combined = np.vstack(stack)

    def unpack_row(row):
        "Calls func with the values of a given numpy array as arguments"
        return func(*row.tolist())

    return np.apply_along_axis(unpack_row, 1, combined)

Use it like this (assuming your example array a is already defined):

>>> f = lambda x, y: x + y
>>> print(apply_pairwise(f, a))
['diaadgegec' 'iaadfehecf' 'aadfegecfc' 'hedbbgadff' 'edbbbadfef' 
'dbbbbdffgc' 'diaadddddj' 'iaadfdddjd' 'aadfeddjdd' 'hedbbjfffd' 
'edbbbfgfdb' 'dbbbbggdbb' 'gegecddddj' 'ehecfdddjd' 'gecfcddjdd' 
'gadffjfffd' 'adfeffgfdb' 'dffgcggdbb']
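Since the answer mentions wanting something more elegant, one possible alternative (a swapped-in technique, not the answerer's method) is np.frompyfunc, which wraps any two-argument Python function as a ufunc that broadcasts over arrays. A minimal sketch, assuming func accepts two elements of a; apply_pairwise_ufunc is a hypothetical name:

```python
from itertools import combinations

import numpy as np


def apply_pairwise_ufunc(func, a):
    """Call func on corresponding elements of every pair a[i], a[j] (i < j).

    Returns a flat object array, ordered pair by pair like apply_pairwise.
    """
    pairs = list(combinations(range(a.shape[0]), 2))
    if not pairs:
        # Fewer than two columns: nothing to pair up.
        return np.empty(0, dtype=object)
    # Wrap the arbitrary two-argument Python function as a ufunc with one
    # (object-dtype) output; it still calls func once per element pair.
    ufunc = np.frompyfunc(func, 2, 1)
    left, right = zip(*pairs)
    # Fancy indexing gathers the left and right slice of every pair at once:
    # both operands have shape (n_pairs,) + a.shape[1:].
    return ufunc(a[list(left)], a[list(right)]).ravel()


a = np.array([[['diaad'], ['iaadf'], ['aadfe'], ['hedbb'], ['edbbb'], ['dbbbb']],
              [['gegec'], ['ehecf'], ['gecfc'], ['gadff'], ['adfef'], ['dffgc']],
              [['ddddj'], ['dddjd'], ['ddjdd'], ['jfffd'], ['fgfdb'], ['ggdbb']]])

out = apply_pairwise_ufunc(lambda x, y: x + y, a)
for s in out[:3]:
    print(s)  # diaadgegec, then iaadfehecf, then aadfegecfc
```

The trade-off: this skips building intermediate arrays with hstack/vstack, but the result always has object dtype, and func still runs in Python once per pair, so it is tidier rather than faster.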