N-D version of itertools.combinations in numpy

I'd like to implement itertools.combinations for numpy. Based on this discussion, I have a function that works for 1D input:

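from itertools import combinations

from numpy import array, asarray, dtype, fromiter

def combs(a, r):
    """
    Return successive r-length combinations of elements in the array a.
    Should produce the same output as array(list(combinations(a, r))), but
    faster.
    """
    a = asarray(a)
    # One record type with r fields of a's dtype, so each tuple fills one record
    dt = dtype([('', a.dtype)]*r)
    b = fromiter(combinations(a, r), dt)
    # Reinterpret the records as plain values, one row per combination
    return b.view(a.dtype).reshape(-1, r)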
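To make the structured-dtype trick in combs() concrete, here is a sketch that runs its three steps separately on the example input and prints the intermediate shapes. It uses np.-prefixed calls instead of the bare names in the question, and the variable names (records, flat, result) are illustrative only.

```python
from itertools import combinations

import numpy as np

a = np.asarray([1, 2, 3])
r = 2

# One record type with r unnamed fields of a's dtype; numpy auto-names them f0, f1, ...
dt = np.dtype([('', a.dtype)] * r)

# Each tuple from combinations() fills one record, giving a 1-D array of 3 records
records = np.fromiter(combinations(a, r), dt)

# Reinterpret the same memory as plain scalars: 3 records x 2 fields -> 6 values
flat = records.view(a.dtype)

# Fold the flat values back into one row per combination
result = flat.reshape(-1, r)
print(records.shape, flat.shape, result.shape)  # (3,) (6,) (3, 2)
```

No data is copied by the view; only reshape's bookkeeping changes, which is why this is faster than building a list of tuples first.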

and the output makes sense:

In [1]: list(combinations([1,2,3], 2))
Out[1]: [(1, 2), (1, 3), (2, 3)]

In [2]: array(list(combinations([1,2,3], 2)))
Out[2]:
array([[1, 2],
       [1, 3],
       [2, 3]])

In [3]: combs([1,2,3], 2)
Out[3]:
array([[1, 2],
       [1, 3],
       [2, 3]])

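But it would be best if I could extend this to N-D input, where the extra dimensions just let you quickly do many calls at once. So, conceptually, if combs([1, 2, 3], 2) produces [1, 2], [1, 3], [2, 3], and combs([4, 5, 6], 2) produces [4, 5], [4, 6], [5, 6], then combs((1,2,3) and (4,5,6), 2) should produce [1, 2], [1, 3], [2, 3] and [4, 5], [4, 6], [5, 6], where "and" just means parallel rows or columns (whichever makes sense). (And likewise for higher dimensions.)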
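A slow but explicit way to pin down these semantics is to run the 1-D version once per row and stack the results. The sketch below does that with a hypothetical helper, combs_rows_reference, which combines along the last axis and treats every leading axis as independent; it assumes r is no larger than the length of that axis. It is only a reference to check faster versions against, not a proposed implementation.

```python
from itertools import combinations

import numpy as np


def combs_rows_reference(a, r):
    """Hypothetical slow reference: r-length combinations along the last axis,
    computed separately for each 1-D row. Assumes r <= a.shape[-1]."""
    a = np.asarray(a)
    rows = a.reshape(-1, a.shape[-1])  # merge all leading axes into one
    per_row = [np.array(list(combinations(row, r))) for row in rows]
    # Restore the leading axes: result shape is (..., n_combinations, r)
    return np.stack(per_row).reshape(a.shape[:-1] + (-1, r))


out = combs_rows_reference([[1, 2, 3], [4, 5, 6]], 2)
print(out.shape)  # (2, 3, 2)
```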

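I don't know:

How to make the dimensions work in a way that is consistent with other functions (for example, how some numpy functions take an axis= parameter with a default of axis 0. So perhaps axis 0 should be the axis I'm combining along, and all other axes just represent parallel calculations?)
How to get the above code to work with N-D input (right now I get ValueError: setting an array element with a sequence.)
Is there a better way to do dt = dtype([('', a.dtype)]*r)?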


2013-04-14 endolith

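Not sure how this works out performance-wise, but you can do the combinations on an index array, and then extract the actual array slices with np.take: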

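import numpy as np
from itertools import combinations

def combs_nd(a, r, axis=0):
    a = np.asarray(a)
    if axis < 0:
        axis += a.ndim
    # Build all r-combinations of the positions along `axis`, not of the values
    indices = np.arange(a.shape[axis])
    dt = np.dtype([('', np.intp)]*r)
    indices = np.fromiter(combinations(indices, r), dt)
    indices = indices.view(np.intp).reshape(-1, r)
    # Gather the matching slices; `axis` is replaced by an (n_combinations, r) block
    return np.take(a, indices, axis=axis)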

>>> combs_nd([1,2,3], 2)
array([[1, 2],
       [1, 3],
       [2, 3]])
>>> combs_nd([[1,2,3],[4,5,6]], 2, axis=1)
array([[[1, 2],
        [1, 3],
        [2, 3]],

       [[4, 5],
        [4, 6],
        [5, 6]]])
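The shape of the combs_nd result follows np.take's rule: the indexed axis is replaced by the shape of the index array, giving a.shape[:axis] + indices.shape + a.shape[axis+1:]. The sketch below checks that with a hand-written index array (the one combinations(range(3), 2) would yield) and shows what the default axis=0 does on the same 2-D input.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Index pairs for choosing 2 of the 3 columns, as combinations(range(3), 2) yields
indices = np.array([[0, 1], [0, 2], [1, 2]])

# Axis 1 (length 3) is replaced by the index array's shape (3, 2)
taken = np.take(a, indices, axis=1)
print(taken.shape)  # (2, 3, 2)

# For axis=1 this is equivalent to fancy indexing the second axis
assert np.array_equal(taken, a[:, indices])

# Along axis=0 the rows are combined instead; with 2 rows only the pair (0, 1) exists
rows = np.take(a, np.array([[0, 1]]), axis=0)
print(rows.shape)  # (1, 2, 3)
```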


2013-04-15 05:52:26 Jaime

So 'np.dtype([('', np.intp)]*r)' is the "right" way to create a list dtype? I just kind of stabbed at it until it worked. – endolith
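Whether it is the one "right" way is debatable, but it relies on documented behaviour: an empty field name in a structured dtype is replaced by a default name of the form f0, f1, and so on. The sketch below (r = 3 is arbitrary) shows that the list-of-tuples spec produces r packed fields, which is what makes the later .view(np.intp) reinterpretation valid.

```python
import numpy as np

r = 3
dt = np.dtype([('', np.intp)] * r)

# Empty field names are filled in automatically as f0, f1, ...
print(dt.names)  # ('f0', 'f1', 'f2')

# So it is the same record type as spelling the names out explicitly
assert dt == np.dtype([(f'f{i}', np.intp) for i in range(r)])

# Each record is r tightly packed intp values, so viewing as np.intp is safe
assert dt.itemsize == r * np.dtype(np.intp).itemsize
```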

Very cool! I found this to perform slightly worse than @HYRY's solution (in both speed and memory), but it's better than just using itertools.combinations on its own. –

You can use itertools.combinations() to create the index array, and then use NumPy's fancy indexing:

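import numpy as np
from itertools import combinations, chain
from scipy.special import comb  # scipy.misc.comb was removed from newer SciPy releases

def comb_index(n, k):
    # Number of k-combinations of n items, so fromiter can preallocate
    count = comb(n, k, exact=True)
    # Flatten the index tuples into one stream of ints, then fold into rows of k
    index = np.fromiter(chain.from_iterable(combinations(range(n), k)),
                        int, count=count*k)
    return index.reshape(-1, k)

data = np.array([[1,2,3,4,5],[10,11,12,13,14]])

idx = comb_index(5, 3)
print(data[:, idx])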

Output:

[[[ 1  2  3]
  [ 1  2  4]
  [ 1  2  5]
  [ 1  3  4]
  [ 1  3  5]
  [ 1  4  5]
  [ 2  3  4]
  [ 2  3  5]
  [ 2  4  5]
  [ 3  4  5]]

 [[10 11 12]
  [10 11 13]
  [10 11 14]
  [10 12 13]
  [10 12 14]
  [10 13 14]
  [11 12 13]
  [11 12 14]
  [11 13 14]
  [12 13 14]]]
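The same fancy-indexing idea extends to input with any number of leading dimensions by indexing the last axis behind an Ellipsis, which is one way to get the "parallel calls" behaviour the question asks for. In this sketch the index array is built directly with np.array(list(combinations(...))) for brevity rather than with comb_index, and the 2 x 3 x 5 input is arbitrary.

```python
from itertools import combinations

import numpy as np

# Arbitrary 3-D input: combine along the last axis (length 5), keep the 2 x 3 leading axes
data3 = np.arange(2 * 3 * 5).reshape(2, 3, 5)

# All 3-element index combinations of range(5), shape (10, 3)
idx = np.array(list(combinations(range(5), 3)))

# The Ellipsis keeps every leading axis; the last axis becomes (10, 3)
out = data3[..., idx]
print(out.shape)  # (2, 3, 10, 3)
```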


2013-04-15 06:03:56 HYRY

What is 'chain.from_iterable'? – endolith
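chain.from_iterable takes an iterable of iterables and yields their elements one after another. In comb_index it turns the stream of index tuples into one flat stream of ints, which np.fromiter can consume with a plain integer dtype. A minimal illustration:

```python
from itertools import chain, combinations

pairs = combinations(range(4), 2)  # (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)

# Concatenate the inner tuples into a single flat sequence of ints
flat = list(chain.from_iterable(pairs))
print(flat)  # [0, 1, 0, 2, 0, 3, 1, 2, 1, 3, 2, 3]
```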

@endolith: Oh, I see. It removes the need for 'dt = np.dtype...', and it also seems to make this version faster than Jaime's. – endolith
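To check the speed difference on your own machine, a rough timing sketch like the one below compares the two ways of building the index array. n = 20 and k = 4 are arbitrary, the function names are hypothetical, math.comb (Python 3.8+) stands in for scipy's comb, and no timings are claimed here; the assert only confirms that both produce the same indices.

```python
import timeit
from itertools import chain, combinations
from math import comb

import numpy as np

n, k = 20, 4


def via_structured_dtype():
    # Jaime-style: one record of k intp fields per combination, then view as flat ints
    dt = np.dtype([('', np.intp)] * k)
    rec = np.fromiter(combinations(range(n), k), dt)
    return rec.view(np.intp).reshape(-1, k)


def via_chain():
    # HYRY-style: flatten with chain.from_iterable and preallocate with count
    flat = np.fromiter(chain.from_iterable(combinations(range(n), k)),
                       np.intp, count=comb(n, k) * k)
    return flat.reshape(-1, k)


assert np.array_equal(via_structured_dtype(), via_chain())
for fn in (via_structured_dtype, via_chain):
    print(fn.__name__, timeit.timeit(fn, number=20))
```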
