維度不可知（通用）笛卡爾積

我正在尋找生成相對大量數組的笛卡爾乘積以跨越高維網格。由於維數高，因此不可能將笛卡兒積計算的結果存儲在內存中;而是將其寫入硬盤。由於這個約束，我需要訪問生成的中間結果。我到目前爲止一直在做的是這樣的：維度不可知（通用）笛卡爾積

for x in xrange(0, 10): 
    for y in xrange(0, 10): 
     for z in xrange(0, 10): 
      writeToHdd(x,y,z)

其中，除了是很討厭的，是不可擴展（即，它會要求我寫的許多循環的尺寸）。我曾嘗試使用here提出的解決方案，但這是一個遞歸解決方案，因此很難在生成時即時獲取結果。除了每個維度有一個硬編碼循環之外，是否有任何「簡潔」的方式來做到這一點？

來源

2014-07-09 danielvdende

在普通的Python中，您可以使用itertools.product來生成迭代集合的笛卡爾乘積。

>>> arrays = range(0, 2), range(4, 6), range(8, 10) 
>>> list(itertools.product(*arrays)) 
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]

在numpy的，你可以結合numpy.meshgrid（通過sparse=True，以避免擴大在內存中的產品）與numpy.ndindex：

>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10) 
>>> grid = np.meshgrid(*arrays, sparse=True) 
>>> [tuple(g[i] for g in grid) for i in np.ndindex(grid[0].shape)] 
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]

來源

2014-07-09 11:47:17

偉大的解決方案！ – Jake0x32

看來你只是想循環任意數量的維度。我的通用解決方案是使用索引字段和增量索引加處理溢出。

例子：

n = 3 # number of dimensions 
N = 1 # highest index value per dimension 

idx = [0]*n 
while True: 
    print(idx) 
    # increase first dimension 
    idx[0] += 1 
    # handle overflows 
    for i in range(0, n-1): 
     if idx[i] > N: 
      # reset this dimension and increase next higher dimension 
      idx[i] = 0 
      idx[i+1] += 1 
    if idx[-1] > N: 
     # overflow in the last dimension, we are finished 
     break

給出：

[0, 0, 0] 
[1, 0, 0] 
[0, 1, 0] 
[1, 1, 0] 
[0, 0, 1] 
[1, 0, 1] 
[0, 1, 1] 
[1, 1, 1]

numpy的有類似的東西內置：ndenumerate。

來源

2014-07-09 08:03:21 Trilarion

謝謝！事後看起來這麼簡單，只是無法弄清楚！ – danielvdende

Gareth Rees的答案具有以下優點：它可以立即使用任意範圍，並使用內置功能。我更喜歡它，在簡單的情況下，我的答案也可以做到這一點。 – Trilarion

我想我找到了使用內存映射文件的一個好方法：

def carthesian_product_mmap(vectors, filename, mode='w+'): 
    ''' 
    Vectors should be a tuple of `numpy.ndarray` vectors. You could 
    also make it more flexible, and include some error checking 
    '''   
    # Make a meshgrid with `copy=False` to create views 
    grids = np.meshgrid(*vectors, copy=False, indexing='ij') 

    # The shape for concatenating the grids from meshgrid 
    shape = grid[0].shape + (len(vectors),) 

    # Find the "highest" dtype neccesary 
    dtype = np.result_type(*vectors) 

    # Instantiate the memory mapped file 
    M = np.memmap(filename, dtype, mode, shape=shape) 

    # Fill the memmap with the grids 
    for i, grid in enumerate(grids): 
     M[...,i] = grid 

    # Make sure the data is written to disk (optional?) 
    M.flush() 

    # Reshape to put it in the right format for Carthesian product 
    return M.reshape((-1, len(vectors)))

但我想知道你是否真的需要存儲整個Carthesian產品（有很多數據重複）。在需要時產品中的行是不是可以選擇的？

來源

2014-07-09 12:17:09

你也可以通過廣播完成任務，參見http://stackoverflow.com/a/11146645/2379410 –

只是出於好奇; Numpy的memmap會勝過基於數據庫的解決方案嗎？我想數據庫會有一些開銷來建立連接等，但我會認爲數據庫提供'智能'索引/壓縮系統等？ – danielvdende

@danielvdende，我沒有數據庫經驗..也許當磁盤IO是瓶頸時，它可能與Numpy同樣快 –

維度不可知（通用）笛卡爾積

回答

相關問題