2017-02-06 177 views
Indexing a numpy array using a numpy array of slices

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of the post.)

I wrote a function that subdivides an n-dimensional array into smaller arrays such that each sub-part has at most max_chunk_size elements in total.

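Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, the function does not operate on the data itself. Instead it creates an array of "indexers", i.e. a set of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve a subdivision by calling the_array[indexer[i]] (see the example below). The array of indexers also has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, and so on.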
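
To illustrate the idea of reusing one set of indexers across several arrays of the same shape, here is a minimal sketch. The indexers are written out by hand for a 4x6 array and the names `indexers`, `a`, `b` and `out` are made up for this example; they are not part of the code below.

```python
import numpy as np

# Hypothetical hand-written indexers for 4x6 arrays cut into 2x3 chunks
indexers = [
    (slice(0, 2), slice(0, 3)), (slice(0, 2), slice(3, 6)),
    (slice(2, 4), slice(0, 3)), (slice(2, 4), slice(3, 6)),
]

a = np.arange(24).reshape(4, 6)
b = np.full((4, 6), 10)
out = np.zeros_like(a)

# The same indexer addresses the corresponding chunk in every array,
# so the work can be done one chunk at a time
for ix in indexers:
    out[ix] = a[ix] + b[ix]
```

Because each indexer is a plain tuple of slices, `a[ix]` is a view, so no chunk data is copied until the arithmetic is performed.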

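I expected that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]], and that the_array[indexer] would return just the_array. However, such calls result in an error: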

IndexError: arrays used as indices must be of integer (or boolean) type

Is there a simple way to work around this error?

Here is the code:

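import itertools
from functools import reduce  # reduce is not a builtin in Python 3

import numpy as np

def subdivide(shape, max_chunk_size=500000):
    """Return an object array of slice tuples that tile an array of the given shape."""
    shape = np.array(shape).astype(float)
    total_size = shape.prod()

    # calculate maximum slice shape (every axis is scaled by the same factor):
    slice_shape = np.floor(shape * min(max_chunk_size/total_size, 1.0)**(1./len(shape))).astype(int)

    # create a list of slices for each dimension
    # (tolist() gives plain Python ints, which keeps the slice reprs clean):
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int).tolist(), slice_shape.tolist())]

    # dtype=object, because the np.object alias was removed in NumPy 1.24
    result = np.empty(reduce(lambda a, b: a*len(b), slices, 1), dtype=object)
    # store one tuple of slices per chunk, then fold the flat array into the chunk grid
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el
    result.shape = tuple(np.ceil(shape/slice_shape).astype(int))
    return result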
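
As a worked example of the chunk-shape calculation above, the following sketch repeats the arithmetic for the 6x15 array and max_chunk_size=16 used in the example further down. The variable names `fraction`, `scale`, `chunk_shape` and `grid_shape` are only introduced here for readability.

```python
import numpy as np

shape = np.array((6, 15), dtype=float)
max_chunk_size = 16

# Fraction of the whole array that one chunk may cover, capped at 1
fraction = min(max_chunk_size / shape.prod(), 1.0)   # 16 / 90
# Scale every axis by the same factor: the ndim-th root of that fraction
scale = fraction ** (1.0 / len(shape))               # about 0.42
chunk_shape = np.floor(shape * scale).astype(int)    # floor of (2.53, 6.32)
grid_shape = np.ceil(shape / chunk_shape).astype(int)
```

This gives 2x6 chunks (12 elements, within the limit of 16) arranged in a 3x3 grid, where the last column of chunks is only 3 elements wide.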

Here is an example of usage:

>>> ar = np.arange(90).reshape(6,15) 
>>> ar 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> slices = subdivide(ar.shape, 16) 
>>> slices 
array([[(slice(0, 2, None), slice(0, 6, None)), 
     (slice(0, 2, None), slice(6, 12, None)), 
     (slice(0, 2, None), slice(12, 15, None))], 
     [(slice(2, 4, None), slice(0, 6, None)), 
     (slice(2, 4, None), slice(6, 12, None)), 
     (slice(2, 4, None), slice(12, 15, None))], 
     [(slice(4, 6, None), slice(0, 6, None)), 
     (slice(4, 6, None), slice(6, 12, None)), 
     (slice(4, 6, None), slice(12, 15, None))]], dtype=object) 

>>> ar[slices[1,0]] 
array([[30, 31, 32, 33, 34, 35], 
     [45, 46, 47, 48, 49, 50]]) 
>>> ar[slices[0,2]] 
array([[12, 13, 14], 
     [27, 28, 29]]) 
>>> ar[slices[2,1]] 
array([[66, 67, 68, 69, 70, 71], 
     [81, 82, 83, 84, 85, 86]]) 

>>> ar[slices[:2,1:3]] 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
IndexError: arrays used as indices must be of integer (or boolean) type 

Here is a solution based on hpaulj's answer:

import itertools

import numpy as np

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()

        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size/total_size, 1.0)**(1./len(shape))).astype(int)

        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n))
                   for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
                  for n, step_size in zip(shape.astype(int), slice_shape)]

        # object array of shape (chunk grid) + (ndim,); the last axis holds one slice per data dimension
        # (dtype=object, because the np.object alias was removed in NumPy 1.24)
        grid_shape = tuple(np.ceil(shape/slice_shape).astype(int))
        self.slices = np.array(list(itertools.product(*slices)),
                               dtype=object).reshape(grid_shape + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple: args = (args,)

        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)

        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]

        # for each data axis i, concatenate the slices along grid axis i into one index array
        # with np.r_, then combine the per-axis index arrays into an open mesh with np.ix_
        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] +
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])]
                             for i in range(len(slices.shape) - 1)))

Example:

>>> ar = np.arange(90).reshape(6,15) 
>>> ar 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> subdiv = Subdivision(ar.shape, 16) 
>>> ar[subdiv[...]] 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> ar[subdiv[0]] 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]) 

>>> ar[subdiv[:2,1]] 
array([[ 6, 7, 8, 9, 10, 11], 
     [21, 22, 23, 24, 25, 26], 
     [36, 37, 38, 39, 40, 41], 
     [51, 52, 53, 54, 55, 56]]) 

>>> ar[subdiv[2,:3]] 
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]) 

>>> ar[subdiv[...,:2]] 
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 
     [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26], 
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41], 
     [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], 
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 
     [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]]) 

Answer

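Your slices produce 2x6 and 2x3 arrays. My version of numpy wants me to turn each subslice into a tuple before indexing with it. That is the same as: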

ar[slice(0,2), slice(6,12)] 
ar[:2, 6:12] 

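This is just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2-element tuple whose elements are slices, lists, arrays, or integers. It does not work with an array of slice objects.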
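
A small sketch of that difference, using the 6x15 array from the question. The names `block`, `same`, `bad_index` and `failed` are only for this illustration.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# A tuple of slices means exactly the same as writing the slices inline
block = ar[(slice(0, 2), slice(6, 12))]
same = ar[:2, 6:12]

# An object array that holds slices is treated as an index array,
# and index arrays must have integer or boolean dtype
bad_index = np.empty(2, dtype=object)
bad_index[0] = slice(0, 2)
bad_index[1] = slice(6, 12)
try:
    ar[bad_index]
    failed = False
except IndexError:
    failed = True
```

So picking a single element out of the indexer array works because that element is a tuple, while passing a whole sub-array of indexers does not.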

As for how to concatenate the results into a larger array: this can be done after indexing, or by converting the slices into index lists.

np.bmat, for example, concatenates a 2D arrangement of arrays:

In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]], 
        [ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]]) 
Out[42]: 
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [21, 22, 23, 24, 25, 26, 27, 28, 29], 
     [36, 37, 38, 39, 40, 41, 42, 43, 44], 
     [51, 52, 53, 54, 55, 56, 57, 58, 59]]) 

You can generalize this. It just uses hstack and vstack on the nested lists. The result is an np.matrix, but it can be converted back to an array.
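
If a plain ndarray is preferred, np.block (available in NumPy 1.13 and later) accepts the same kind of nested list. This is a swapped-in alternative to np.bmat, not what the answer above used, and the `subslice` grid below is an assumed stand-in written out by hand so that the sketch runs on its own.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# Assumed stand-in for `subslice`: a 2x2 grid of slice tuples
subslice = [
    [(slice(0, 2), slice(6, 12)), (slice(0, 2), slice(12, 15))],
    [(slice(2, 4), slice(6, 12)), (slice(2, 4), slice(12, 15))],
]

# Index first, then assemble the blocks; rows are joined horizontally,
# and the resulting rows are stacked vertically
blocks = [[ar[ix] for ix in row] for row in subslice]
joined = np.block(blocks)
```

The result is a new 4x9 ndarray, so no conversion from np.matrix is needed.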

Another approach is to use tools such as np.arange, np.r_, and np.ix_ to create index arrays. It takes some playing around to produce an example.

To combine the [0,0] and [0,1] subslices:

In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]] 
In [65]: i = np.r_[subslice[0,0,0]] 

In [66]: i,j 
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14])) 
In [68]: ix = np.ix_(i,j) 
In [69]: ix 
Out[69]: 
(array([[0], 
     [1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]])) 

In [70]: ar[ix] 
Out[70]: 
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14], 
     [21, 22, 23, 24, 25, 26, 27, 28, 29]]) 

Or, with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces a 4x9 array.
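
The 4x9 case can be sketched as a self-contained snippet. The slice values are typed in directly here rather than taken from `subslice`, which is an assumption made to keep the example runnable on its own.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# np.r_ expands slice objects into concatenated integer ranges
i = np.r_[slice(0, 2), slice(2, 4)]       # rows 0..3
j = np.r_[slice(6, 12), slice(12, 15)]    # columns 6..14

# np.ix_ reshapes the 1-D index arrays into an open mesh,
# so that indexing selects the full rows-by-columns block
combined = ar[np.ix_(i, j)]
```

Since this is advanced (integer array) indexing, `combined` is a copy rather than a view, and the selected chunks do not have to be adjacent.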

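Thanks for your answer! I used your suggestion of 'np.r_' and 'np.ix_' to create a class and define its '__getitem__' method to return the desired index arrays (see the updated OP). – SiLiKhon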