Iterate an iterator by chunks (of n) in Python?

Can you think of a nice way (maybe with itertools) to split an iterator into chunks of a given size?

So l=[1,2,3,4,5,6,7] with chunks(l,3) becomes an iterator [1,2,3], [4,5,6], [7].

I can think of a small program to do that, but not a nice way with maybe itertools.

2012-01-24 Gerenuk

@kindall: This is close, but not the same, due to the handling of the last chunk. –

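This is slightly different, as that question was about lists, and this one is about more general iterators. Although the answer appears to end up being the same. – recursive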

@recursive: Yes, after completely reading the linked thread, I found that everything in my answer already appears somewhere in the other thread. –

The grouper() recipe from the recipes section of the itertools documentation comes close to what you want:

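from itertools import zip_longest  # izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # The same iterator repeated n times, so zip_longest pulls n consecutive items per tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)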

This will fill up the last chunk with a fill value, though.
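A quick check of that padding behaviour, assuming Python 3 (zip_longest): with the default fillvalue=None the short final chunk is padded with None, and with fillvalue='x' it matches the recipe's docstring.

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    # itertools recipe: n references to one iterator, zipped together.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper(3, [1, 2, 3, 4, 5, 6, 7])))
# [(1, 2, 3), (4, 5, 6), (7, None, None)]
print(["".join(g) for g in grouper(3, "ABCDEFG", "x")])
# ['ABC', 'DEF', 'Gxx']
```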

A less general solution that only works on sequences, but does handle the last chunk as desired, is

[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
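Wrapped in a small function (the name chunk_sequence is only illustrative), the slicing approach keeps the short final chunk and returns slices of the same type as the input. Because it needs len() and slicing, a plain iterator such as a generator is rejected with a TypeError.

```python
def chunk_sequence(seq, chunk_size):
    # Relies on len() and slicing, so only sequences (list, tuple, str, ...) work.
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]

print(chunk_sequence([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
print(chunk_sequence("ABCDEFG", 3))              # ['ABC', 'DEF', 'G']

try:
    chunk_sequence((x for x in range(7)), 3)
except TypeError as exc:
    print("not a sequence:", exc)  # generators have no len()
```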

Finally, a solution that works on general iterators and behaves as desired is

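import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        # islice() pulls at most n items from the shared iterator.
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return  # the iterator is exhausted
        yield chunk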
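Because it only ever calls islice() on one shared iterator, this version also works on generators and even on infinite iterators. A small check, using itertools.count() as an example of an endless input:

```python
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# A generator: no len(), no slicing.
print(list(grouper(3, (x * x for x in range(1, 8)))))
# [(1, 4, 9), (16, 25, 36), (49,)]

# An infinite iterator: take only the first two chunks.
print(list(itertools.islice(grouper(3, itertools.count()), 2)))
# [(0, 1, 2), (3, 4, 5)]
```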


2012-01-24 17:48:03

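Thanks for all the other ideas! Sorry I missed the numerous threads that already discussed this question. I had tried islice, but somehow I missed that it does indeed consume the iterator as desired. Now I'm thinking of defining a custom iterator class that provides all sorts of functionality :) – Gerenuk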

Would 'if chunk: yield chunk' be acceptable? It saves a line and reads as clearly as the single 'return'. –

@barraponto: No, it wouldn't be acceptable, since you would be left with an infinite loop. –
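One way to drop that line without the infinite loop, not taken from the thread: a minimal sketch assuming Python 3.8+ for the assignment expression (:=), where the loop condition itself stops as soon as islice() returns an empty tuple.

```python
from itertools import islice

def grouper(n, iterable):
    it = iter(iterable)
    # The loop ends when islice() yields nothing, i.e. the iterator is exhausted.
    while chunk := tuple(islice(it, n)):
        yield chunk

print(list(grouper(3, range(7))))  # [(0, 1, 2), (3, 4, 5), (6,)]
```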

"Simple is better than complex." A simple generator of a few lines can do the job. Just put it in some utility module:

def grouper(iterable, n):
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            # Yields the leftover group; this is an empty list when the
            # input length is a multiple of n.
            yield group
            break
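A quick check of how this generator handles the last group: when the input length is an exact multiple of n, the final yield group in the except branch produces an empty list.

```python
def grouper(iterable, n):
    # Same logic as the answer above, with standard indentation.
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            yield group
            break

print(list(grouper(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
print(list(grouper(range(6), 3)))  # [[0, 1, 2], [3, 4, 5], []]
```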


2012-01-24 18:03:11 jsbueno

Here is an example that returns lazy chunks; use map(list, chunks(...)) if you want lists.

from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        # Exhaust whatever the caller did not consume, so the next
        # chunk starts at the right position.
        deque(chunk, 0)

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print(chunk)

    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print("chunk #%d: %s" % (i, list(chunk)))
        else:
            print("skipping #%d" % i)


2012-01-24 18:05:09

Please add a note on how this works. – Marcin

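One caveat: the iterables yielded by this generator are only valid until the next one is requested. When using e.g. 'list(chunks(range(10), 3))', all the iterables will already have been consumed. –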
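The caveat can be checked directly. A minimal sketch, reusing the chunks() generator from the answer above: converting each chunk as it arrives gives the expected lists, while materialising the outer generator first leaves every chunk empty, because deque(chunk, 0) drains each one before the next is produced.

```python
from collections import deque
from itertools import chain, islice

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n - 1))
        yield chunk
        deque(chunk, 0)  # drain the rest of this chunk before moving on

# Each chunk is consumed before the next one is requested: works.
print([list(c) for c in chunks(range(10), 3)])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# list() requests all chunks first, draining each one: all come back empty.
print([list(c) for c in list(chunks(range(10), 3))])
# [[], [], [], []]
```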

I forget where I found the inspiration for this. I've modified it a little to work with MSI GUIDs in the Windows Registry:

def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse:
            yield s[:n][::-1]
        else:
            yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

reverse doesn't apply to your question, but it's something I use extensively with this function.

>>> [i for i in nslice([1,2,3,4,5,6,7], 3)] 
[[1, 2, 3], [4, 5, 6], [7]] 
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)] 
[[1, 2, 3], [4, 5, 6]] 
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)] 
[[3, 2, 1], [6, 5, 4]]


2012-01-24 18:09:08

This answer is close to the one I started with, but not quite: http://stackoverflow.com/a/434349/246801 –

This only works for sequences, not for general iterables. –

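@SvenMarnach: Hi Sven, yes, thank you, you are absolutely right. I saw that the OP's example used a list (a sequence) and glossed over the wording of the question, assuming they meant sequences. Thanks for pointing that out, though. The difference wasn't immediately clear to me when I saw your comment, but I've looked it up since. ':)' –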
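A short illustration of that difference: nslice() calls len() on its argument and slices it, which a list supports but a plain iterator does not. Wrapping the input in list() first works, at the cost of reading everything into memory.

```python
def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        yield s[:n][::-1] if reverse else s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

data = iter([1, 2, 3, 4, 5, 6, 7])  # an iterator, not a sequence
try:
    list(nslice(data, 3))
except TypeError as exc:
    print("TypeError:", exc)  # len() fails before any item is consumed

# Materialising the iterator into a list makes it a sequence again.
print(list(nslice(list(data), 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
```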

Here you go.

def chunksiter(l, chunks):
    # Builds all slices up front, then returns an iterator over them.
    rl = []
    i = 0
    while i < len(l):
        rl.append(l[i:i + chunks])
        i += chunks
    return iter(rl)


def chunksiter2(l, chunks):
    # Generator version: produces each slice only when it is requested.
    i = 0
    while i < len(l):
        yield l[i:i + chunks]
        i += chunks

Examples:

for l in chunksiter([1,2,3,4,5,6,7,8],3): 
    print(l) 

[1, 2, 3] 
[4, 5, 6] 
[7, 8] 

for l in chunksiter2([1,2,3,4,5,6,7,8],3): 
    print(l) 

[1, 2, 3] 
[4, 5, 6] 
[7, 8] 


for l in chunksiter2([1,2,3,4,5,6,7,8],5): 
    print(l) 

[1, 2, 3, 4, 5] 
[6, 7, 8]


2012-01-24 19:10:58

This only works for sequences, not for general iterables. –

A concise implementation is:

from itertools import filterfalse, zip_longest  # ifilterfalse, izip_longest in Python 2

chunker = lambda iterable, n: (filterfalse(lambda x: x == (), chunk) for chunk in (zip_longest(*[iter(iterable)]*n, fillvalue=())))

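This works because [iter(iterable)]*n is a list containing the same iterator n times. Zipping over that list takes one item from each entry in turn, and since every entry is the same iterator, each zip element ends up containing a group of n consecutive items.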
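A minimal check of that trick: the list really holds n references to one iterator object, so zip() pulls consecutive items into each tuple. With plain zip(), a leftover that does not fill a whole tuple is silently dropped.

```python
n = 3
iterators = [iter(range(7))] * n

# All n entries are the very same iterator object.
print(all(it is iterators[0] for it in iterators))  # True

# zip() takes one item from each entry in turn, i.e. 3 consecutive items,
# and stops at the first exhausted entry, so the trailing 6 is lost.
print(list(zip(*iterators)))  # [(0, 1, 2), (3, 4, 5)]
```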

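zip_longest (izip_longest in Python 2) is needed to consume the underlying iterable completely, rather than stopping when the first exhausted iterator is reached, which would chop off any remainder of the iterable. This in turn creates the need to filter out the fill value. A slightly more robust implementation would therefore be: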

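from itertools import filterfalse, zip_longest  # ifilterfalse, izip_longest in Python 2

def chunker(iterable, n):
    class Filler(object): pass  # a fresh sentinel class per call, so it cannot be an input item
    return (filterfalse(lambda x: x is Filler, chunk) for chunk in (zip_longest(*[iter(iterable)]*n, fillvalue=Filler)))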

This guarantees that the fill value is never an item of the underlying iterable. Using the definition above:

iterable = range(1,11)

list(map(tuple, chunker(iterable, 3)))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]

list(map(tuple, chunker(iterable, 2)))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

list(map(tuple, chunker(iterable, 4)))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]
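To see why the sentinel version is more robust, here is a small comparison, assuming Python 3 and using object() as the sentinel in place of the Filler class (the effect is the same). Filtering with == () also removes genuine empty tuples from the data, while filtering by identity with a private sentinel removes only the padding.

```python
from itertools import filterfalse, zip_longest

def chunker_eq(iterable, n):
    # Pads with () and drops every item equal to (), padding or not.
    return (tuple(filterfalse(lambda x: x == (), chunk))
            for chunk in zip_longest(*[iter(iterable)] * n, fillvalue=()))

def chunker_sentinel(iterable, n):
    # Pads with a private object and drops only that exact object.
    filler = object()
    return (tuple(filterfalse(lambda x: x is filler, chunk))
            for chunk in zip_longest(*[iter(iterable)] * n, fillvalue=filler))

data = [(), 1, ()]
print(list(chunker_eq(data, 2)))        # [(1,), ()]
print(list(chunker_sentinel(data, 2)))  # [((), 1), ((),)]
```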

This implementation almost does what you want, but it has problems:

from itertools import islice

def chunks(it, step):
    start = 0
    while True:
        end = start + step
        yield islice(it, start, end)
        start = end

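(The problem is that, because islice does not raise StopIteration or anything else on calls that go past the end of the input, this will yield forever; there is also the slightly trickier issue that each islice result must be consumed before this generator is iterated further.)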
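Both effects can be observed with a bounded test, taking only a fixed number of chunks with islice() so the endless generator does not hang. Given a list, the chunks are correct but are followed by empty ones forever. Given an iterator instead of a list, the start offset is counted from the iterator's current position, so items get skipped as well.

```python
from itertools import islice

def chunks(it, step):
    start = 0
    while True:
        end = start + step
        yield islice(it, start, end)
        start = end

# A list: each islice() starts a fresh pass, but empty chunks never stop coming.
print([list(c) for c in islice(chunks([1, 2, 3, 4, 5], 2), 5)])
# [[1, 2], [3, 4], [5], [], []]

# An iterator: the second islice() skips 3 more items before taking any.
print([list(c) for c in islice(chunks(iter(range(10)), 3), 2)])
# [[0, 1, 2], [6, 7, 8]]
```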

To generate the moving window functionally:

zip(count(0, step), count(step, step))  # izip in Python 2

So this becomes:

(it[start:end] for (start, end) in zip(count(0, step), count(step, step)))  # izip in Python 2

But that still creates an infinite iterator, so you need takewhile (or perhaps something else would be better) to limit it:

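from itertools import count, takewhile

chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start,end) in zip(count(0, step), count(step, step))))

g = chunk(list(range(1,11)), 3)  # list(): slicing a Python 3 range returns a range, not a list

tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])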


2012-01-24 19:31:46 Marcin

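1. The first code snippet contains the line 'start = end', which doesn't seem to do anything, since the next iteration of the loop will start with 'start = 0'. Also, the loop is infinite: it's 'while True' without any 'break'. 2. What is 'len' in the second code snippet? 3. All the other implementations only work for sequences, not for general iterators. 4. The check 'x is ()' relies on an implementation detail of CPython: as an optimization, the empty tuple is only created once and reused later. This is not guaranteed by the language specification, though, so you should use 'x == ()'. –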

5. The combination of 'count()' and 'takewhile()' would be more easily implemented using 'range()'. –

@SvenMarnach: I've edited the code and text in response to some of your points. Much-needed proofreading. – Marcin
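Point 5 of the comment above could look like this: a minimal sketch, assuming a sequence input (it uses len() and slicing, just like the takewhile() version). range() with a step produces the start offsets directly, so neither an infinite counter nor a stopping condition is needed.

```python
def chunk(seq, step):
    # range(0, len(seq), step) yields 0, step, 2*step, ... below len(seq).
    return (seq[start:start + step] for start in range(0, len(seq), step))

print(tuple(chunk(list(range(1, 11)), 3)))
# ([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])
```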

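Although the OP asks for the function to return chunks as lists or tuples, in case you need it to return iterators, Sven Marnach's solution can be modified: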

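import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return  # nothing left: stop instead of yielding an empty chunk
        # Put the peeked element back in front of the rest of the chunk.
        yield itertools.chain((first_el,), chunk_it)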

Some performance tests: http://pastebin.com/YkKFvm8b

It will be slightly more efficient only if your function loops through the elements of every chunk.
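Because these chunks are lazy views on one shared iterator, each chunk should be consumed before the next one is requested. A small check, reusing grouper_it() from the answer above: consuming chunks in order gives the expected groups, while collecting the chunks first yields one single-item chunk per element, since each new islice() starts right after the previously peeked element.

```python
import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

# Consuming each chunk before asking for the next one: correct groups.
print([list(c) for c in grouper_it(3, range(10))])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# Collecting the chunks first: every chunk starts one element later,
# and by the time they are read the shared iterator is exhausted.
print([list(c) for c in list(grouper_it(3, range(10)))])
# [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
```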


2012-01-25 04:59:30 reclosedev


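I arrived at almost exactly this design today, after finding the answer from the documentation (which is the accepted, most highly voted answer above) *massively* inefficient. When you're grouping hundreds of thousands or millions of objects at a time, which is when you need chunking the most, it has to be pretty efficient. THIS is the right answer. –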

This is the best solution. –

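I was working on something today and came up with what I think is a simple solution. It's similar to jsbueno's answer, but I believe his would yield an empty group when the length of iterable is divisible by n. My answer does a simple check when iterable is exhausted.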

def chunk(iterable, chunk_size):
    """Generate sequences of `chunk_size` elements from `iterable`."""
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))  # iterable.next() works only in Python 2
            yield chunk
        except StopIteration:
            if chunk:  # only yield a non-empty leftover
                yield chunk
            break
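A quick check of that difference in behaviour, reusing chunk() from this answer: when the length is a multiple of chunk_size, no empty list is produced at the end, and an empty input produces no chunks at all.

```python
def chunk(iterable, chunk_size):
    """Generate sequences of `chunk_size` elements from `iterable`."""
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break

print(list(chunk(range(6), 3)))  # [[0, 1, 2], [3, 4, 5]]
print(list(chunk(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
print(list(chunk([], 3)))        # []
```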


2012-10-09 09:48:03 eidorb

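This will work with any iterator. It returns a generator of generators (for full flexibility). I now realize it's basically the same as @reclosedev's solution, but without the fluff. There is no need for try...except, since the StopIteration propagates, which is exactly what we want. (That holds in Python 2; since Python 3.7, PEP 479 turns a StopIteration raised inside a generator into a RuntimeError, so there it has to be caught.)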

iterable.next() needs to be called to raise StopIteration when the iterator is empty, because islice would otherwise keep yielding empty generators forever.

Better, because it's only two lines, yet easy to understand.

import itertools

def grouper(iterable, n):
    # Python 2 only: expects an iterator; its StopIteration ends the generator.
    while True:
        yield itertools.chain([iterable.next()], itertools.islice(iterable, n-1))

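Note that iterable.next() is put into a list. If the item returned by iterable.next() were itself iterable and not placed in a list, itertools.chain would flatten it. Thanks to Jeremy Brown for pointing out this issue.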
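On Python 3 the same idea needs two changes: the built-in next() replaces the .next() method, and since Python 3.7 (PEP 479) a StopIteration escaping a generator body is turned into a RuntimeError, so it must be caught and turned into a return. A sketch under those assumptions, which also calls iter() so that lists work:

```python
import itertools

def grouper(iterable, n):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            return  # PEP 479: end the generator explicitly
        # [first] keeps chain() from flattening an item that is itself iterable.
        yield itertools.chain([first], itertools.islice(it, n - 1))

print([list(c) for c in grouper([1, 2, 3, 4, 5, 6, 7], 3)])
# [[1, 2, 3], [4, 5, 6], [7]]
print([list(c) for c in grouper(["ab", "cd", "ef"], 2)])
# [['ab', 'cd'], ['ef']]
```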


2015-04-08 20:36:55

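While this may answer the question, including some explanation and description could help others understand your approach and enlighten us as to why your answer stands out. – deW1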

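Don't just copy your answer to another question. If you need to do that, it suggests that one is a duplicate of the other, which they are, and I have voted to close. –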

It is a duplicate. I only saw this thread afterwards. As it turns out, my answer is different. –
