Is it OK to use the yield statement in an instance method of a class? For example:
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi
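Python doesn't complain, and simple cases seem to work. However, I've only seen examples that use yield in regular functions.
I start having problems when I try to use it with itertools functions. For example, suppose I have two large data streams X and Y stored across multiple files, and I want to compute both their sum and their difference with only one loop through the data. I could use itertools.tee and itertools.izip for this: tee each of X and Y, feed one pair of copies to add and the other to sub, pass each result through its own Nth, and izip the two outputs together.
In code it would look like this (sorry, it's long):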
from itertools import izip_longest, izip, tee
import random

def add(x,y):
    for xi,yi in izip(x,y):
        yield xi + yi

def sub(x,y):
    for xi,yi in izip(x,y):
        yield xi - yi

class NthSumDiff(object):
    def __init__(self, n):
        self.nthsum = Nth(n)
        self.nthdiff = Nth(n)

    def itervalues(self, x, y):
        xadd, xsub = tee(x)
        yadd, ysub = tee(y)
        gen_sum = self.nthsum.itervalues(add(xadd, yadd))
        gen_diff = self.nthdiff.itervalues(sub(xsub, ysub))
        # Have to use izip_longest here, but why?
        #for (i,nthsum), (j,nthdiff) in izip_longest(gen_sum, gen_diff):
        for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
            assert i==j, "sum row %d != diff row %d" % (i,j)
            yield nthsum, nthdiff

nskip = 12
ns = Nth(nskip)
nd = Nth(nskip)
nsd = NthSumDiff(nskip)
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    # If the block length is a multiple of nskip there's no problem.
    #n = random.randint(5000, 10000) * nskip
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100,n+100)
    # Independent processing is no problem but requires two loops.
    for i, nthsum in ns.itervalues(add(x,y)):
        pass
    for j, nthdiff in nd.itervalues(sub(x,y)):
        pass
    assert i==j
    # Trying to do both with one loop causes problems.
    for nthsum, nthdiff in nsd.itervalues(x,y):
        # If izip_longest is necessary, why don't I ever get a fillvalue?
        assert nthsum is not None
        assert nthdiff is not None
    # After each block of data the two iterators should have the same state.
    assert nsd.nthsum.nout == nsd.nthdiff.nout, \
           "sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
But this fails unless I swap itertools.izip out for itertools.izip_longest, even though the iterators have the same length. It's the last assert that gets hit, with output like:
file 0 n=58581
file 1 n=87978
Traceback (most recent call last):
  File "test.py", line 71, in <module>
    "sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
AssertionError: sum nout 12213 != diff nout 12212
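Edit: I guess it's not obvious from the example I wrote, but the input data X and Y are only available in blocks (in my real problem they're chunked in files). This is important because I need to maintain state between blocks. In the toy example above, this means Nth needs to yield the equivalent of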
>>> x1 = range(0,10)
>>> x2 = range(10,20)
>>> (x1 + x2)[::3]
[0, 3, 6, 9, 12, 15, 18]
and not
>>> x1[::3] + x2[::3]
[0, 3, 6, 9, 10, 13, 16, 19]
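Equivalently, I could use itertools.chain to join the blocks ahead of time and then make a single call to Nth.itervalues, but I'd like to understand what's wrong with maintaining state in the Nth class between calls (my real application is image processing that involves more saved state, not a simple Nth/add/subtract).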
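To make the blockwise requirement concrete, here is a minimal sketch that feeds one Nth instance two blocks in turn and compares the result with a fresh instance fed the chained stream. Nth is copied from the question; the names blockwise, per_block and chained are only for illustration. Note that Nth as written yields the 3rd, 6th, ... items, so the values are offset by two from the [::3] slices shown above.

```python
from itertools import chain

class Nth(object):
    """Yield (count, item) for every n-th item, keeping state between calls."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi

x1 = list(range(0, 10))
x2 = list(range(10, 20))

# One instance fed block by block; self.i carries over from x1 to x2
# because each generator is run to exhaustion.
blockwise = Nth(3)
per_block = [xi for block in (x1, x2) for _, xi in blockwise.itervalues(block)]

# A fresh instance fed the chained stream in a single call.
chained = [xi for _, xi in Nth(3).itervalues(chain(x1, x2))]

assert per_block == chained
```

The two agree here only because every generator returned by itervalues is consumed to the end of its block, so the trailing items of x1 still update self.i.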
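I don't understand how my Nth instances end up in different states when their lengths are the same. For example, if I give izip two strings of equal length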
>>> [''.join(x) for x in izip('ABCD','abcd')]
['Aa', 'Bb', 'Cc', 'Dd']
I get a result of the same length. So why do my Nth.itervalues generators seem to get unequal numbers of next() calls, even though each one yields the same number of results?
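One likely explanation, sketched below under the assumption that it matches what happens in the full script: izip asks its first argument for an item, and if that raises StopIteration it stops immediately without calling next() on the second. The first generator's final next() call runs through the trailing items of the block (updating self.i) before finishing, while the second generator never sees those items. izip_longest keeps going until both are exhausted, which would also explain why no fillvalue ever shows up. The helper final_counters is hypothetical, and the import fallback is there only so the sketch also runs on Python 3, where the built-in zip is lazy like izip.

```python
try:
    from itertools import izip, izip_longest  # Python 2
except ImportError:
    izip = zip  # Python 3: built-in zip is lazy like izip
    from itertools import zip_longest as izip_longest

class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi

def final_counters(zipper, n, nitems):
    """Zip two Nth generators over equal-length input; return their .i values."""
    a, b = Nth(n), Nth(n)
    for _ in zipper(a.itervalues(range(nitems)), b.itervalues(range(nitems))):
        pass
    return a.i, b.i

# 10 items with n=3 leaves one trailing item after the last yield.
short = final_counters(izip, 3, 10)         # second generator never sees the tail
full = final_counters(izip_longest, 3, 10)  # both generators are drained

assert short == (1, 0)
assert full == (1, 1)
```

With izip the two instances start the next block with different values of self.i, so one of them reaches its n-th item earlier and eventually yields one more result than the other, consistent with the AssertionError above. Draining both generators (for example with izip_longest) keeps their state in step.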
To answer the title question: yes, yielding from an instance method is fine. It's actually the simplest Pythonic way to implement '__iter__' for a custom 'Iterable' type. – ShadowRanger
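As a small illustration of that comment, here is a sketch of an iterable whose __iter__ is a generator method. The Countdown class is hypothetical and not part of the question.

```python
class Countdown(object):
    """Hypothetical iterable: __iter__ is written as a generator method."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call creates a fresh generator, so the object can be
        # iterated more than once.
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
first = list(c)
second = list(c)
```

Because the loop state lives in the local variable n rather than on self, each pass over c starts from the beginning, unlike Nth, which deliberately stores its counters on the instance.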
Couldn't you replace 'class Nth' with 'def Nth(x, n): return enumerate(x[::n])'? Oh, or do you need the slicing of 'x' to be an iterator, for performance reasons? – Harvey
'def Nth(x, n): return enumerate(xi for i, xi in enumerate(x) if i % n == 0)' – Harvey
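For a single stream, a lazy stateless version along these lines can be sketched with itertools.islice, which the question's own comment mentions. The function name nth_stateless is hypothetical. It selects the same items as a fresh Nth(n) (the n-th, 2n-th, ...) and numbers them from 1, but it keeps no state between calls, so it does not cover the blockwise case the question is about.

```python
from itertools import islice

def nth_stateless(x, n):
    """Lazily yield (count, item) for every n-th item of a single iterable."""
    # islice starts at index n-1 and steps by n; enumerate counts from 1
    # to mirror Nth.nout.
    return enumerate(islice(x, n - 1, None, n), 1)

result = list(nth_stateless(range(10), 3))
```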