這是落實以各種理由一件棘手的事情。可以毫不費力地實現它,對輸入數據沒有任何影響,映射和訪問模式的(時間)複雜性,但是首先有一個生成器的一個或多個優點消失了。在問題中給出的示例代碼中,不必跟蹤所有值的優點就會丟失。
如果我們允許純粹的隨機訪問模式,那麼至少所有的映射值都必須被緩存,這又會失去生成器的內存優勢。
在兩個假設
- 映射函數是昂貴的,並且僅稱爲稀疏
- 存取模式是隨機的,使得所有的值必須緩存
示例代碼中這個問題應該很好。這裏有一些稍微不同的代碼,它也可以處理髮電機作爲輸入數據。有利的是,不要詳盡地構建對象構造的傳入數據的副本。
因此,如果傳入的數據有__getitem__
方法(索引),那麼使用self.elements
字典實現精簡緩存封裝。如果訪問稀少,字典比列表更有效率。如果傳入數據沒有索引,我們只需要消耗和存儲未決映射的傳入數據。
代碼:
class LazilyMapped:
def __init__(self, fn, iterable):
"""LazilyMapped lazily evaluates a mapping/transformation function on incoming values
Assumes mapping is expensive, and [idx] access is random and possibly sparse.
Still, this may defeat having a memory efficient generator on the incoming data,
because we must remember all read data for potential future access.
Lots of different optimizations could be done if there's more information on the
access pattern. For example, memory could be saved if we knew that the access idx
is monotonic increasing (then a list storage would be more efficient, because
forgetting data is then trivial), or if we'd knew that any index is only accessed
once (we could get rid of the infite cache), or if the access is not sparse, but
random we should be using a list instead of a dict for self.elements.
fn is a filter function, getting one element of iterable and returning a bool,
iterable may be a generator
"""
self.fn = fn
self.sparse_in = hasattr(iterable, '__getitem__')
if self.sparse_in:
self.original_values = iterable
else:
self.iter = iter(iterable)
self.original_idx = 0 # keep track of which index to do next in incoming data
self.original_values = [] # keep track of all incoming values
self.elements = {} # forever remember mapped data
def proceed_to(self, idx):
"""Consume incoming values and store for later mapping"""
if idx >= self.original_idx:
for _ in range(self.original_idx, idx + 1):
self.original_values.append(next(self.iter))
self.original_idx = idx + 1
def __getitem__(self, idx):
if idx not in self.elements:
if not self.sparse_in:
self.proceed_to(idx)
self.elements[idx] = mapped = self.fn(self.original_values[idx])
else:
mapped = self.elements[idx]
return mapped
if __name__ == '__main__':
test_list = [1,2,3,4,5]
dut = LazilyMapped(lambda v: v**2, test_list)
assert dut[0] == 1
assert dut[2] == 9
assert dut[1] == 4
dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7)))
assert dut[0] == 1
assert dut[2] == 9
assert dut[1] == 4
來源
2014-04-04 20:26:56
cfi
你看着['itertools.islice'(https://docs.python.org/3.4/library/itertools.html#itertools.islice)? – jonrsharpe
@jonrsharpe只是做了,但是'list(islice(map(f,data),6,7))'在所需索引之前的所有元素上調用'f'。因此沒有收益。或者你會如何在這裏使用'islice'? – Hyperboreus
所以你想隨機訪問,但也懶惰訪問?我不知道有這樣的工具。請注意,您的示例僅適用於'lst'作爲列表,而不是任意的可迭代。您的需求組合似乎有點不尋常,所以您可能必須自己實現它。 – BrenBarn