2014-04-03 102 views
1

在python中,我知道兩個懶惰的「容器」:生成器和<class 'map'>可下載的懶惰映射

兩者都不是可以下標的。所以map(f, data)[1](f(x) for x in data)[1]將會失敗。

在python中是否有一個支持下標的惰性映射類?

如果沒有什麼是最接近的匹配?

我一直在尋找functools無濟於事(或我沒有發現它)。

基本上我尋找的是這樣的(但重新發明輪子應該是最後的選擇):

class lazilymappedlist: 
    def __init__ (self, f, lst): 
     self.f = f 
     self.data = [ (False, e) for e in lst] 

    def __getitem__ (self, idx): 
     status, elem = self.data [idx] 
     if status: return elem 
     elem = self.f (elem) 
     self.data [idx] = (True, elem) 
     return elem 
+0

你看着['itertools.islice'(https://docs.python.org/3.4/library/itertools.html#itertools.islice)? – jonrsharpe

+0

@jonrsharpe只是做了,但是'list(islice(map(f,data),6,7))'在所需索引之前的所有元素上調用'f'。因此沒有收益。或者你會如何在這裏使用'islice'? – Hyperboreus

+0

所以你想隨機訪問,但也懶惰訪問?我不知道有這樣的工具。請注意,您的示例僅適用於'lst'作爲列表,而不是任意的可迭代。您的需求組合似乎有點不尋常,所以您可能必須自己實現它。 – BrenBarn

回答

1

這是落實以各種理由一件棘手的事情。可以毫不費力地實現它,對輸入數據沒有任何影響,映射和訪問模式的(時間)複雜性,但是首先有一個生成器的一個或多個優點消失了。在問題中給出的示例代碼中,不必跟蹤所有值的優點就會丟失。

如果我們允許純粹的隨機訪問模式,那麼至少所有的映射值都必須被緩存,這又會失去生成器的內存優勢。

在兩個假設

  • 映射函數是昂貴的,並且僅稱爲稀疏
  • 存取模式是隨機的,使得所有的值必須緩存

示例代碼中這個問題應該很好。這裏有一些稍微不同的代碼,它也可以處理髮電機作爲輸入數據。有利的是,不要詳盡地構建對象構造的傳入數據的副本。

因此,如果傳入的數據有__getitem__方法(索引),那麼使用self.elements字典實現精簡緩存封裝。如果訪問稀少,字典比列表更有效率。如果傳入數據沒有索引,我們只需要消耗和存儲未決映射的傳入數據。

代碼:

class LazilyMapped: 
    def __init__(self, fn, iterable): 
     """LazilyMapped lazily evaluates a mapping/transformation function on incoming values 

     Assumes mapping is expensive, and [idx] access is random and possibly sparse. 
     Still, this may defeat having a memory efficient generator on the incoming data, 
     because we must remember all read data for potential future access. 

     Lots of different optimizations could be done if there's more information on the 
     access pattern. For example, memory could be saved if we knew that the access idx 
     is monotonic increasing (then a list storage would be more efficient, because 
     forgetting data is then trivial), or if we'd knew that any index is only accessed 
     once (we could get rid of the infite cache), or if the access is not sparse, but 
     random we should be using a list instead of a dict for self.elements. 

     fn is a filter function, getting one element of iterable and returning a bool, 
     iterable may be a generator 
     """ 
     self.fn = fn 
     self.sparse_in = hasattr(iterable, '__getitem__') 
     if self.sparse_in: 
      self.original_values = iterable 
     else: 
      self.iter = iter(iterable) 
      self.original_idx = 0  # keep track of which index to do next in incoming data 
      self.original_values = [] # keep track of all incoming values 
     self.elements = {}  # forever remember mapped data 
    def proceed_to(self, idx): 
     """Consume incoming values and store for later mapping""" 
     if idx >= self.original_idx: 
      for _ in range(self.original_idx, idx + 1): 
       self.original_values.append(next(self.iter)) 
      self.original_idx = idx + 1 
    def __getitem__(self, idx): 
     if idx not in self.elements: 
      if not self.sparse_in: 
       self.proceed_to(idx) 
      self.elements[idx] = mapped = self.fn(self.original_values[idx]) 
     else: 
      mapped = self.elements[idx] 
     return mapped 

if __name__ == '__main__': 
    test_list = [1,2,3,4,5] 
    dut = LazilyMapped(lambda v: v**2, test_list) 
    assert dut[0] == 1 
    assert dut[2] == 9 
    assert dut[1] == 4 

    dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7))) 
    assert dut[0] == 1 
    assert dut[2] == 9 
    assert dut[1] == 4 
+0

謝謝你的非常詳細的答案。給我時間去消化它。 – Hyperboreus