python-couchdb pager hits the recursion depth limit
2010-09-29

I'm building a pager that returns the documents from an Apache CouchDB map function via python-couchdb. The generator below works fine until it hits the maximum recursion depth. How can I rework it to iterate instead of recurse?

def page(db, view_name, limit, include_docs=True, **opts):
    """
    `page` returns all documents of a CouchDB map function. It accepts
    all options that `couchdb.Database.view` does; however, `include_docs`
    should be omitted, because this will interfere with things.

    >>> import couchdb
    >>> db = couchdb.Server()['database']
    >>> for doc in page(db, '_all_docs', 100):
    ...     doc
    #etc etc
    >>> del db['database']

    Notes on implementation:
     - `last_doc` is assigned on every loop, because there doesn't seem to
       be an easy way to know if something is the last item in the iteration.
    """

    last_doc = None
    for row in db.view(view_name,
                       limit=limit+1,
                       include_docs=include_docs,
                       **opts):
        last_doc = row.key, row.id
        yield row.doc
    if last_doc:
        # Recursive call: this is what eventually exhausts the stack.
        for doc in page(db, view_name, limit,
                        include_docs=include_docs,
                        startkey=last_doc[0],
                        startkey_docid=last_doc[1]):
            yield doc
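For context, a minimal illustration (not part of the original post) of why this pattern blows up: each recursive call to `page` wraps yet another generator, and every item yielded by the innermost one has to pass back through all the enclosing frames, so the active depth grows with the number of pages fetched.

def countdown(n):
    # Recursively delegating generator: each level adds a stack frame
    # while items from the inner generators are passed back through.
    yield n
    if n:
        for x in countdown(n - 1):
            yield x

print(list(countdown(5)))   # [5, 4, 3, 2, 1, 0] -- fine while shallow
# list(countdown(2000))     # fails once the nesting exceeds the
                            # recursion limit (typically ~1000)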
I can't read this code. I'm not one to parrot PEP 8, but please at least use 4-space indentation. – 2010-09-29 23:55:02

This doesn't really answer the question, but as a useful note: you can change the maximum recursion depth with 'sys.setrecursionlimit()'. – 2010-09-30 00:15:48

Thanks @Rafe, I know, but since I'm returning several hundred thousand rows, I don't want to kill the machine. – 2010-09-30 00:26:04
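For reference, a minimal sketch of the commenter's suggestion; as the reply notes, it only raises the ceiling rather than removing the per-page stack cost:

import sys

print(sys.getrecursionlimit())  # default is typically 1000
sys.setrecursionlimit(10000)    # raise the ceiling; every frame still consumes stack space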

Answers


Here's something to get you started. You didn't say what `**opts` might contain; if `startkey` and `startkey_docid` are the only fields you need to restart each query, you can drop the extra function.

Obviously, untested.

def page_key(db, view_name, limit, startkey, startkey_docid, inc_docs=True):
    queue = [(startkey, startkey_docid)]
    while queue:
        key = queue.pop()

        last_doc = None
        for row in db.view(view_name,
                           limit=limit+1,
                           include_docs=inc_docs,
                           startkey=key[0],
                           startkey_docid=key[1]):
            last_doc = row.key, row.id
            yield row.doc

        if last_doc:
            queue.append(last_doc)

def page(db, view_name, limit, inc_docs=True, **opts):
    last_doc = None
    for row in db.view(view_name,
                       limit=limit+1,
                       include_docs=inc_docs,
                       **opts):
        last_doc = row.key, row.id
        yield row.doc

    if last_doc:
        for doc in page_key(db, view_name, limit, last_doc[0], last_doc[1], inc_docs):
            yield doc

An alternative approach, which I've tested (manually) against a database of >800K documents. It seems to work.

def page2(db, view_name, limit, inc_docs=True, **opts):
    def get_batch(db=db, view_name=view_name, limit=limit, inc_docs=inc_docs, **opts):
        for row in db.view(view_name, limit=limit+1, include_docs=inc_docs, **opts):
            yield row

    last_doc = None
    total_rows = db.view(view_name, limit=1).total_rows
    batches = (total_rows / limit) + 1
    for i in xrange(batches):
        if not last_doc:
            for row in get_batch():
                last_doc = row.key, row.id
                yield row.doc or row  # if include_docs is False,
                                      # row.doc will be None
        else:
            for row in get_batch(startkey=last_doc[0],
                                 startkey_docid=last_doc[1]):
                last_doc = row.key, row.id
                yield row.doc or row

I don't use CouchDB, so I had some trouble following the example code. Here is a stripped-down version that I believe works the way you want:

all_docs = range(0, 100)

def view(limit, offset):
    print "view: returning", limit, "rows starting at", offset
    return all_docs[offset:offset+limit]

def generate_by_pages(page_size):
    offset = 0
    while True:
        rowcount = 0
        for row in generate_page(page_size, offset):
            rowcount += 1
            yield row
        if rowcount == 0:
            break
        else:
            offset += rowcount

def generate_page(page_size, offset):
    for row in view(page_size, offset):
        yield row

for r in generate_by_pages(10):
    print r

The key is to replace the recursion with iteration. There are many ways to do that (I'm fond of trampolines in Python), but the above is straightforward.
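To illustrate the trampoline idea mentioned above (a sketch for this answer, not code from it): each step returns either a final value or a zero-argument callable for the next step, and a plain loop keeps calling until a non-callable comes back, so the call stack never grows.

def trampoline(step):
    # Keep invoking thunks until a real (non-callable) result comes back.
    while callable(step):
        step = step()
    return step

def add_up(n, total=0):
    if n == 0:
        return total                            # final value ends the loop
    return lambda: add_up(n - 1, total + n)     # defer the "recursive" call

print(trampoline(add_up(100000)))  # 5000050000, without blowing the stack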