2017-02-15 60 views

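How can I detect whether the current item in a for loop is the last one when the items are yielded by a generator?

I am working with a huge PostgreSQL database, for which I wrote a "fetch" function.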

def fetch(cursor, batch_size=1000):
    """An iterator that uses fetchmany to keep memory usage down"""
    while True:
        records = cursor.fetchmany(int(batch_size))
        if not records:
            break
        for record in records:
            yield record

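For each item I do some processing, but I now have a problem: in some cases the last item is skipped, because I compare each item with the one before it. Whenever that comparison does not trigger on the last item, nothing is done with it.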

import psycopg2

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = None

for row in fetch(cursor):
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today is None:
        # do something with first row
        temp_today = date
    # -----------------------------------------
    # I feel like I am missing a statement here
    # something like:
    # if row == rows[-1]:
    #     do something with last row..
    # -----------------------------------------
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass

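How do I handle the last item when I am using yield?

Using yield is very important to me, because I am dealing with a very large data set and would run out of memory if I loaded it all at once.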


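It should be possible to get the number of rows in the result set from the cursor, right? You could then compare a counter (enumerate) with that number.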


'...as I am doing some comparison between the items' You could do that in the database (using window functions, or some self-join) – wildplasser

Answers

0

Thanks to @Peter Smit's comment, I used the following solution:

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = None
parsed_count = 0
cursor_count = cursor.rowcount  # number of rows produced by the query

for row in fetch(cursor):
    parsed_count += 1  # count this row so the last one can be recognised
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today is None:
        # do something with first row
        temp_today = date
    elif parsed_count == cursor_count:
        # do something with the last row
        pass
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass
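
The counter idea can also be written with `enumerate`, as the comment suggested. The following is only a minimal sketch: `FakeCursor` is a hypothetical stand-in that mimics just `fetchmany` and `rowcount` so the example runs without a database, and `label_rows` is an illustrative helper name. It assumes the driver reports a usable `rowcount` after `execute`, which may not hold for every driver or cursor type.

```python
class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor, for demonstration only."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._position = 0
        self.rowcount = len(self._rows)  # assumption: total is known up front

    def fetchmany(self, size):
        batch = self._rows[self._position:self._position + size]
        self._position += len(batch)
        return batch


def fetch(cursor, batch_size=1000):
    """Yield rows one at a time, reading them in batches via fetchmany."""
    while True:
        records = cursor.fetchmany(int(batch_size))
        if not records:
            break
        for record in records:
            yield record


def label_rows(cursor, batch_size=1000):
    """Yield (row, is_first, is_last) using enumerate and cursor.rowcount."""
    total = cursor.rowcount
    for index, row in enumerate(fetch(cursor, batch_size), start=1):
        yield row, index == 1, index == total


if __name__ == "__main__":
    cursor = FakeCursor(["a", "b", "c"])
    for row, is_first, is_last in label_rows(cursor, batch_size=2):
        print(row, is_first, is_last)
```

Because the first and last flags are computed independently, a result set with a single row is reported as both first and last, a case an if/elif chain would treat as "first" only.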
0

You can define another generator that iterates over the items and also returns the previous one (if any):

def pair(sequence):
    """Yield (item, previous) pairs; previous is None for the first item."""
    previous = None
    for item in sequence:
        yield (item, previous)
        previous = item

for item, previous_item in pair(mygenerator(args)):
    if previous_item is None:
        # process item: first one returned
        pass
    else:
        # you can compare item and previous_item
        pass
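
Pairing each item with its predecessor makes the comparison possible, but it does not by itself say which item is the last. A common complement is a one-item lookahead: hold each item back until the next one arrives, so the final item can be flagged once the source is exhausted. The sketch below shows one way to do that. `with_last_flag` is a hypothetical helper name, not something from the answers above. It buffers only a single item, so it should stay compatible with a memory-friendly generator such as `fetch`.

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs, buffering a single item for lookahead."""
    iterator = iter(iterable)
    try:
        held = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in iterator:
        # Another item exists, so the held one cannot be the last.
        yield held, False
        held = upcoming
    # The source is exhausted: the held item is the final one.
    yield held, True


if __name__ == "__main__":
    def numbers():
        yield from (10, 20, 30)

    for value, is_last in with_last_flag(numbers()):
        print(value, is_last)
```

In the question's loop this could be used as `for row, is_last in with_last_flag(fetch(cursor)):`, with the last-row handling placed under `if is_last:`.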