
Python Peewee MySQL bulk update

I'm using Python 2.7, Peewee, and MySQL. My program reads from a CSV file and updates a field whenever the order number is present in the CSV. There can be 2000-3000 updates, and the naive approach of updating records one by one is slow. Switching from Peewee's update to a raw query made it somewhat faster, but it is still very slow. How can I update these records in fewer transactions, without looping over them one at a time?

def mark_as_uploaded_to_zoho(self, which_file):
    print "->Started marking the order as uploaded to zoho."
    with open(which_file, 'rb') as f:
        # The stdlib csv.reader has no encoding parameter in Python 2.
        reader = csv.reader(f)
        next(reader, None)  # skip the header row

        for r in reader:
            order_no = r[0]
            # One UPDATE (and one implicit transaction) per row -- this is the slow part.
            # Passing order_no as a query parameter avoids manual quoting.
            query = 'UPDATE sales SET UploadedToZoho=1 WHERE OrderNumber=%s AND UploadedToZoho=0'
            SalesOrderLine.raw(query, order_no).execute()

    print "->Marked as uploaded to zoho."

Answer


You can use insert_many to limit the number of transactions and speed things up considerably. It takes an iterable of dictionaries whose keys match the model's field names.

Depending on how many records you are trying to insert, you can either do them all at once or split them into smaller chunks. In the past I have inserted more than 10,000 records at once, but depending on the database server and client specs this can be very slow, so I'll show both ways.

with open(which_file, 'rb') as f:
    reader = csv.DictReader(f)  # the CSV header must match the model's field names
    SalesOrderLine.insert_many(reader).execute()
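
For reference, here is a minimal sketch of what insert_many expects in this setup. The model definition, database name, and sample values below are assumptions for illustration, written in the peewee 2.x style that matches the question's Python 2.7 stack:

# Hypothetical model matching the raw query in the question; the real field
# and table names are assumptions.
from peewee import Model, MySQLDatabase, CharField, BooleanField

db = MySQLDatabase('sales_db')  # assumed database name

class SalesOrderLine(Model):
    OrderNumber = CharField()
    UploadedToZoho = BooleanField(default=False)

    class Meta:
        database = db
        db_table = 'sales'

# csv.DictReader yields one dict per row, keyed by the CSV header, e.g.
#   {'OrderNumber': 'SO-1001', 'UploadedToZoho': '1'}
# insert_many() maps those keys onto the model's fields, so the CSV header
# has to use the same names as the fields above.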

OR

# Calls a function with successive chunks of an iterable, passed as lists.
# Simple, but builds each chunk in memory before handing it off.
def chunkify(func, iterable, chunk_size):
    chunk = []
    for o in iterable:
        chunk.append(o)
        if len(chunk) >= chunk_size:
            func(chunk)
            chunk = []
    if chunk:  # flush the final partial chunk
        func(chunk)

with open(which_file, 'rb') as f:
    reader = csv.DictReader(f)
    # insert_many returns a query object, so wrap it in a lambda that calls execute() per chunk.
    chunkify(lambda rows: SalesOrderLine.insert_many(rows).execute(), reader, 1000)

For a more efficient way to "chunkify" an iterator, check out this question.
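
As one example of that lazier style of chunking (this sketch is mine, not taken from the linked question), itertools.islice can slice the iterator chunk by chunk, which handles the final partial chunk naturally and keeps at most one chunk in memory:

from itertools import islice

def chunked_iter(iterable, chunk_size):
    # Yield successive lists of up to chunk_size items without
    # materializing the whole iterable.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

# Usage with the reader from above:
# for rows in chunked_iter(reader, 1000):
#     SalesOrderLine.insert_many(rows).execute()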

You can get an additional speedup simply by using with db.atomic, as outlined here.
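
Putting it together, a minimal sketch that reuses the chunked_iter helper above and assumes db is the peewee Database instance that SalesOrderLine is bound to:

with open(which_file, 'rb') as f:
    reader = csv.DictReader(f)
    # One transaction for the whole batch; move db.atomic() inside the loop
    # if you prefer one transaction per chunk instead.
    with db.atomic():
        for rows in chunked_iter(reader, 1000):
            SalesOrderLine.insert_many(rows).execute()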