Bulk update is too slow

I am using pymongo to do a bulk update.
The names list below is a list of distinct names (each name may have multiple documents in the collection).

Code 1:

bulk = db.collection.initialize_unordered_bulk_op() 
for name in names: 
    bulk.find({"A":{"$exists":False},'Name':name}).update({"$set":{'B':b,'C':c,'D':d}}) 
print(bulk.execute())

Code 2:

bulk = db.collection.initialize_unordered_bulk_op()
counter = 0
for name in names:
    bulk.find({"A": {"$exists": False}, 'Name': name}).update({"$set": {'B': b, 'C': c, 'D': d}})
    counter += 1
    # flush every 100 queued operations and start a new batch
    if counter % 100 == 0:
        print(bulk.execute())
        bulk = db.collection.initialize_unordered_bulk_op()
# flush whatever is left over
if counter % 100 != 0:
    print(bulk.execute())

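I have 50000 documents in my collection. If I get rid of the counter and the if statement (Code 1), the code gets stuck! With the if statement (Code 2), I assumed this operation should not take more than a couple of minutes, but it takes far longer than that! Can you help me make it faster? Or am I wrong in my assumption?!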


2016-09-23 amazingCodingExperience

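Most likely you forgot to add indexes to support your queries! Without them, every one of your operations triggers a full collection scan, which is painfully slow (as you have noticed).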

The following code tests both update_many and the bulk API, each without and with indexes on the 'Name' and 'A' fields. The numbers speak for themselves.

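Note: I was not patient enough to run 50000 documents without indexes, so I used 10000. The results for 10000 documents are:

without index, update_many: 38.6 seconds
without index, bulk update: 28.7 seconds
with index, update_many: 3.9 seconds
with index, bulk update: 0.52 seconds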

For 50000 documents with the indexes added, the bulk update takes 2.67 seconds. I ran the test on a Windows machine, with mongo running in docker on the same host.

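For more information about indexes, see https://docs.mongodb.com/manual/indexes/#indexes. In short: indexes are kept in RAM and allow documents to be queried and looked up quickly. Indexes have to be chosen specifically to match your queries.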
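To make the difference between a collection scan and an index lookup concrete, here is a toy model in plain Python. It is only an illustration and not how MongoDB implements indexes (MongoDB uses B-tree structures, not a Python dict). The names `find_by_scan`, `build_index` and `find_by_index`, as well as the sample data, are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical sample data: 50000 documents spread over 1000 distinct names.
docs = [{'_id': i, 'Name': 'n%d' % (i % 1000)} for i in range(50000)]


def find_by_scan(docs, name):
    """Full scan: every document is inspected for every query."""
    return [d for d in docs if d['Name'] == name]


def build_index(docs, field):
    """Build a lookup table from field value to matching documents (done once)."""
    index = defaultdict(list)
    for d in docs:
        index[d[field]].append(d)
    return index


def find_by_index(index, name):
    """Indexed lookup: jumps straight to the matching documents."""
    return index.get(name, [])


name_index = build_index(docs, 'Name')

# Both approaches return the same documents; only the amount of work differs.
assert find_by_scan(docs, 'n7') == find_by_index(name_index, 'n7')
```

With one query per name, the scan version touches all 50000 documents for each of the 1000 names, while the indexed version touches only the documents that match. That is roughly why the unindexed update gets slower so quickly as the collection grows.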

from pymongo import MongoClient 
import random 
from timeit import timeit 


col = MongoClient()['test']['test'] 

col.drop() # erase all documents in collection 'test' 
docs = [] 

# initialize 10000 documents; use a random number between 0 and 1 converted
# to a string as the name. For the documents with a number > 0.5 add the key A.
# The field is called 'Name' so that it matches the queries and the index below.
for i in range(0, 10000):
    number = random.random()
    if number > 0.5:
        doc = {'Name': str(number),
               'A': True}
    else:
        doc = {'Name': str(number)}
    docs.append(doc)

col.insert_many(docs)  # insert all documents into the collection
names = col.distinct('Name')  # get all distinct values for the key Name from the collection


def update_with_update_many():
    for name in names:
        col.update_many({'A': {'$exists': False}, 'Name': name},
                        {'$set': {'B': 1, 'C': 2, 'D': 3}})

def update_with_bulk():
    bulk = col.initialize_unordered_bulk_op()
    for name in names:
        bulk.find({'A': {'$exists': False}, 'Name': name}).update(
            {'$set': {'B': 1, 'C': 2, 'D': 3}})
    bulk.execute()

print(timeit(update_with_update_many, number=1)) 
print(timeit(update_with_bulk, number=1)) 
col.create_index('A') # this adds an index on key A 
col.create_index('Name') # this adds an index on key Name 
print(timeit(update_with_update_many, number=1)) 
print(timeit(update_with_bulk, number=1))
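A note for newer drivers: `initialize_unordered_bulk_op` was deprecated in PyMongo 3.5 and removed in PyMongo 4.0. A rough equivalent, assuming the same collection and field names as above, is `bulk_write` with a list of `UpdateMany` requests and `ordered=False`. This is a sketch, not a benchmarked replacement: the helper names `chunked` and `update_in_batches` are hypothetical, and the batch size of 1000 is an arbitrary choice.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_in_batches(col, names, batch_size=1000):
    """Send one unordered bulk_write per batch of names; return the total modified count."""
    # Imported here so that chunked() can be used without pymongo installed.
    from pymongo import UpdateMany

    modified = 0
    for batch in chunked(names, batch_size):
        requests = [
            UpdateMany({'A': {'$exists': False}, 'Name': name},
                       {'$set': {'B': 1, 'C': 2, 'D': 3}})
            for name in batch
        ]
        # ordered=False lets the server apply the operations in any order
        result = col.bulk_write(requests, ordered=False)
        modified += result.modified_count
    return modified
```

It would be called as `update_in_batches(col, names)` with the `col` and `names` from the test script. The indexes on 'A' and 'Name' matter just as much here, since each `UpdateMany` still has to find its documents.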


2016-09-30 15:51:03 squanto773

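Thank you for your help, but I think the timings you gave above are not quite right, because they are not for 10000 documents but only for about half of them (given that > 0.5 and <= 0.5 are equally likely). Also, it would help if you could share, for a beginner, how to index the fields A and Name. Thanks again! – amazingCodingExperience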

Also, how did indexing fix the process? Could you share the theory behind it? – amazingCodingExperience

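I added some more information to my answer. However, MongoDB offers quite good online courses for free: https://university.mongodb.com/courses/M101P/about I suggest you take one of them to get up to speed with mongo. – squanto773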
