Parallel for loop, Python

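Currently this nested for loop takes almost an hour to finish. I would like to rewrite it so that parts of it run in parallel, but I haven't found an answer on how to do that for a nested loop like the one below. Any pointers in the right direction would be greatly appreciated.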

# Update the "Software Name" field in softwareCollection using the patterns in regexCollection
import re
import time

startTime = time.time()
# Only documents that have not been renamed yet ("reged" missing or null)
cursor = softwareCollection.find({"reged": None},
                                 {"Software Name": 1, "Computer Name": 1, "Version": 1, "Publisher": 1},
                                 no_cursor_timeout=True)
for x in cursor:
    for y in regexCollection.find({}, {"regName": 1, "newName": 1}):
        try:
            regExp = re.compile(y["regName"])
        except re.error:
            print(y["regName"])  # report the invalid pattern and try the next one
            continue
        if regExp.search(x["Software Name"]):
            # $set only the changed fields: saving a projected document back would drop the others
            softwareCollection.update_one({"_id": x["_id"]},
                                          {"$set": {"Software Name": y["newName"], "reged": "true"}})
            break  # the first matching pattern wins
cursor.close()
print((time.time() - startTime) / 60, "minutes")


2017-05-02 Loglem

Can you explain a bit more what this is doing? – patrick

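What it does right now is take the software name from a MongoDB column and compare it against a list of regex queries that I keep in a separate Mongo collection. If the name matches a regex, the field is renamed to whatever name is associated with that regex. – Loglem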
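The matching rule described above can be tested without MongoDB: the first pattern that matches wins, and its newName replaces the software name. This is a minimal sketch that assumes the regName/newName documents have already been read into a list of pairs. The helper names compile_patterns and new_name_for, and the sample patterns, are hypothetical.

```python
import re


def compile_patterns(rules):
    """Compile (regName, newName) pairs once, skipping invalid patterns."""
    compiled = []
    for reg_name, new_name in rules:
        try:
            compiled.append((re.compile(reg_name), new_name))
        except re.error:
            print("invalid pattern:", reg_name)
    return compiled


def new_name_for(old_name, compiled):
    """Return the newName of the first matching pattern, or None if none match."""
    for pattern, new_name in compiled:
        if pattern.search(old_name):
            return new_name
    return None


# Hypothetical sample rules; in the question these come from regexCollection
rules = [(r"^Microsoft Office", "Microsoft Office"),
         (r"(", "broken"),  # invalid on purpose, to show it is skipped
         (r"Adobe.*Reader", "Adobe Reader")]
compiled = compile_patterns(rules)
print(new_name_for("Microsoft Office Professional Plus 2013", compiled))  # Microsoft Office
print(new_name_for("Adobe Acrobat Reader DC", compiled))                  # Adobe Reader
print(new_name_for("Notepad++", compiled))                                # None
```

If regexCollection is read once and the compiled list is reused, each document costs up to 450 in-memory search calls. The loop as written instead issues a fresh regexCollection.find() query for every software document.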

Depending on how many x iterations there are, you could spawn one thread per x, and each thread iterates over y.

First, define a function that does the work for a single x:

def y_iteration(x): 
    for y in ... : 
     ...

Then, for each x, spawn a thread that runs this function:

import _thread

for x in ... : 
    _thread.start_new_thread(y_iteration, (x,))

This is a very simple example using the low-level _thread module.

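You will probably also want the main thread to wait for the work to finish (join), in which case you need the threading module instead. You could run your x iteration in its own thread and join it: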

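import threading

def x_iteration(): 
    threads = [threading.Thread(target=y_iteration, args=(x,)) for x in ...] 
    for t in threads: 
        t.start() 
    for t in threads:  # wait for every per-x worker, not just this outer thread 
        t.join() 

thread = threading.Thread(target=x_iteration) 
thread.start() 
thread.join()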
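The thread-per-x idea can be written as a self-contained sketch. An in-memory list stands in for the MongoDB cursor, and a hypothetical y_iteration stores its result in a shared dict guarded by a lock. The main thread joins every worker, so all results are present when it reads them.

```python
import re
import threading

# Hypothetical stand-ins for regexCollection and softwareCollection
PATTERNS = [(re.compile(r"^Microsoft Office"), "Microsoft Office"),
            (re.compile(r"Adobe.*Reader"), "Adobe Reader")]
software = ["Microsoft Office Standard 2010", "Adobe Reader XI", "7-Zip 16.04"]

results = {}
results_lock = threading.Lock()


def y_iteration(x):
    """Try each pattern against one software name; the first match wins."""
    for pattern, new_name in PATTERNS:
        if pattern.search(x):
            with results_lock:
                results[x] = new_name
            return


threads = [threading.Thread(target=y_iteration, args=(x,)) for x in software]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading results

print(results)
```

One thread per item is only reasonable for small inputs. With 3.5 million documents it would mean 3.5 million threads.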

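Then again, it depends on how many x iterations you plan to do (see How many threads is too many?). If that number is large, you may want to create a pool of, say, a hundred workers and feed them y_iteration calls. When every worker is busy, wait until one becomes free.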
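The "pool of workers, and wait when they are all busy" idea can be built from the standard library. A fixed set of threading.Thread workers reads from a bounded queue.Queue, and the producer blocks in put() while the queue is full. This is a sketch only: run_with_pool, work and the backlog size are illustrative names and values, not part of the answer above. concurrent.futures.ThreadPoolExecutor is the more common tool, but its map() collects the whole input up front, which matters with millions of documents.

```python
import queue
import threading


def run_with_pool(items, work, num_workers=100, backlog=1000):
    """Feed items to a fixed pool of worker threads.

    The queue is bounded, so the producer blocks in put() whenever
    `backlog` items are already waiting for a free worker.
    """
    tasks = queue.Queue(maxsize=backlog)
    done = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            item = tasks.get()
            if item is done:
                return
            try:
                work(item)
            except Exception as exc:  # keep the worker alive if one item fails
                print("failed on", item, exc)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)  # blocks while the queue is full
    for _ in workers:
        tasks.put(done)  # one sentinel per worker
    for w in workers:
        w.join()


# Usage with a trivial job: record the length of each name
counts = []
lock = threading.Lock()


def record_length(name):
    with lock:
        counts.append(len(name))


run_with_pool(["a", "bb", "ccc"] * 1000, record_length, num_workers=8, backlog=50)
print(len(counts), sum(counts))  # 3000 6000
```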


2017-05-02 21:35:54

There are 3.5 million records in total, so I assume pooling is definitely the way to go? – Loglem

@Loglem 3.5 million iterations over 'x'? Yes, that's how I would handle it. –

Yes, x is 3.5 million and y is 450. Have you seen anyone use Pool to do something similar? – Loglem
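With 3.5 million x and 450 y, most of the work is regex matching, which is CPU-bound. As far as I know, CPython's re module does not release the GIL while matching, so threads mainly help while waiting on MongoDB, and a process pool is the usual alternative. Below is a minimal multiprocessing.Pool sketch that uses in-memory documents. The names RULES, init_worker and rename are hypothetical. The patterns are compiled once per worker process through the initializer argument, and the parent receives (_id, newName) results from imap_unordered.

```python
import re
from multiprocessing import Pool

# Hypothetical rules; in the question these would be read from regexCollection once
RULES = [(r"^Microsoft Office", "Microsoft Office"), (r"Adobe.*Reader", "Adobe Reader")]

_compiled = None  # set in each worker process by init_worker


def init_worker(rules):
    """Runs once in each worker process: compile the patterns there."""
    global _compiled
    _compiled = [(re.compile(reg), new) for reg, new in rules]


def rename(doc):
    """Return (_id, newName) for the first matching pattern, or (_id, None)."""
    for pattern, new_name in _compiled:
        if pattern.search(doc["SoftwareName"]):
            return doc["_id"], new_name
    return doc["_id"], None


if __name__ == "__main__":
    docs = [{"_id": 1, "SoftwareName": "Microsoft Office Home 2016"},
            {"_id": 2, "SoftwareName": "Adobe Reader X"},
            {"_id": 3, "SoftwareName": "VLC media player"}]
    with Pool(processes=4, initializer=init_worker, initargs=(RULES,)) as pool:
        # chunksize sends documents to the workers in batches to cut overhead
        for _id, new_name in pool.imap_unordered(rename, docs, chunksize=2):
            if new_name is not None:
                print(_id, new_name)  # the parent would update MongoDB here
```

PyMongo's MongoClient is documented as not fork-safe. With real collections it is therefore simpler to let the parent do the reads and writes, as in this sketch, or to create a new client inside the initializer.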

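So I was able to get it running, and it is about twice as fast as the sequential version. My concern is that the whole process would still take about 4 hours to finish. Is there any way to make this more efficient, or should I expect it to take this long?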

# Update the "SoftwareName" field in softwareCollection using the patterns in regexCollection
# (softwareCollection and regexCollection are the PyMongo collections from the question)
import re
import time

from joblib import Parallel, delayed


def foo(x):
    for y in regexCollection.find({}, {"regName": 1, "newName": 1}):
        try:
            regExp = re.compile(y["regName"])
        except re.error:
            print(y["regName"])  # skip an invalid pattern instead of abandoning this document
            continue
        if regExp.search(x["SoftwareName"]):
            softwareCollection.update_one(
                {"_id": x["_id"]},
                {"$set": {"SoftwareName": y["newName"], "field4": "reged"}})
            break  # the first matching pattern wins


if __name__ == '__main__':
    startTime = time.time()
    cursor = softwareCollection.find(no_cursor_timeout=True)
    Parallel(n_jobs=4)(delayed(foo)(x) for x in cursor)
    cursor.close()
    print((time.time() - startTime) / 60, "minutes")
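Part of the remaining time is likely overhead rather than matching. As written, foo runs regexCollection.find() once per document, which means 3.5 million queries for the same 450 patterns. It also sends one write per renamed document, on top of up to 3,500,000 × 450 = 1,575,000,000 regex searches.

Loading the patterns once per process removes the repeated query. For the writes, one option is to collect (_id, newName) pairs in the parent and send them in batches with PyMongo's bulk_write and UpdateOne. The sketch below covers only the batching part. The helper batched_updates is hypothetical, and the batch size of 1000 is an arbitrary choice.

```python
from itertools import islice


def batched_updates(pairs, size=1000):
    """Yield lists of at most `size` (_id, newName) pairs."""
    it = iter(pairs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# With PyMongo, each batch could become one round trip (not executed here):
#   from pymongo import UpdateOne
#   softwareCollection.bulk_write(
#       [UpdateOne({"_id": _id}, {"$set": {"SoftwareName": name, "field4": "reged"}})
#        for _id, name in batch],
#       ordered=False)

pairs = ((i, "Renamed") for i in range(2500))
sizes = [len(batch) for batch in batched_updates(pairs, size=1000)]
print(sizes)  # [1000, 1000, 500]
```

On Python 3.12 and later, itertools.batched does the same grouping, yielding tuples instead of lists.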


2017-05-03 22:16:26 Loglem
