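Python - Using threads or a queue to iterate over a for loop that calls a function

I'm fairly new to Python and am making a script that brings point cloud data from other programs into Autodesk Maya. The script works fine, but I want to make it faster. I have a for loop that iterates through a list of numbered files, i.e. datafile001.txt, datafile002.txt and so on. What I'm wondering is whether there is a way to have it process more than one at a time, possibly using threads or a queue. Below is the code I have been working on: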

def threadedFuntion(args):
    if len(sourceFiles) > 3:
        for count, item in enumerate(sourceFiles):
            t1=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber1], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
            t1.start()
            t2=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber2], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
            t2.start()
            t3=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber3], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
            t3.start()
            t4=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber4], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
            t4.start()

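This obviously doesn't work, for a number of reasons. First, it only ever creates 4 threads; I would like to be able to give an option for more or fewer. Second, it errors because it is trying to reuse a thread? As I said, I'm quite new to Python and a little over my head. I've read several posts on here but couldn't get any of them to work quite right. I think a queue might be what I need, but I couldn't figure it out. I also experimented with condition statements and join statements, but again couldn't get what I want.

To be more specific about what I'm trying to achieve: the function reads through a text file, retrieves the coordinates, and then exports them as a binary file for Maya to read. It is common for one of these text files to contain 5-10 million x, y, z coordinates, which takes quite some time. One file takes around 30 minutes to 1 hour on a pretty beastly computer, and Task Manager says Python is only using 12% of the processor and about 1% of the RAM, so if I could process several of these at once, getting through a batch of files would be a lot faster. I wouldn't think it would be very hard to multithread or queue up a for loop, but I've been lost and trying failed solutions for about a week.

Thank you all for any help. I really appreciate it and think this website is amazing. This is my first post, but I feel like I have learned Python entirely from reading on here.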

Source

2012-10-13 Burninghelix123

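If your task is IO-bound (the program spends most of its time waiting for data from the disk), then adding threads that compete for disk access won't help performance. If the task is CPU-bound (the program processes data more slowly than the disk can supply it), then in your case, if you have multiple CPUs, you could use the multiprocessing module to process the files in different processes. @CaptainMurphy's example should work as is if you drop the 'threading.' name and use 'from multiprocessing import Process, Lock' instead – jfs

Subclass threading.Thread and put your work function in that class as part of run().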
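For the CPU-bound case described in this comment, here is a minimal sketch using a process pool from the standard library. It is not the asker's converter: count_points is a hypothetical stand-in for the real per-file function, and convert_all is a made-up name for illustration.

```python
import multiprocessing


def count_points(path):
    """Hypothetical stand-in for the real per-file conversion.

    Counts the non-empty lines in one text file and returns (path, count).
    """
    with open(path) as handle:
        return path, sum(1 for line in handle if line.strip())


def convert_all(paths, processes=None):
    """Run count_points on every path in a pool of worker processes.

    processes=None lets multiprocessing start one worker per CPU.
    Results come back in the same order as paths.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_points, paths)
```

The worker has to be a module-level function so it can be sent to the other processes, and on Windows the call to convert_all(paths, processes=4) needs to sit under an if __name__ == "__main__": guard.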

import threading
import time
import random

class Worker(threading.Thread):
    def __init__(self, srcfile, printlock, **kwargs):
        super(Worker, self).__init__(**kwargs)
        self.srcfile = srcfile
        self.lock = printlock # so threads don't step on each other's prints

    def run(self):
        with self.lock:
            print("starting %s on %s" % (self.ident, self.srcfile))
        # do whatever you need to, return when done
        # example, sleep for a random interval up to 10 seconds
        time.sleep(random.random()*10)
        with self.lock:
            print("%s done" % self.ident)


def threadme(srcfiles):
    printlock = threading.Lock()
    threadpool = []
    for file in srcfiles:
        threadpool.append(Worker(file, printlock))

    for thr in threadpool:
        thr.start()

    # this loop will block until all threads are done
    # (however it won't necessarily first join those that are done first)
    for thr in threadpool:
        thr.join()

    print("all threads are done")

if __name__ == "__main__":
    threadme(["abc", "def", "ghi"])

As requested, to limit the number of threads, use the following:

def threadme(infiles, threadlimit=None, timeout=0.01):
    assert threadlimit is None or threadlimit > 0, "need at least one thread"
    printlock = threading.Lock()
    srcfiles = list(infiles)
    threadpool = []

    # keep going while work to do or being done
    while srcfiles or threadpool:

        # while there's room, remove source files
        # and add to the pool
        while srcfiles and (threadlimit is None
                            or len(threadpool) < threadlimit):
            file = srcfiles.pop()
            wrkr = Worker(file, printlock)
            wrkr.start()
            threadpool.append(wrkr)

        # remove completed threads from the pool
        # (iterate over a copy, since the pool is modified inside the loop)
        for thr in list(threadpool):
            thr.join(timeout=timeout)
            if not thr.is_alive():
                threadpool.remove(thr)

    print("all threads are done")

if __name__ == "__main__":
    for lim in (1, 2, 3, 4):
        print("--- Running with thread limit %i ---" % lim)
        threadme(("abc", "def", "ghi"), threadlimit=lim)

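Note that this actually processes the sources in reverse order (because of the list pop()). If you need them done in order, reverse the list somewhere, or use a deque and popleft().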
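Since the question also asks about queues, here is a rough sketch of the same idea with a fixed number of threads pulling from a first-in, first-out queue.Queue, so the files are handed out in their original order. The names process_in_order, work and num_workers are made up for this illustration and are not part of the code above.

```python
import queue
import threading


def process_in_order(items, work, num_workers=4):
    """Apply work(item) to every item using a fixed number of threads.

    Items are handed out first-in, first-out, and the results are
    returned in the original input order.
    """
    tasks = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))

    # No worker is running yet, so qsize() is exact at this point.
    results = [None] * tasks.qsize()

    def worker():
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            results[index] = work(item)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, process_in_order(["abc", "def", "ghi"], str.upper, num_workers=2) runs at most two calls at a time. If work raises an exception, that thread stops and the result slot for that item stays None, so real code would probably want to catch and record errors.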

Source

2012-10-13 01:25:39 engineerC

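The threads should probably be daemons, and join() should have a timeout, so that the processing can be killed easily (is_alive() might be necessary in that case) – jfs

@CaptainMurphy I'm experimenting with the code above. If I'm thinking about it correctly, this creates a new thread for each srcfile? Or does it create just one thread and go through all the src files one at a time? If the first is correct and many src files are selected, won't that crash the computer? And if the second is correct, would it be any faster than the for loop? How could I specify using, for example, 5 threads? – Burninghelix123

@J.F.Sebastian is_alive() is what I was trying to use in my first attempts at threading, but I couldn't get it to work – Burninghelix123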
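A small sketch of what this comment suggests, assuming the work can be wrapped in a plain function. The helper name run_with_deadline and its deadline and poll parameters are invented for the example.

```python
import threading
import time


def run_with_deadline(target, args=(), deadline=1.0, poll=0.05):
    """Run target(*args) in a daemon thread; return True if it finished in time.

    A daemon thread does not keep the interpreter alive, so the main
    program can give up and exit while the worker is still running.
    """
    thread = threading.Thread(target=target, args=args)
    thread.daemon = True
    thread.start()
    end = time.monotonic() + deadline
    # Join in short slices so the main thread regains control regularly.
    while thread.is_alive() and time.monotonic() < end:
        thread.join(timeout=poll)
    return not thread.is_alive()
```

join() with a timeout returns whether or not the thread has finished, which is why is_alive() is checked afterwards.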

I would suggest using mrjob for this.

mrjob is a Python implementation of MapReduce.

Here is the mrjob code to do a multi-threaded word count over a lot of text files:

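from mrjob.job import MRJob
from mrjob.step import MRStep

class MRWordCounter(MRJob):
    def get_words(self, key, line):
        # mapper: emit (word, 1) for every word on the line
        for word in line.split():
            yield word, 1

    def sum_words(self, word, occurrences):
        # reducer: add up the counts emitted for each word
        yield word, sum(occurrences)

    def steps(self):
        # older mrjob releases built this step with self.mr(...)
        return [MRStep(mapper=self.get_words, reducer=self.sum_words)]

if __name__ == '__main__':
    MRWordCounter.run()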

This code maps all the files in parallel (counting the words in each file), then reduces the various counts into a single total word count.
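To show the same map-then-reduce shape without any external package, here is a minimal standard-library sketch. It is not mrjob and does not use its API; map_words and count_words are names made up for the illustration, and the documents are passed in as strings.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_words(text):
    """Map step: count the words in one document."""
    return Counter(text.split())


def count_words(documents, max_workers=4):
    """Map every document in a thread pool, then reduce to one total."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(map_words, documents):
            total.update(partial)  # reduce step: merge per-document counts
    return total
```

Threads are used here only to show the structure. For CPU-heavy work in CPython, a process pool would likely be the better fit.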

Source

2012-10-13 00:56:39 Asimov4

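Hmm, interesting, I'll have to do some digging into it. Thanks for the quick response. I wanted to stay away from any external or add-on packages (not quite sure of the technical name), but if I need to, I will. Just to clarify, I already have the function that reads the files; I just need to run that function several times at once – Burninghelix123

In this particular case, the external library is easy to set up, and it saves you from reinventing multithreaded computation. You can focus on designing how to process the data instead of how to get the data to the right place. MapReduce is a concept widely used in industry, and you will always find other cases where you can apply it. – Asimov4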
