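How to get more speed when using multi-threading in Python

I'm currently looking into how to fetch data from a website as fast as possible. To get more speed, I'm considering multi-threading. Here is the code I used to test the difference between multi-threaded and simple POSTs.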

import threading
import time
import urllib
import urllib2


class Post:

    def __init__(self, website, data, mode):
        self.website = website
        self.data = data

        #mode is either "Simple"(Simple POST) or "Multiple"(Multi-thread POST)
        self.mode = mode

    def post(self):

        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)

        if self.mode == "Multiple":
            time.sleep(0.001)

        #read HTMLData
        HTMLData = open_url.read()

        print "OK"

if __name__ == "__main__":

    current_post = Post("http://forum.xda-developers.com/login.php",
                        "vb_login_username=test&vb_login_password&securitytoken=guest&do=login",
                        "Simple")

    #save the time before post data
    origin_time = time.time()

    if(current_post.mode == "Multiple"):

        #multithreading POST

        for i in range(0, 10):
            thread = threading.Thread(target = current_post.post)
            thread.start()
            thread.join()

        #calculate the time interval
        time_interval = time.time() - origin_time

        print time_interval

    if(current_post.mode == "Simple"):

        #simple POST

        for i in range(0, 10):
            current_post.post()

        #calculate the time interval
        time_interval = time.time() - origin_time

        print time_interval

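As you can see, this is very simple code. First I set the mode to "Simple" and got a time interval of 50s (maybe my connection is a bit slow :( ). Then I set the mode to "Multiple" and measured the time interval again. From that I can see that multi-threading really does increase the speed, but the result isn't as good as I imagined. I want to go faster still.

From debugging, I found that the program mainly blocks at the line open_url = urllib2.urlopen(req, self.data); this line takes a lot of time to post data to, and receive data from, the specified website. I guess I could maybe get more speed by adding time.sleep() and using multi-threading inside the urlopen function, but I cannot do that because it is Python's own function.

Leaving aside possible limits the server places on posting speed, what else can I do to get more speed? Or is there any other code I can modify? Thanks a lot!

Source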

2012-04-14 Searene

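Threading is a bad idea in Python; it bottlenecks easily and can get stuck on the GIL. Try multiprocessing. –

@JakobBowyer: threads are an implementation detail here; the real point is having multiple connections open. In any case, the GIL aspect of threading in Python plays no role here. – orlp

@nightcracker, you should read up on the GIL and threading before making statements like that... start here: [PyCon 2010: Understanding the Python GIL](http://python.mirocommunity.org/video/1479/pycon-2010-understanding-the-p) –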

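In many cases, Python's threading doesn't improve execution speed very well... sometimes it makes it worse. For more information, see David Beazley's PyCon2010 presentation on the Global Interpreter Lock / PyCon2010 GIL slides. The presentation is very informative, and I highly recommend it to anyone considering threading...

You should use the multiprocessing module. I included this as an option in your code (see the bottom of my answer).

Running this on one of my older machines (Python 2.6.6):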

current_post.mode == "Process" (multiprocessing) --> 0.2609 seconds 
current_post.mode == "Multiple" (threading)  --> 0.3947 seconds 
current_post.mode == "Simple" (serial execution) --> 1.650 seconds

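I agree with TokenMacGuy's comment, and the numbers above include moving the .join() to a separate loop. As you can see, in this test Python's multiprocessing is noticeably faster than threading.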

from multiprocessing import Process
import threading
import time
import urllib
import urllib2


class Post:

    def __init__(self, website, data, mode):
        self.website = website
        self.data = data

        #mode is "Simple" (serial POST), "Multiple" (multi-thread POST)
        #or "Process" (multi-process POST)
        self.mode = mode

    def post(self):

        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)

        if self.mode == "Multiple":
            time.sleep(0.001)

        #read HTMLData
        HTMLData = open_url.read()

        print "OK"

if __name__ == "__main__":

    current_post = Post("http://forum.xda-developers.com/login.php",
                        "vb_login_username=test&vb_login_password&securitytoken=guest&do=login",
                        "Process")
    #save the time before post data
    origin_time = time.time()

    if(current_post.mode == "Multiple"):

        #multithreading POST: start every thread first, then join them all
        threads = list()
        for i in range(0, 10):
            thread = threading.Thread(target = current_post.post)
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        #calculate the time interval
        time_interval = time.time() - origin_time
        print time_interval

    if(current_post.mode == "Process"):

        #multiprocessing POST: start every process first, then join them all
        processes = list()
        for i in range(0, 10):
            process = Process(target=current_post.post)
            process.start()
            processes.append(process)
        for process in processes:
            process.join()
        #calculate the time interval
        time_interval = time.time() - origin_time
        print time_interval

    if(current_post.mode == "Simple"):

        #simple POST
        for i in range(0, 10):
            current_post.post()
        #calculate the time interval
        time_interval = time.time() - origin_time
        print time_interval

Source

2012-04-16 15:56:11

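Thanks a lot. Multiprocessing is a good idea, and it is indeed a little faster than multi-threading on my computer. Thanks to all of you; I learned a lot from this question. – Searene

@MarkZar, I would say a 33% speed improvement is more than a little faster, but either way I hope your project goes well. –

DNS lookups take time. There's nothing you can do about that. Large latencies are one reason to use multiple threads in the first place - multiple lookups and site GET/POSTs can then happen in parallel.

Dump the sleep() - it's not helping.

Source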

2012-04-14 15:01:36

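Thx, but I'm just confused about why 'time.sleep()' is useless. In fact, the code also works fine after dumping 'sleep()', but how can multi-threading work without 'sleep()'? Does Python run the different threads automatically? If so, what is the 'sleep()' function for? – Searene

It's not useless, just inappropriate here. There are plenty of uses for sleep: 'After turning on the pump, wait at least 10 seconds for the pressure to stabilize before opening the feed valve'. –

Keep in mind that the only case where multi-threading can "increase speed" in Python is when you have operations like this one that are heavily I/O bound. Otherwise multi-threading does not increase "speed", because in CPython the GIL lets only one thread execute Python bytecode at a time, so the threads cannot run on more than one CPU (no, even if you have multiple cores, Python doesn't work that way). You should use multi-threading when you want two things to be in progress at the same time, not when you need two things to run in parallel (i.e. two processes running separately).

Now, what you're doing will not increase the speed of any single DNS lookup, but it does allow multiple requests to be fired off while you wait for the results of others. Be careful about how many you fire at once, though, or you will just make the response times even worse than they already are.

Also, please stop using urllib2 and use Requests: http://docs.python-requests.org

Source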
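One way to keep the number of simultaneous requests in check is a fixed-size thread pool. The following is only a minimal sketch, written for Python 3 (the code in this thread is Python 2): `fake_post`, `post_all` and the `max_workers=4` limit are hypothetical names and values chosen for illustration, and `time.sleep()` stands in for the blocking network call so the example runs without touching a real server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_post(item, delay=0.1):
    """Hypothetical stand-in for one blocking POST (e.g. a urlopen call)."""
    time.sleep(delay)  # simulates waiting on the network
    return "OK %d" % item


def post_all(items, max_workers=4):
    """Run fake_post for every item, with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(fake_post, items))


if __name__ == "__main__":
    started = time.perf_counter()
    print(post_all(range(8)))
    print("elapsed: %.2fs" % (time.perf_counter() - started))
```

Raising or lowering `max_workers` is the knob to experiment with; the right value depends on the server and the network, so it is best found by measuring.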

2012-04-14 15:15:25 Wes

The biggest thing you are doing wrong, the one hurting your throughput the most, is the way you are calling thread.start() and thread.join():

for i in range(0, 10): 
    thread = threading.Thread(target = current_post.post) 
    thread.start() 
    thread.join()

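Each time through the loop, you create a thread, start it, and then wait for it to finish before moving on to the next thread. You aren't doing anything concurrently at all!

What you should probably do instead is: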

threads = [] 

# start all of the threads 
for i in range(0, 10): 
    thread = threading.Thread(target = current_post.post) 
    thread.start() 
    threads.append(thread) 

# now wait for them all to finish 
for thread in threads: 
    thread.join()
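The difference between the two loops can be measured without a network connection. This is a rough sketch, assuming Python 3 and using `time.sleep()` as a stand-in for the blocking `urlopen` call; `fake_post`, `run_joined_in_loop` and `run_joined_after` are hypothetical names introduced only for this illustration.

```python
import threading
import time


def fake_post(delay=0.2):
    """Hypothetical stand-in for a blocking network call."""
    time.sleep(delay)


def run_joined_in_loop(count):
    """Start a thread and join it right away: the calls run one by one."""
    started = time.perf_counter()
    for _ in range(count):
        thread = threading.Thread(target=fake_post)
        thread.start()
        thread.join()
    return time.perf_counter() - started


def run_joined_after(count):
    """Start every thread first, then join them: the waits overlap."""
    started = time.perf_counter()
    threads = [threading.Thread(target=fake_post) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - started


if __name__ == "__main__":
    print("join inside the loop: %.2fs" % run_joined_in_loop(5))
    print("join after the loop:  %.2fs" % run_joined_after(5))
```

With five simulated calls of 0.2s each, the first variant should take roughly the sum of the delays, while the second should take roughly one delay, since all the waits happen at the same time.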

Source

2012-04-14 15:44:44 SingleNegationElimination

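I didn't even look that far. Join straight after start :( –

This is an incremental improvement, but regardless, Python's existing threading is terrible and we should be recommending multiprocessing; see my answer –

@Mike: It is not an incremental improvement at all; using the code MarkZar provided, it improves the run time from 20 or so seconds to less than half a second. That makes sense, because HTTP uses minimal CPU but is very sensitive to network latency, so using 'threading' instead of 'multiprocessing' is a perfectly reasonable solution. If you use a keep-alive HTTP client (in my fixed threading test, 'urllib3' is about 30% faster than 'urllib2', which cannot otherwise be improved), the shared connections would not be available across processes. – SingleNegationElimination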
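To illustrate the keep-alive point, here is a small sketch of several POSTs sharing one connection. It assumes Python 3 and uses only the standard library (`http.client` rather than 'urllib3'), talking to a throwaway echo server on localhost instead of a real site; `EchoHandler`, `post_many` and `demo` are hypothetical names made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Throwaway local server that echoes the POST body back."""
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    clients = set()                # distinct client sockets seen so far

    def do_POST(self):
        EchoHandler.clients.add(self.client_address)
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def post_many(host, port, payloads):
    """Send every payload over a single, reused HTTP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    results = []
    try:
        for payload in payloads:
            conn.request("POST", "/", body=payload)
            response = conn.getresponse()
            # the body must be read fully before the connection is reused
            results.append((response.status, response.read()))
    finally:
        conn.close()
    return results


def demo():
    """Return the responses and the number of connections the server saw."""
    EchoHandler.clients.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        results = post_many(host, port, [b"a=1", b"b=2", b"c=3"])
        return results, len(EchoHandler.clients)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The server-side count of client sockets is what shows the reuse: three requests arrive over one connection, so the TCP setup cost is paid once. A connection object like this lives inside one process, which is the reason the comment gives for preferring threads when keep-alive matters.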
