
I am trying to write a Python script that crawls a website in parallel. I built a prototype that lets me crawl to a certain depth, but Queue.join() never unblocks.

The join() call does not seem to work, and I don't understand why.

Here is my code:

from threading import Thread
import Queue
import urllib2
import re
from BeautifulSoup import *
from urlparse import urljoin


def doWork():
    while True:
        try:
            myUrl = q_start.get(False)  # non-blocking get; spins while the queue is empty
        except:
            continue
        try:
            c = urllib2.urlopen(myUrl)
        except:
            continue
        soup = BeautifulSoup(c.read())
        links = soup('a')
        for link in links:
            if 'href' in dict(link.attrs):
                url = urljoin(myUrl, link['href'])
                if url.find("'") != -1:
                    continue
                url = url.split('#')[0]
                if url[0:4] == 'http':
                    print url
                    q_new.put(url)


q_start = Queue.Queue()
q_new = Queue.Queue()

for i in range(20):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()

q_start.put("http://google.com")
print "loading"
q_start.join()
print "end"

Answers
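The likely cause: Queue.join() only returns once every item retrieved with get() has been balanced by a call to task_done(), and the worker above never calls task_done(), so join() blocks forever. Below is a minimal sketch of a corrected worker (it assumes the same Python 2 modules and the q_start/q_new queues from the question; error handling is deliberately simplified):

def doWork():
    while True:
        myUrl = q_start.get()          # blocking get: no busy-wait loop
        try:
            c = urllib2.urlopen(myUrl)
            soup = BeautifulSoup(c.read())
            for link in soup('a'):
                if 'href' in dict(link.attrs):
                    url = urljoin(myUrl, link['href'])
                    if url.find("'") == -1:
                        url = url.split('#')[0]
                        if url[0:4] == 'http':
                            q_new.put(url)
        except Exception:
            pass                       # skip pages that fail to load or parse
        finally:
            q_start.task_done()        # mark the item finished so join() can return

With exactly one task_done() per get(), q_start.join() returns once the initial URL has been processed. Note that discovered links go onto q_new, which the original script never drains; crawling deeper would mean feeding those URLs back into q_start.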