2012-01-05 43 views

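Understanding threads in Python: how do I tell `run()` to return the processed data?

I read up on threads in the IBM developer sources and found the following example.

On the whole I understand what is happening here, except for one important thing. The work seems to be done in the run() function. In this example run() only prints some output and signals to the queue that the job is done.

What if I had to return some processed data? I thought about caching it in a global variable and accessing it later, but that doesn't seem like the right way to do it.

Any suggestions?

Maybe I should clarify: my intuition tells me to add `return processed_data` to run(), right after self.queue.task_done(), but I can't figure out where to catch that return value, because it isn't obvious to me where run() is called.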

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)

            #signals to queue job is done
            self.queue.task_done()

start = time.time()
def main():

    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Answers

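You can't get a value back from run(): the threading machinery calls it in the new thread after start() and discards whatever it returns. In any case, each thread usually has several items to process, so you wouldn't want to return after handling just one (see the while loop in each thread).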
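
To make the point about the discarded return value concrete, here is a minimal sketch written for Python 3. The `Worker` class and its `value` and `result` attributes are made up for illustration and are not part of the IBM example. It shows that start() hands nothing back, while an attribute set inside run() can still be read after join().

```python
import threading

class Worker(threading.Thread):
    """Hypothetical worker, used only to show what happens to run()'s return value."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.result = None  # filled in by run()

    def run(self):
        processed = self.value * 2
        self.result = processed  # kept on the instance so the caller can read it later
        return processed         # ignored: nothing receives this value

w = Worker(21)
started = w.start()  # start() launches the thread, which then calls run()
w.join()             # wait until run() has finished
print(started)       # None: start() does not forward run()'s return value
print(w.result)      # 42
```

Storing the result on the instance works when a thread handles exactly one job. With a pool of threads that loop over many jobs, a queue is the more natural place to hand results back.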

I would either use another queue to return the results:

queue = Queue.Queue()
out_queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    ...
    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and saves first 1024 bytes of page
            url = urllib2.urlopen(host)
            out_queue.put(url.read(1024))

            #signals to queue job is done
            self.queue.task_done()

...

def main():
    ...
    #populate queue with data
    for host in hosts:
        queue.put(host)

    #don't have to wait until everything has been processed if we don't want to

    for _ in range(len(hosts)):
        first_1k = out_queue.get()
        print first_1k

Or store the result on the work item that travels through the same queue:

class WorkItem(object):
    def __init__(self, host):
        self.host = host

class ThreadUrl(threading.Thread):
    ...
    def run(self):
        while True:
            #grabs host from queue
            work_item = self.queue.get()
            host = work_item.host

            #grabs urls of hosts and saves first 1024 bytes of page
            url = urllib2.urlopen(host)
            work_item.first_1k = url.read(1024)

            #signals to queue job is done
            self.queue.task_done()

...

def main():
    ...
    #populate queue with data
    work_items = [WorkItem(host) for host in hosts]
    for item in work_items:
        queue.put(item)

    #wait on the queue until everything has been processed
    queue.join()

    for item in work_items:
        print item.first_1k
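
For anyone who wants to try the work-item variant without filling in the elided parts, here is a self-contained sketch. It is written for Python 3 (so the module is `queue` rather than `Queue`), and `fake_fetch` is a made-up stand-in for the `urllib2.urlopen(host).read(1024)` call so that it runs offline. The pool size of 3 and the `main(hosts)` signature are also my own choices.

```python
import queue
import threading

def fake_fetch(host):
    # Hypothetical stand-in for downloading the first 1024 bytes of the page.
    return "first bytes of %s" % host

class WorkItem(object):
    def __init__(self, host):
        self.host = host
        self.first_1k = None  # set by whichever worker handles this item

class ThreadUrl(threading.Thread):
    def __init__(self, work_queue):
        # daemon=True lets the program exit even though run() loops forever
        super().__init__(daemon=True)
        self.queue = work_queue

    def run(self):
        while True:
            work_item = self.queue.get()
            try:
                work_item.first_1k = fake_fetch(work_item.host)
            finally:
                # always mark the item done, otherwise queue.join() blocks forever
                self.queue.task_done()

def main(hosts):
    work_queue = queue.Queue()
    for _ in range(3):
        ThreadUrl(work_queue).start()

    work_items = [WorkItem(host) for host in hosts]
    for item in work_items:
        work_queue.put(item)

    # wait until every item has been processed
    work_queue.join()
    return [(item.host, item.first_1k) for item in work_items]

results = main(["http://yahoo.com", "http://google.com", "http://ibm.com"])
for host, data in results:
    print(host, "->", data)
```

Because each result is written onto the item that carried the request, the output comes back in the order of `work_items` no matter which thread finished first.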

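The problem with the queue approach is that the order in which the threads finish is unpredictable, so an item's position in the queue does not necessarily tell you which request it belongs to. In this example, if google.com finishes before yahoo.com, the queue holds the Google data ahead of the Yahoo data, and if you match results to hosts by position they come out wrong.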
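
One possible way around the ordering issue, sketched here for Python 3, is to put a `(host, data)` pair on the output queue instead of the bare data, so each result carries its own label. This is only an illustration: `fake_fetch` is a made-up stand-in for the real download, and the workers are plain functions passed as `target` rather than `Thread` subclasses, purely to keep the sketch short.

```python
import queue
import threading

def fake_fetch(host):
    # Hypothetical stand-in for downloading the first 1024 bytes of the page.
    return "first bytes of %s" % host

def worker(in_queue, out_queue):
    while True:
        host = in_queue.get()
        try:
            # tag the result with the host it belongs to
            out_queue.put((host, fake_fetch(host)))
        finally:
            in_queue.task_done()

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com"]
in_queue = queue.Queue()
out_queue = queue.Queue()

for _ in range(3):
    threading.Thread(target=worker, args=(in_queue, out_queue), daemon=True).start()

for host in hosts:
    in_queue.put(host)

# collect exactly one result per host, in whatever order they arrive
results = {}
for _ in hosts:
    host, data = out_queue.get()
    results[host] = data

# look results up by host, so arrival order no longer matters
ordered = [results[host] for host in hosts]
print(ordered)
```

If the same host could be queued more than once, a dictionary keyed by host would overwrite earlier results. In that case an index or some other unique job id would be a safer tag.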
