
I'd like to use urllib3 to fetch several wiki pages from a few simple threads (the script is meant as an example of urllib3 and threading in Python).

It creates one connection per thread (I don't understand why) and hangs forever. Any tips, advice, or a simple example of urllib3 with threading?

import threadpool 
from urllib3 import connection_from_url 

HTTP_POOL = connection_from_url(url, timeout=10.0, maxsize=10, block=True) 

def fetch(url, fields): 
    kwargs = {'retries': 6} 
    return HTTP_POOL.get_url(url, fields, **kwargs) 

pool = threadpool.ThreadPool(5) 
requests = threadpool.makeRequests(fetch, iterable) 
[pool.putRequest(req) for req in requests] 

@Lennart's script gives this error:

http://en.wikipedia.org/wiki/2010-11_Premier_League 
http://en.wikipedia.org/wiki/List_of_MythBusters_episodes 
http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes 
http://en.wikipedia.org/wiki/List_of_Unicode_characters 
Traceback (most recent call last): 
    File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run 
    result = request.callable(*request.args, **request.kwds) 
    File "crawler.py", line 9, in fetch 
    print url, conn.get_url(url) 
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url' 

(the same traceback is printed once per worker thread; in the raw output the printed URLs were interleaved with the tracebacks)

After adding import threadpool, import urllib3 and tpool = threadpool.ThreadPool(4) to @user318904's code, I get this error:

Traceback (most recent call last): 
    File "crawler.py", line 21, in <module> 
    tpool.map_async(fetch, urls) 
AttributeError: ThreadPool instance has no attribute 'map_async' 

Answers

Obviously it will create one connection per thread; how else would each thread be able to fetch a page? And you are trying to use the same connection, made from one URL, for all the URLs. That can hardly be what you meant.

This code works just fine:

import threadpool 
from urllib3 import connection_from_url 

def fetch(url): 
    kwargs = {'retries': 6} 
    conn = connection_from_url(url, timeout=10.0, maxsize=10, block=True) 
    print url, conn.get_url(url, **kwargs)  # retries is passed through to the fetch 
    print "Done!" 

pool = threadpool.ThreadPool(4) 
urls = ['http://en.wikipedia.org/wiki/2010-11_Premier_League', 
     'http://en.wikipedia.org/wiki/List_of_MythBusters_episodes', 
     'http://en.wikipedia.org/wiki/List_of_Top_Gear_episodes', 
     'http://en.wikipedia.org/wiki/List_of_Unicode_characters', 
     ] 
requests = threadpool.makeRequests(fetch, urls) 

[pool.putRequest(req) for req in requests] 
pool.wait() 
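
For reference, here is a minimal sketch of the same fetch loop using only the standard library's threading module and a single pool shared by all threads. It assumes a urllib3 version where HTTPConnectionPool exposes request(); in the early releases used above the equivalent helper was get_url:

import threading 
from urllib3 import connection_from_url 

# one pool, shared by every thread; block=True caps live connections at maxsize 
shared_pool = connection_from_url('http://en.wikipedia.org/', maxsize=4, block=True) 

urls = ['/wiki/2010-11_Premier_League', 
     '/wiki/List_of_MythBusters_episodes', 
     '/wiki/List_of_Top_Gear_episodes', 
     '/wiki/List_of_Unicode_characters', 
     ] 

def fetch(path): 
    r = shared_pool.request('GET', path) 
    print "%s: %d bytes" % (path, len(r.data)) 

threads = [threading.Thread(target=fetch, args=(u,)) for u in urls] 
for t in threads: 
    t.start() 
for t in threads: 
    t.join() 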

I use something like this (ThreadPool here is multiprocessing.pool.ThreadPool, which is what provides map_async; the third-party threadpool module's ThreadPool has no such method, which is what the AttributeError above is complaining about):

import urllib3 
from multiprocessing.pool import ThreadPool  # this ThreadPool provides map_async 

upool = urllib3.HTTPConnectionPool('en.wikipedia.org', block=True) 

urls = ['/wiki/2010-11_Premier_League', 
     '/wiki/List_of_MythBusters_episodes', 
     '/wiki/List_of_Top_Gear_episodes', 
     '/wiki/List_of_Unicode_characters', 
     ] 

def fetch(path): 
    # add error checking 
    return upool.get_url(path).data 

tpool = ThreadPool() 

tpool.map_async(fetch, urls) 

# either wait on the result object or give map_async a callback function for the results 
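
For instance, a minimal sketch of those two options, assuming multiprocessing.pool.ThreadPool as above (whose map_async returns an AsyncResult); the callback name done is illustrative:

def done(pages): 
    # called once, with the list of all results, when every job has finished 
    print "fetched %d pages" % len(pages) 

result = tpool.map_async(fetch, urls, callback=done) 
result.wait()         # block until all jobs complete 
pages = result.get()  # the same list that was handed to the callback 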

Thread programming is hard, so I wrote workerpool to make exactly what you're doing easier.

More specifically, see the Mass Downloader example.

To do the same thing with urllib3, it would look something like this:

import urllib3 
import workerpool 

# a distinct name, so the WorkerPool below doesn't shadow this urllib3 pool 
http_pool = urllib3.connection_from_url("foo", maxsize=3) 

def download(url): 
    r = http_pool.get_url(url) 
    # TODO: Do something with r.data 
    print "Downloaded %s" % url 

# Initialize a pool, 5 threads in this case 
pool = workerpool.WorkerPool(size=5) 

# The ``download`` method will be called with a line from the second 
# parameter for each job. 
pool.map(download, [line.strip() for line in open("urls.txt")]) 

# Send shutdown jobs to all threads, and wait until all the jobs have been completed 
pool.shutdown() 
pool.wait() 

For more sophisticated code, have a look at workerpool.EquippedWorker (and the tests here for example usage). You can make the pool be the toolbox you pass around.
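
As a rough sketch of that "toolbox" idea using plain functools.partial rather than the actual EquippedWorker API (download_with is an illustrative name; http_pool and pool are from the example above):

import functools 

def download_with(http_pool, url): 
    # the shared urllib3 pool is passed in explicitly instead of read from a global 
    return http_pool.get_url(url).data 

job = functools.partial(download_with, http_pool) 
pool.map(job, [line.strip() for line in open("urls.txt")]) 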