2017-10-14 55 views
1

Why does multiprocessing.pool.map raise a PicklingError (Encoding)?

Why does the code below run when threads are used, but raise an exception when multiprocessing is used?

from multiprocessing import Pool 
from multiprocessing.dummy import Pool as ThreadsPool 
import urllib2 

urls = [ 
    'http://www.python.org', 
    'http://www.python.org/about/', 
    'http://www.python.org/doc/', 
    'http://www.python.org/download/'] 

def use_threads(): 

    pool = ThreadsPool(4) 
    results = pool.map(urllib2.urlopen, urls) 
    pool.close() 
    pool.join() 

    print [len(x.read()) for x in results] 

def use_procs(): 

    p_pool = Pool(4) 
    p_results = p_pool.map(urllib2.urlopen, urls) 
    p_pool.close() 
    p_pool.join() 

    print 'using procs instead of threads' 
    print [len(x.read()) for x in p_results] 

if __name__ == '__main__': 
    use_procs() 

The exception is:

Traceback (most recent call last): 
    File "pools.py", line 39, in <module> 
    use_procs() 
    File "pools.py", line 31, in use_procs 
    p_results = p_pool.map(urllib2.urlopen, urls) 
    File "/usr/lib64/python2.7/multiprocessing/pool.py", line 250, in map 
    return self.map_async(func, iterable, chunksize).get() 
    File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get 
    raise self._value 
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<addinfourl at 35286624 whose fp = <socket._fileobject object at 0x2198ad0>>]'. Reason: 'PicklingError("Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed",)' 

I know that processes and threads communicate differently. Why does pickling the website content fail? How can I set the encoding to fix this?

+1

The error occurs because you are trying to serialize a socket object, which is not possible –

+0

Any idea what function I should pass to map to get the desired output? (One that performs the read on the object.) – Vinny

Answers

3

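The problem isn't an encoding error. It is a pickling error: the result returned by urllib2.urlopen() is an object that cannot be pickled (a _ssl._SSLSocket, according to the slightly different reason shown in the error message I got when running your code). To work around it, limit the use of the returned object to the worker by reading the data right after opening the URL, as shown below. This may mean that more data has to be passed between processes.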

# Added. 
def get_data(url): 

    soc = urllib2.urlopen(url) 
    return soc.read() 

def use_procs(): 

    p_pool = Pool(4) 
# p_results = p_pool.map(urllib2.urlopen, urls) 
    p_results = p_pool.map(get_data, urls) 
    p_pool.close() 
    p_pool.join() 

    print 'using procs instead of threads' 
# print [len(x.read()) for x in p_results]
    print [len(x) for x in p_results] 

Output:

using procs instead of threads 
[49062, 41616, 40086, 101224] 
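To illustrate why the original result could not be sent back, here is a minimal sketch written for Python 3 (an assumption, since the question uses Python 2). The helper `can_pickle` is a hypothetical name introduced only for this example, and a bare `socket.socket()` stands in for the socket held by the `urlopen()` response. Pool results travel between processes as pickled data, so anything that fails `pickle.dumps` cannot be returned from a worker.

```python
import pickle
import socket


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickling raises."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


if __name__ == "__main__":
    # Stand-in for the socket wrapped by a urlopen() response (assumption).
    sock = socket.socket()
    try:
        print("socket picklable:", can_pickle(sock))
        print("bytes picklable:", can_pickle(b"<html>page body</html>"))
    finally:
        sock.close()
```

The body returned by `read()` is a plain string (bytes in Python 3), which is why returning it from the worker avoids the error.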
2

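As I already mentioned, the error is raised because you are trying to pass a socket object between processes. You have to change your script logic to something like this: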

from multiprocessing.pool import Pool 
from multiprocessing.pool import ThreadPool 
import urllib2 

urls = [ 
    'http://www.python.org', 
    'http://www.python.org/about/', 
    'http://www.python.org/doc/', 
    'http://www.python.org/download/' 
] 

def worker(url): 
    return urllib2.urlopen(url).read() # string returned 

def use_threads(): 

    pool = ThreadPool(4) 
    results = pool.map(worker, urls) 
    pool.close() 
    pool.join() 

    print([len(x) for x in results]) 

def use_procs(): 

    p_pool = Pool(4) 
    p_results = p_pool.map(worker, urls) 
    p_pool.close() 
    p_pool.join() 

    print('using procs instead of threads') 
    print([len(x) for x in p_results]) 

if __name__ == '__main__': 
    use_procs() 

By the way, you can write a pool factory and pick the pool from it, instead of duplicating the pool code in use_threads and use_procs:

from multiprocessing.pool import Pool 
from multiprocessing.pool import ThreadPool 
import urllib2 

urls = [ 
    'http://www.python.org', 
    'http://www.python.org/about/', 
    'http://www.python.org/doc/', 
    'http://www.python.org/download/' 
] 


def worker(url): 
    return urllib2.urlopen(url).read() 


def pool_factory(key, n): 
    if key == 'proc': 
        print('using procs instead of threads')
        return Pool(n)
    else:
        return ThreadPool(n)


def main(): 

    pool = pool_factory('proc', 4) # change `proc` to anything for using ThreadPool 
    results = pool.map(worker, urls) 
    pool.close() 
    pool.join() 
    print([len(x) for x in results]) 


if __name__ == '__main__': 
    main() 
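For readers on Python 3, where urllib2 no longer exists, the same idea might look like the sketch below. It is an illustration under stated assumptions, not the answer's original code: `fetch` and `fetch_lengths` are hypothetical names, `urllib.request.urlopen` replaces `urllib2.urlopen`, and the demo uses `data:` URLs so that it runs without network access.

```python
from multiprocessing.pool import Pool, ThreadPool
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen


def fetch(url):
    """Open the URL, read the body inside the worker, and return plain bytes."""
    with urlopen(url) as response:
        return response.read()


def pool_factory(kind, n):
    """Return a process pool for 'proc', otherwise a thread pool."""
    return Pool(n) if kind == "proc" else ThreadPool(n)


def fetch_lengths(urls, kind="thread", n=4):
    """Fetch every URL through the chosen pool and return the body lengths."""
    with pool_factory(kind, n) as pool:
        bodies = pool.map(fetch, urls)  # map() blocks until all results arrive
    return [len(body) for body in bodies]


if __name__ == "__main__":
    # data: URLs keep the demo offline; swap in real http(s) URLs as needed.
    demo_urls = ["data:text/plain,hello", "data:text/plain,hi"]
    print(fetch_lengths(demo_urls, kind="thread"))
```

Passing `kind="proc"` switches to a process pool. That works here because `fetch` returns bytes rather than the response object, so the result can be pickled on its way back to the parent process.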
+0

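Thanks for your input. You are right about returning the string. I didn't create the factory method because this code is only for practice and won't be used elsewhere :-) – Vinny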
