
Python 3.4 multiprocessing throws TypeError("cannot serialize '_io.BufferedReader' object",)

I recently wrote some multiprocessing code in Python 3.4 to download images. It ran very fast at first, but then I got the following error and could no longer run the program.

Traceback (most recent call last):
  File "multiprocessing_d.py", line 23, in <module>
    main()
  File "multiprocessing_d.py", line 16, in main
    p.map(download, lines)
  File "/usr/local/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.4/multiprocessing/pool.py", line 608, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f1e047f32e8>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'
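The "Reason" part of the error is the real clue: a Pool has to pickle whatever a worker returns, or whatever exception it raises, in order to send it back to the parent process, and an open buffered file object such as an _io.BufferedReader cannot be pickled. A minimal sketch (not related to the download script itself) that reproduces just that TypeError:

    import pickle

    # Opening any file in binary mode gives an _io.BufferedReader; trying to
    # pickle it fails, which is exactly what the pool runs into when it tries
    # to ship a worker's exception back to the parent process.
    with open(__file__, 'rb') as f:
        try:
            pickle.dumps(f)
        except TypeError as e:
            print(e)  # e.g. "cannot serialize '_io.BufferedReader' object"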

My code is as follows.
download_helper.py

    import sys
    import os
    from pathlib import Path

    url_prefix = r"Some prefix"

    def setup_download_dir(dictionary):
        download_dir = Path(dictionary)
        if not download_dir.exists():
            download_dir.mkdir()
        return dictionary

    def download_link(dictionary, line):
        from urllib.request import urlretrieve
        itemid = line.split()[0].decode()
        link = line.split()[1].decode()
        if link.startswith("http"):
            image_url = link
        else:
            image_url = url_prefix + link
        if os.path.isfile(dictionary + "/" + itemid + ".jpg"):
            #print("Already have " + itemid + ".jpg")
            pass
        else:
            urlretrieve(image_url, dictionary + "/" + itemid + ".jpg")

multiprocessing_d.py

    from functools import partial
    from multiprocessing.pool import Pool
    import sys
    from time import time
    from download_helper import setup_download_dir, download_link

    def main():
        file_path = sys.argv[1]
        dic_path = sys.argv[2]
        download_dir = setup_download_dir(dic_path)
        download = partial(download_link, download_dir)
        with open(file_path, 'rb') as f:
            lines = f.readlines()
            ts = time()
            p = Pool(processes=16, maxtasksperchild=1)
            p.map(download, lines)
            p.close()
            p.join()
            print('Took {}s'.format(time() - ts))
            f.close()

    if __name__ == "__main__":
        main()

I tried searching online but did not find much useful information. I suspect some exception may be raised inside urlretrieve, but I don't know how to debug it. Any advice or suggestions would be appreciated!
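(For reference, one quick way to surface the exception that urlretrieve is actually raising is to skip the pool and call the helper serially on a few input lines, so the full traceback shows up in the main process; a minimal sketch reusing the helpers above:)

    # Debugging sketch, not part of the original script: run the download
    # helper directly on the first few lines so any exception is raised here.
    from functools import partial
    import sys
    from download_helper import setup_download_dir, download_link

    download_dir = setup_download_dir(sys.argv[2])
    download = partial(download_link, download_dir)
    with open(sys.argv[1], 'rb') as f:
        for line in f.readlines()[:10]:  # just a small sample
            download(line)               # a dead link will raise right here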

James


Problem solved: some of the links were dead, so urlretrieve raised an HTTPError, which cannot be serialized. Adding an exception handler fixed it.
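A minimal sketch of that fix: catch the error inside the worker, so no unpicklable exception object ever has to be sent back through the pool. This version is illustrative only and assumes it replaces download_link in download_helper.py, where url_prefix is defined:

    import os
    from urllib.request import urlretrieve
    from urllib.error import URLError, HTTPError

    def download_link(dictionary, line):
        itemid = line.split()[0].decode()
        link = line.split()[1].decode()
        image_url = link if link.startswith("http") else url_prefix + link
        target = dictionary + "/" + itemid + ".jpg"
        if os.path.isfile(target):
            return  # already downloaded
        try:
            urlretrieve(image_url, target)
        except (HTTPError, URLError) as e:
            # Dead or unreachable link: report it and move on instead of
            # letting the exception (with its open response object) propagate
            # back to the pool, where it cannot be pickled.
            print("Failed to download {}: {}".format(image_url, e))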

Answer


I'm not the best coder and don't know much about this, but you could try:

    from functools import partial
    from multiprocessing.pool import Pool
    import sys
    from time import time
    from download_helper import setup_download_dir, download_link

    def main():
        try:
            file_path = sys.argv[1]
            dic_path = sys.argv[2]
            download_dir = setup_download_dir(dic_path)
            download = partial(download_link, download_dir)
            with open(file_path, 'rb') as f:
                lines = f.readlines()
                ts = time()
                p = Pool(processes=16, maxtasksperchild=1)
                p.map(download, lines)
                p.close()
                p.join()
                print('Took {}s'.format(time() - ts))
                f.close()
        except:
            pass

    if __name__ == "__main__":
        main()

If that doesn't work, then instead of a bare except:, use except TypeError. Otherwise, I don't know, sorry.

Good luck
