2016-12-04

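Multiprocessing in an enumerate loop

I have a script that downloads images from URLs, but I want to parallelize it; otherwise it takes hours. This is the code: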

import requests 
from math import floor, log10 
import urllib 
import time 
import multiprocessing 

with open('images.csv', 'r') as f: 
    images = f.readlines() 

num_position = floor(log10(len(images)) + 1) 

a = time.time() 

for i, image in enumerate(images[1:10]):
    if (i+1) % 1000 == 0:
        print('Downloading {} image'.format(i+1))
    # a = time.time()
    with open(str(i).zfill(num_position)+'a.jpg', 'wb') as file:
        try:
            writing = file.write(requests.get(image.split(',')[2]).content)
            p = multiprocessing.Process(target=writing, args=(image,))
            p.start()
            p.join()
        except:
            print('Skipping an image!')
            pass
b = time.time() 
print('multiple process -- {}'.format(b-a)) 

I get an error:

Process Process-9:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'int' object is not callable
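
  1. Why do I get an error, yet the task still completes and the code doesn't break? (I mean the part inside the try:)
  2. What is the simplest way to add parallelism here?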

Answer


You get the error because, as far as I can tell, this line

writing = file.write(requests.get(image.split(',')[2]).content) 

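produces an integer. write returns the number of bytes written (the file is opened in 'wb' mode), which equals the length of the downloaded image content. You then assign that number to the variable writing, so writing becomes an int.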

p = multiprocessing.Process(target=writing, args=(image,)) 

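This passes writing as the target function, which raises the error: the worker tries to call the integer writing, which is not callable, instead of a function. The code still appears to work because the download and the file write already happened in the main process, inside the file.write(...) call, before the worker was even created; the worker has nothing left to do and fails immediately. Since that TypeError is raised inside the child process, the try/except in your main process never sees it, so the loop simply carries on (and nothing actually runs in parallel).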
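A minimal sketch that reproduces both points in isolation, assuming io.BytesIO as a stand-in for the image file and made-up placeholder bytes instead of a real download (run_int_target is a hypothetical helper name). The child process dies with the TypeError, while the parent's try/except is never triggered and only sees a non-zero exit code:

```python
import io
import multiprocessing


def run_int_target():
    # file.write() returns an int (bytes written); io.BytesIO stands in for the file
    writing = io.BytesIO().write(b"placeholder bytes, not a real image")
    try:
        p = multiprocessing.Process(target=writing)
        p.start()  # the child process tries to call writing() and raises TypeError
        p.join()
    except TypeError:
        # Not reached: the TypeError is raised (and printed) inside the child process
        print("caught in parent")
        return None
    return p.exitcode  # non-zero because the child died from an uncaught exception


if __name__ == "__main__":
    print("writing is callable:", callable(io.BytesIO().write(b"x")))
    print("child exit code:", run_int_target())
```

The child's traceback is written to stderr, exactly like the one in the question, but the main process keeps running.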

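To make this work, you have to define a function that takes the image (and perhaps the file name) as arguments, and then pass that function as the target when you set up your worker. Something like this: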

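import requests

def write_file(image, filename):
    # image is one line of images.csv; the URL is in its third column
    url = image.split(',')[2].strip()  # strip() drops the trailing newline from readlines()
    with open(filename, mode="wb") as filestream:  # "wb": .content is bytes
        filestream.write(requests.get(url).content)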

And in your application:

p = multiprocessing.Process(target=write_file, args=(image, filename,)) 
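The question's loop also calls p.join() right after p.start(), so each process finishes before the next one starts and the downloads never overlap. A minimal sketch of starting all processes first and joining them afterwards, assuming a list of (url, filename) pairs; download_all is a hypothetical helper, and the standard library's urllib.request replaces requests only so the sketch has no third-party dependency:

```python
import multiprocessing
import urllib.request


def write_file(url, filename):
    # Stdlib stand-in for requests.get(url).content
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(filename, "wb") as f:  # binary mode: data is bytes
        f.write(data)


def download_all(jobs):
    # jobs: list of (url, filename) pairs; create one process per job
    processes = [multiprocessing.Process(target=write_file, args=job) for job in jobs]
    for p in processes:
        p.start()  # start every process first...
    for p in processes:
        p.join()  # ...then wait, so the downloads run at the same time
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    print(download_all([]))  # pass your own (url, filename) pairs here
```

One process per image is fine for a handful of files such as images[1:10], but for thousands of URLs a fixed number of worker processes is the better design.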

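However, this is only the writing part. If you want the downloading to happen in a separate task as well, you have to move that code into its own function, for example one that takes URLs from a queue: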

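import os
import requests

def download_write(urls):
    for url in iter(urls.get, 'STOP'):  # take URLs until the 'STOP' sentinel arrives
        # Assumption: name each file after the last part of its URL path
        filename = os.path.basename(url.rstrip('/'))
        content = requests.get(url).content  # download
        with open(filename, mode="wb") as filestream:  # binary mode for bytes
            filestream.write(content)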

And in your main application:

import multiprocessing

if __name__ == "__main__":  # required where processes are spawned (e.g. Windows)
    list_urls = []  # your list of urls to download
    urls = multiprocessing.Queue()
    for element in list_urls:
        urls.put(element)
    p = multiprocessing.Process(target=download_write, args=(urls,))
    urls.put("STOP")  # signals end of tasks for your worker
    p.start()  # start worker
    p.join()  # wait for worker to finish
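For question 2, the simplest route is probably multiprocessing.Pool, which creates a fixed number of worker processes and hands the jobs out for you, so no Process objects, queues, or sentinels are needed. A minimal sketch, assuming the CSV layout from the question (URL in the third column) and the same index-based file names; fetch_one and download_lines are hypothetical helpers, and urllib.request stands in for requests:

```python
import functools
import multiprocessing
import urllib.request


def fetch_one(job, width):
    i, line = job  # one (index, csv_line) pair from enumerate()
    url = line.split(",")[2].strip()  # third CSV column, as in the question
    filename = str(i).zfill(width) + "a.jpg"
    try:
        with urllib.request.urlopen(url) as response:  # stdlib stand-in for requests.get
            data = response.read()
    except (OSError, ValueError):
        return None  # skip this image, like the question's except branch
    with open(filename, "wb") as f:
        f.write(data)
    return filename


def download_lines(lines, width, processes=8):
    worker = functools.partial(fetch_one, width=width)
    with multiprocessing.Pool(processes) as pool:
        # map() splits the enumerated lines across the pool's worker processes
        return pool.map(worker, enumerate(lines))
```

You would call it as download_lines(images[1:], num_position) from under an if __name__ == "__main__": guard. Because the work is mostly waiting on the network, concurrent.futures.ThreadPoolExecutor would also do the job with the same function. If you keep the queue-based download_write and start several worker processes instead, put one "STOP" into the queue per worker; otherwise the workers that never receive a sentinel block forever on urls.get.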

Many thanks for the detailed answer! – jojo