I wrote an example that I hope you'll find helpful. It consists of a urls.txt file, which lists the jobs to process, and an xdownloader.py file that downloads them using multiple processes. I tested it on a Linux machine running Python 2.7.
xdownloader.py
from multiprocessing import Process
import urllib2

def main():
    # Read URLs/destinations from a file
    jobs = []
    with open("urls.txt", "r") as ifile:
        for line in ifile:
            # split() also strips the trailing newline from the destination
            jobs.append(line.split())
    # Create a process list to keep track of running processes
    process_list = []
    try:
        # Iterate through our jobs list
        for url, save_to in jobs:
            # Create a new process that runs function 'download'
            p = Process(target=download, args=(url, save_to))
            # Save it to a list
            process_list.append(p)
            # Start the process
            p.start()
    except KeyboardInterrupt:
        print("Received keyboard interrupt (ctrl+c). Exiting...")
    finally:
        # Wait for all processes to finish before exiting
        for process in process_list:
            # Wait for this process to finish
            process.join()
        print("All processes finished successfully!")

def download(url, destination):
    # Open a request
    request = urllib2.urlopen(url)
    # Read and save the webpage data to a file
    print("Downloading {0}".format(url))
    with open(destination, "w+") as save_file:
        save_file.write(request.read())
    print("Done downloading {0}".format(url))

if __name__ == "__main__":
    main()
urls.txt
http://google.com google.html
http://yahoo.com yahoo.html
http://news.google.com news.google.html
http://reddit.com reddit.html
http://news.ycombinator.com news.ycombinator.html
Running python xdownloader.py, I get:
[email protected] ~ $ python xdownloader.py
Downloading http://news.ycombinator.com
Downloading http://reddit.com
Downloading http://news.google.com
Downloading http://google.com
Done downloading http://google.com
Done downloading http://news.ycombinator.com
Done downloading http://reddit.com
Downloading http://yahoo.com
Done downloading http://news.google.com
Done downloading http://yahoo.com
As you can see, the jobs run asynchronously. Some start earlier but finish later than others. (I'm looking at you, news.google.com!) If this example doesn't fit your needs, let me know in the comments.
Do you want both requests processed at the same time, or do you specifically want the second request held back until the first one finishes? – 2013-01-16 05:28:50
@Mike, handling both at the same time wouldn't be a problem. In fact, that would be a good thing. But I was under the impression that you can't do that in a blocking language like Python. – user1624005
You're probably thinking of the GIL (Global Interpreter Lock). You can get around it by using 'multiprocessing' instead of 'threading'. From what you've described, spawning a separate process for each download shouldn't be too difficult. – 2013-01-16 05:39:53