Downloading multiple files in a loop with Python

I have a problem with my code.

#!/usr/bin/env python3.1 

import urllib.request; 

# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'; 

URL = "http://www.example.com/img";
req = urllib.request.Request(URL, headers={'User-Agent' : userAgent}); 

# Counter for the filename. 
i = 0; 

while True: 
    fname = str(i).zfill(3) + '.png'; 
    req.full_url = URL + fname; 

    f = open(fname, 'wb'); 

    try:
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();

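The problem seems to arise when I create the urllib.request.Request object (called req). I create it with a URL that does not exist, and later change the URL to what it should be. I do this so that I can reuse the same urllib.request.Request object instead of creating a new one on every iteration. There is probably a mechanism for doing exactly that in Python, but I'm not sure what it is.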

EDIT: The error message is:

>>> response = urllib.request.urlopen(req); 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/lib/python3.1/urllib/request.py", line 121, in urlopen 
    return _opener.open(url, data, timeout) 
    File "/usr/lib/python3.1/urllib/request.py", line 356, in open 
    response = meth(req, response) 
    File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response 
    'http', request, response, code, msg, hdrs) 
    File "/usr/lib/python3.1/urllib/request.py", line 394, in error 
    return self._call_chain(*args) 
    File "/usr/lib/python3.1/urllib/request.py", line 328, in _call_chain 
    result = func(*args) 
    File "/usr/lib/python3.1/urllib/request.py", line 476, in http_error_default 
    raise HTTPError(req.full_url, code, msg, hdrs, fp) 
urllib.error.HTTPError: HTTP Error 403: Forbidden

EDIT 2: My solution is below.

import urllib.request; 

# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'; 

# Counter for the filename. 
i = 0; 

while True: 
    fname = str(i).zfill(3) + '.png'; 
    URL = "http://www.example.com/img" + fname;

    f = open(fname, 'wb'); 

    try:
        req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();
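
One way to make the per-iteration rebuild explicit is to move it into a small helper. This is only a sketch: build_request and BASE_URL are hypothetical names, and it assumes the files are named 000.png, 001.png and so on under the base URL, as in the code above. It appears that in Python 3.1 the Request constructor parses the host and path once, so assigning to req.full_url afterwards likely leaves the old values in place; building a fresh Request avoids relying on that behaviour.

```python
import urllib.request

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
BASE_URL = 'http://www.example.com/img'  # assumed base URL, for illustration only


def build_request(index, base_url=BASE_URL, user_agent=USER_AGENT):
    """Return (filename, Request) for the index-th file, e.g. 7 -> '007.png'."""
    fname = str(index).zfill(3) + '.png'
    # A new Request per file, so the URL is parsed from scratch every time.
    req = urllib.request.Request(base_url + fname,
                                 headers={'User-Agent': user_agent})
    return fname, req


# Building the request does not touch the network; only urlopen(req) would.
fname, req = build_request(7)
print(fname, req.full_url)
```

The loop body then reduces to calling build_request(i) and passing the returned request to urllib.request.urlopen.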

2012-03-28 s5s

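What is the error message? Also, Python does not need a semicolon to end a line. – Dikei 2012-03-28 02:37:03

I have added the error message. I know I don't need the semicolons, but I prefer to add them. The URL and the files exist. The only problem is that I create the req object with an invalid URL and then correct the URL before using req. That seems to be what causes the error. – s5s 2012-03-28 02:41:08

Yes. The URL is valid. That is why I think this is what causes the problem. I can also visit the URL, wget it, and download it with Python if I don't use a loop, that is, if I set the correct URL on the req object when I create it. – s5s 2012-03-28 02:44:11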

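urllib2 is fine for small scripts that only need one or two network interactions, but if you are doing more than that, you will probably find that either urllib3 or requests (which, not coincidentally, is built on the former) suits your needs better. Your particular example might look like: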

from itertools import count 
import requests 

HEADERS = {'user-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} 
URL = "http://www.example.com/img%03d.png" 

# with a session, we get keep alive 
session = requests.session() 

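for n in count():
    full_url = URL % n
    # take the filename from the formatted URL, not from the template
    ignored, filename = full_url.rsplit('/', 1)

    response = session.get(full_url, headers=HEADERS)
    if not response.ok:
        break
    # open the file only after a successful response, so no empty file is left behind
    with open(filename, 'wb') as outfile:
        outfile.write(response.content)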

EDIT: If you can use plain HTTP authentication (which the 403 Forbidden response strongly suggests), you can pass it to the get call with the auth parameter, like:

response = session.get(full_url, headers=HEADERS, auth=('username', 'password'))

2012-03-28 02:52:08 SingleNegationElimination

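I like this answer: rather than just fixing the OP's mistake, you actually show a better approach, which solves the problem for him and for others. – Mig 2012-03-28 03:07:47

I know it has been a long time since the original post, but the filename line should read "ignored, filename = full_url.rsplit('/', 1)" instead of "ignored, filename = URL.rsplit('/', 1)". Otherwise the filename will be "img%03d.png". – Marius 2016-11-14 21:07:24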
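
The difference Marius points out can be checked without any network access. The snippet below is a small illustration; filename_for is a hypothetical helper name, and the URL template is the one from the answer.

```python
URL = "http://www.example.com/img%03d.png"


def filename_for(n, template=URL):
    """Format the template first, then split off the last path component."""
    full_url = template % n
    return full_url.rsplit('/', 1)[1]


# Splitting the unformatted template keeps the literal placeholder.
print(URL.rsplit('/', 1)[1])
# Splitting the formatted URL gives a distinct name per file.
print(filename_for(5))
```

With the template split, every iteration would write to the same file and overwrite the previous download.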

-2

Don't break when you receive an exception. Change

except: 
    break

to

except: 
    #Probably should log some debug information here. 
    pass

This skips any problematic request, so that a single failure does not bring down the whole process.

2012-03-28 02:47:55 Dikei

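That changes the logic drastically. He most likely doesn't want to loop forever. – SingleNegationElimination 2012-03-28 02:49:15

I am using the exception as the way to terminate the loop. Using pass would cause an infinite loop. I don't know how many files there are, so I download until I hit an exception. – s5s 2012-03-28 02:50:08

That won't stop the server from throttling, though. – 2012-03-28 02:50:18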
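
The termination-by-exception idea from these comments can be narrowed so that only an HTTP error ends the loop, instead of a bare except that would also swallow typos or Ctrl-C. The following is a sketch under stated assumptions, not the OP's code: download_until_error, fetch and fake_fetch are hypothetical names, and fake_fetch stands in for the network so the example runs offline.

```python
import io
import urllib.error
import urllib.request
from itertools import count


def download_until_error(url_template, fetch=urllib.request.urlopen):
    """Fetch url_template % 0, % 1, ... until the server answers with an HTTP error.

    Returns a list of (url, body) pairs for the successful requests.
    """
    results = []
    for n in count():
        url = url_template % n
        try:
            response = fetch(url)
        except urllib.error.HTTPError:
            # e.g. 404 for the first missing file: treat it as the end of the series
            break
        results.append((url, response.read()))
        response.close()
    return results


def fake_fetch(url):
    """Hypothetical stand-in for urlopen: only files 000..002 'exist'."""
    n = int(url[-7:-4])
    if n > 2:
        raise urllib.error.HTTPError(url, 404, 'Not Found', None, None)
    return io.BytesIO(b'data%d' % n)


results = download_until_error("http://www.example.com/img%03d.png", fetch=fake_fetch)
print(len(results))
```

Any other exception, such as a connection failure, propagates to the caller, so the loop can neither run forever nor hide unrelated bugs.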

If you want to use a custom user agent on every request, you can subclass FancyURLopener.

Here is an example: http://wolfprojects.altervista.org/changeua.php
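
FancyURLopener has been deprecated since Python 3.3, so the sketch below uses a different technique from the linked page: an opener built with urllib.request.build_opener whose addheaders list is sent with every request. make_opener is a hypothetical helper name, and the user agent string is the one from the question.

```python
import urllib.request

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def make_opener(user_agent=USER_AGENT):
    """Return an opener that sends the given User-Agent with every request."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to each request
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


opener = make_opener()
# opener.open(url) would now send the header above.
# urllib.request.install_opener(opener) would make plain urlopen() use it too.
```

This keeps the user agent in one place without constructing a Request with custom headers on each iteration.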

2012-03-28 02:57:53 W1N9Zr0
