I have a problem with my code. Python: downloading multiple files in a loop
#!/usr/bin/env python3.1
import urllib.request;
# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
URL = "www.example.com/img";
req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
# Counter for the filename.
i = 0;
while True:
    fname = str(i).zfill(3) + '.png';
    req.full_url = URL + fname;
    f = open(fname, 'wb');
    try:
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();
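The problem seems to arise when I create the urllib.request.Request object (called req). I create it with a URL that does not exist, and later change it to the URL it should be. I do this so that I can reuse the same urllib.request.Request object instead of creating a new one on every iteration. There may be a mechanism in Python for doing this, but I'm not sure what it is.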
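To see what reassigning the URL actually does, it can help to inspect the parsed parts of the Request before and after the assignment. The snippet below is only a sketch: the `http://` scheme is an assumption added here (Request rejects a URL without a scheme), and it makes no network call. According to the urllib.request documentation, `full_url` has been a property with a setter since Python 3.4, so on current versions the host and path follow the new URL. On 3.1, as used in the question, plain attribute assignment apparently left the already parsed path untouched, which would be consistent with the 403.

```python
import urllib.request

# Assumption: the scheme is added for illustration; the post shows the URL without one.
base = "http://www.example.com/img"

req = urllib.request.Request(base)
# host and selector are derived from the URL when the object is built.
before = (req.full_url, req.host, req.selector)

# Reassign the URL the same way the loop in the question does.
req.full_url = base + "000.png"
after = (req.full_url, req.host, req.selector)

print(before)
print(after)
```

If `after` still showed the old path on your interpreter, the request would be sent for `/img` rather than the image file, and building a new Request per iteration avoids the question entirely.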
EDIT: The error message is:
>>> response = urllib.request.urlopen(req);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/urllib/request.py", line 121, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python3.1/urllib/request.py", line 356, in open
    response = meth(req, response)
  File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.1/urllib/request.py", line 394, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.1/urllib/request.py", line 328, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.1/urllib/request.py", line 476, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Edit 2: My solution is below.
import urllib.request;
# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
# Counter for the filename.
i = 0;
while True:
    fname = str(i).zfill(3) + '.png';
    URL = "www.example.com/img" + fname;
    f = open(fname, 'wb');
    try:
        req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();
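For comparison, here is one possible tidier variant of the same idea, offered as a sketch rather than the definitive way to do it. It still builds a fresh Request on every iteration, but it opens the output file only after the download succeeded (the loop above leaves an empty file behind for the first missing image) and catches urllib.error.URLError instead of using a bare except. The names `build_request`, `download_all`, `opener` and `out_dir` are hypothetical, and the `http://` scheme is an assumption, since Request raises ValueError for a URL without one.

```python
import os
import urllib.error
import urllib.request

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Assumption: scheme added for illustration; the post shows the URL without one.
BASE_URL = 'http://www.example.com/img'


def build_request(index, base_url=BASE_URL):
    """Return (filename, Request) for the image with the given counter."""
    fname = str(index).zfill(3) + '.png'
    req = urllib.request.Request(base_url + fname,
                                 headers={'User-Agent': USER_AGENT})
    return fname, req


def download_all(opener=urllib.request.urlopen, base_url=BASE_URL, out_dir='.'):
    """Fetch 000.png, 001.png, ... until the first failure; return the count."""
    count = 0
    while True:
        fname, req = build_request(count, base_url)
        try:
            with opener(req) as response:
                data = response.read()
        except urllib.error.URLError:
            # HTTPError (403, 404, ...) is a subclass of URLError.
            break
        # Write the file only once the download has succeeded.
        with open(os.path.join(out_dir, fname), 'wb') as f:
            f.write(data)
        count += 1
    return count
```

Calling `download_all()` with the defaults would hit the network; the `opener` parameter is there only so a stand-in can be passed while trying the loop out offline.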
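What is the error message? Also, Python doesn't need semicolons to end a line. – Dikei 2012-03-28 02:37:03
I have added the error message. I know I don't need the semicolons, but I prefer to add them. The URL and the file exist. The only problem is that I create the req object with an invalid URL and then correct the URL before using req. That seems to be what causes the error. – s5s 2012-03-28 02:41:08
Yes. The URL is valid. That's why it's causing the problem. I can also visit the URL, wget it, and download it with Python if I don't use the loop, that is, if I set the URL in the req object to the correct one when I create it. – s5s 2012-03-28 02:44:11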