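urllib2.Request: check whether a URL is reachable

I have the following code to verify that some URLs are valid. I only need to know whether the response is 200, so I wrote a script that works fine, but it is too slow: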
import urllib2
import string

def my_range(start, end, step):
    while start <= end:
        yield start
        start += step

url = 'http://exemple.com/test/'
y = 1
for x in my_range(1, 5, 1):
    y =y+1
    url+=str(y)
    print url
    req = urllib2.Request(url)
    try:
        resp = urllib2.urlopen(req)
    except urllib2.URLError, e:
        if e.code == 404:
            print "404"
        else:
            print "not 404"
    else:
        print "200"
    url = 'http://exemple.com/test/'
    body = resp.read()
In this example, I assume the following directories exist on my local host, which gives this result:
http://exemple.com/test/2
200
http://exemple.com/test/3
200
http://exemple.com/test/4
404
http://exemple.com/test/5
404
http://exemple.com/test/6
404
So I searched for a faster way to do this, and I found this code:
import urllib2
request = urllib2.Request('http://www.google.com/')
response = urllib2.urlopen(request)
if response.getcode() == 200:
    print "200"
It seems faster, but when I test it with a URL that returns 404 (such as http://www.google.com/111), it gives me this result:
Traceback (most recent call last):
  File "C:\Python27\res.py", line 3, in <module>
    response = urllib2.urlopen(request)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
Any ideas, guys? Thanks a lot for your help :)
Why not use a try/except statement? That should solve the problem. See also: http://stackoverflow.com/questions/1947133/urllib2-urlopen-vs-urllib-urlopen-urllib2-throws-404-while-urllib-works-w – oliver13
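I only started learning Python 5 hours ago, lol. I have just a little experience in other languages, so some explanation would help. Thanks a lot :) – Ez0r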
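To illustrate the try/except suggestion, here is a minimal sketch of how the second snippet could be wrapped so that a 404 is reported instead of ending in a traceback. urlopen raises HTTPError for error responses such as 404, and the status is available as the exception's code attribute. The names get_status, opener and timeout are hypothetical and chosen only for this example; the fallback import is an assumption added so the same sketch also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
# Python 2 ships urllib2; Python 3 moved the same names elsewhere.
try:
    from urllib2 import Request, urlopen, HTTPError, URLError
except ImportError:
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError


def get_status(url, opener=urlopen, timeout=5):
    """Return the HTTP status code for url, or None if no response came back.

    get_status, opener and timeout are hypothetical names for this sketch.
    """
    try:
        response = opener(Request(url), timeout=timeout)
    except HTTPError as e:
        # Error responses (404, 500, ...) are raised as HTTPError;
        # the numeric status is stored in e.code.
        return e.code
    except URLError:
        # No HTTP response at all (DNS failure, connection refused, ...).
        # A plain URLError has no reliable .code attribute.
        return None
    else:
        status = response.getcode()
        response.close()
        return status
```

HTTPError is a subclass of URLError, so it has to be caught first. With this helper, the loop from the question could call get_status(url) for each URL and compare the result with 200, without reading the response body.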