If bit.ly returns HTTP status 404 for links that are not shortened:
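```python
#!/usr/bin/env python
try:  # Python 3
    from http.client import HTTPConnection
    from urllib.parse import urlsplit
except ImportError:  # Python 2
    from httplib import HTTPConnection
    from urlparse import urlsplit

urls = ["http://bit.ly/NKEIV8", "http://bit.ly/1niCdh9"]
for url in urls:
    host, path = urlsplit(url)[1:3]  # (netloc, path)
    conn = HTTPConnection(host)
    conn.request("HEAD", path)  # HEAD: status and headers only, no page body
    r = conn.getresponse()
    conn.close()
    if r.status != 404:
        print("{r.status} {url}".format(**vars()))
```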
Unrelated: to speed up the checks, you could use multiple threads:
```python
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # use threads
try:  # Python 3
    from http.client import HTTPConnection
    from urllib.parse import urlsplit
except ImportError:  # Python 2
    from httplib import HTTPConnection
    from urlparse import urlsplit

urls = ["http://bit.ly/NKEIV8", "http://bit.ly/1niCdh9"]

def getstatus(url):
    try:
        host, path = urlsplit(url)[1:3]
        conn = HTTPConnection(host)
        conn.request("HEAD", path)
        r = conn.getresponse()
        conn.close()
    except Exception as e:
        return url, None, str(e)  # error
    else:
        return url, r.status, None

p = Pool(20)  # use 20 concurrent connections
for url, status, error in p.imap_unordered(getstatus, urls):
    if status != 404:
        print("{status} {url} {error}".format(**vars()))
```
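On Python 3 only, the same thread-pool approach can also be written with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch, not the answer's original code: the helper names `head_status` and `non_404` are hypothetical, the 10-second timeout is an assumed value, and the checking function is passed in as a parameter so it can be swapped out (for example, in tests).

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPConnection
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Hypothetical helper: send a HEAD request, return (url, status, error)."""
    try:
        parts = urlsplit(url)
        conn = HTTPConnection(parts.netloc, timeout=timeout)
        try:
            conn.request("HEAD", parts.path or "/")  # no body is downloaded
            status = conn.getresponse().status
        finally:
            conn.close()
    except Exception as e:  # DNS failures, timeouts, refused connections, ...
        return url, None, str(e)
    return url, status, None


def non_404(urls, check=head_status, max_workers=20):
    """Run `check` on all urls concurrently; keep results whose status != 404."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order (unlike imap_unordered)
        return [res for res in pool.map(check, urls) if res[1] != 404]
```

Usage: `for url, status, error in non_404(urls): print(status, url, error)`. As in the `Pool` version, failed requests come back with `status` set to `None` and are therefore reported rather than silently dropped.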
2014-03-02 22:40:44
jfs
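Do you want to get the content of the page? –
I'd like to be able to check the links without loading the page content, but if that's the only way, that's fine. – Scherf
Check res.status (e.g., 301 is a redirect). – jfs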
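A minimal sketch of that suggestion, assuming Python 3: a HEAD request already avoids downloading the page body, `http.client` does not follow redirects on its own, and for a redirect the `Location` response header holds the target. The function names `describe` and `resolve` and the 10-second timeout are hypothetical; the set of redirect codes (301, 302, 303, 307, 308) follows the standard HTTP status code definitions.

```python
from http.client import HTTPConnection
from urllib.parse import urlsplit

# Standard HTTP redirect status codes
REDIRECT_CODES = {301, 302, 303, 307, 308}


def describe(status, location=None):
    """Hypothetical helper: turn a status (and Location header) into a label."""
    if status in REDIRECT_CODES:
        return "redirect -> {}".format(location)
    if status == 404:
        return "not found"
    return "status {}".format(status)


def resolve(url, timeout=10):
    """HEAD the URL without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn = HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        r = conn.getresponse()
        return r.status, r.getheader("Location")  # Location is None if absent
    finally:
        conn.close()
```

Usage: `print(describe(*resolve("http://bit.ly/NKEIV8")))`. Splitting the network call from the classification keeps `describe` easy to test without network access.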