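Skip an element in a list when it returns an error
To improve my skills I wrote this script. It takes a list of websites, collects them in links_dict, then visits each site and crawls it to find the "contact-us" page. The problem is that the script stops as soon as one of the websites is not working. What I am trying to do is skip that website and continue with the others. Here is my code: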
import requests
from bs4 import BeautifulSoup
from urlparse import urlparse
from mechanize import Browser
import re

headers = [('User-Agent','Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0')]
urls = 'http://www.officialusa.com/stateguides/chambers/georgia.html'
links_dict = []
response = requests.get(urls, headers)
bsObj = BeautifulSoup(response.text,'lxml')
for tag in bsObj.find_all('li'):
    links_dict.append(tag.a.get('href'))
for ink in links_dict:
    r = requests.get(ink)
    #get domain name only
    parsed_uri = urlparse(ink)
    domain = parsed_uri.netloc
    br = Browser()
    br.set_handle_robots(False)
    br.addheaders = headers
    try:
        br.open(str(ink))
        for link in br.links():
            siteMatch = re.compile(ink).search(link.url)
            print link.url
    except:
        pass
Everything works fine with the other links.
This is the error:
Traceback (most recent call last):
  File "/home/qunix/PycharmProjects/challange/crawel.py", line 20, in <module>
    r = requests.get(ink)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.quitmangeorgia.org', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7facf68cca50>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
Thank you!
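The traceback shows the exception coming from the bare `r = requests.get(ink)` call, which sits outside the `try` block, so one unreachable host ends the whole loop. A minimal sketch of one way to skip such a site and move on is below. The names `fetch_reachable` and `fetch` are hypothetical and not part of the original script; `fetch` defaults to `requests.get` and is only a parameter so the loop can be tried without a network.

```python
import requests


def fetch_reachable(urls, fetch=requests.get):
    """Fetch each URL and skip the ones that cannot be reached.

    Hypothetical helper for illustration only.
    """
    pages = {}    # url -> response for sites that answered
    skipped = []  # (url, error) pairs for sites that failed to connect
    for url in urls:
        try:
            pages[url] = fetch(url)
        except requests.exceptions.ConnectionError as exc:
            # DNS failures such as "Temporary failure in name resolution"
            # end up here; record the site and continue with the next one.
            skipped.append((url, exc))
            continue
    return pages, skipped
```

In the script above this would probably be called as `pages, skipped = fetch_reachable(links_dict)`. Catching only `ConnectionError` is a deliberate choice here; catching the base class `requests.exceptions.RequestException` would also skip sites that fail for other request-related reasons.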
Thanks, it finally works. Just one small edit: I used requests.ConnectionError: –
@AmineAmhoume No problem at all. I didn't have an editor open to check whether you had to fully qualify the error or not, but I'm glad you solved it – Dillanm
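On the point about qualifying the error name: the requests package re-exports its exception classes at the top level, so `requests.ConnectionError` and `requests.exceptions.ConnectionError` name the same class. An unqualified `ConnectionError` is a different matter: Python 2.7 has no built-in of that name, and on Python 3 the built-in `ConnectionError` is a separate class that does not catch the one raised by requests. A small sketch, with the error raised by hand so that no network is needed:

```python
import requests

# Both spellings refer to the same class object.
same_class = requests.ConnectionError is requests.exceptions.ConnectionError

try:
    # Simulated failure, standing in for an unreachable host.
    raise requests.exceptions.ConnectionError("simulated DNS failure")
except requests.ConnectionError as exc:
    caught = str(exc)

print("%s %s" % (same_class, caught))
```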