2017-05-09

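Skipping an element in a list when it returns an error

To improve my skills, I wrote this script. It takes a list of websites, collects their links in links_dict, and crawls each site to find its "contact-us" page. The problem is that my script stops when one of the websites isn't working, so what I'm trying to do is skip that site and continue with the others. Here is my code: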

import requests
from bs4 import BeautifulSoup
from urlparse import urlparse
from mechanize import Browser
import re

headers = [('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0')]

urls = 'http://www.officialusa.com/stateguides/chambers/georgia.html'
links_dict = []

# headers must be passed by keyword: the second positional argument of requests.get is params
response = requests.get(urls, headers=dict(headers))
bsObj = BeautifulSoup(response.text, 'lxml')
for tag in bsObj.find_all('li'):
    links_dict.append(tag.a.get('href'))

for ink in links_dict:
    # this call is outside the try block, so an unreachable site raises here and stops the loop
    r = requests.get(ink)
    # get the domain name only
    parsed_uri = urlparse(ink)
    domain = parsed_uri.netloc
    br = Browser()
    br.set_handle_robots(False)
    br.addheaders = headers
    try:
        br.open(str(ink))
        for link in br.links():
            siteMatch = re.compile(ink).search(link.url)
            print link.url
    except:
        pass

Everything works fine with the other links.
This is the error:

Traceback (most recent call last):
  File "/home/qunix/PycharmProjects/challange/crawel.py", line 20, in <module>
    r = requests.get(ink)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.quitmangeorgia.org', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7facf68cca50>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

Thank you!

Answer


Try wrapping the line

r = requests.get(ink)

in a try/except, like so:

try: 
    r = requests.get(ink) 
except ConnectionError: 
    continue 

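This means that if the call to requests.get raises a ConnectionError, as it does in your example, the loop will skip that site and move on to the next one in the list.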
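The same skip-and-continue pattern can be pulled out into a small helper. This is a minimal sketch, not the answerer's code: fetch_reachable and fake_fetch are hypothetical names, and the fetch function and exception types are passed in so the demo runs offline (the fake fetch raises OSError for the host from the traceback above, standing in for an unreachable site).

```python
def fetch_reachable(urls, fetch, errors):
    """Fetch every URL in `urls`, skipping any whose fetch raises `errors`.

    `fetch` is any callable taking a URL (for example requests.get) and
    `errors` is an exception class or a tuple of classes to tolerate.
    Returns (results, skipped): a dict mapping url -> response, and a list
    of (url, exception) pairs for the sites that were skipped.
    """
    results = {}
    skipped = []
    for url in urls:
        try:
            results[url] = fetch(url)
        except errors as exc:
            # Record the failure; the loop then moves on to the next site
            # instead of letting the exception stop the whole script.
            skipped.append((url, exc))
    return results, skipped


# Offline demo: a fake fetch function standing in for requests.get.
# The second URL fails the way an unresolvable host would.
def fake_fetch(url):
    if "quitmangeorgia" in url:
        raise OSError("Temporary failure in name resolution")
    return "<html>page for %s</html>" % url


pages, failed = fetch_reachable(
    ["http://site-a.example", "http://www.quitmangeorgia.org", "http://site-b.example"],
    fake_fetch,
    OSError,
)
print(sorted(pages))
print([url for url, _ in failed])
```

With the question's code this would presumably be called as fetch_reachable(links_dict, requests.get, requests.exceptions.ConnectionError). Because except accepts a tuple, passing (requests.exceptions.ConnectionError, requests.exceptions.Timeout) would also skip sites that time out, and requests.exceptions.RequestException would skip any error raised by requests. Exceptions not listed still propagate, so unrelated bugs are not silently hidden the way a bare except: would hide them.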


Thanks, it finally works. Just one small edit: I used requests.ConnectionError instead. –


@AmineAmhoume No problem at all. I didn't have an editor open to check whether you had to fully qualify the error or not, but I'm glad you solved it. – Dillanm
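Whether the name has to be fully qualified comes down to class identity. Assuming requests' hierarchy as defined in its exceptions module (RequestException subclasses IOError, which is OSError in Python 3, and ConnectionError subclasses RequestException), requests' ConnectionError is a different class from Python 3's built-in ConnectionError. The sketch below uses hypothetical stand-in classes that mirror that hierarchy so it runs without requests installed; it is meant for Python 3.

```python
# Hypothetical stand-ins mirroring requests' exception hierarchy.
# They are NOT the real requests classes.
class RequestException(OSError):
    pass


class RequestsConnectionError(RequestException):
    pass


def caught_by(handler_type):
    """Return True if `except handler_type` catches a RequestsConnectionError."""
    try:
        raise RequestsConnectionError("Failed to establish a new connection")
    except handler_type:
        return True
    except Exception:
        return False


print(caught_by(ConnectionError))          # built-in ConnectionError -> False
print(caught_by(RequestsConnectionError))  # the fully qualified class -> True
print(caught_by(RequestException))         # its base class -> True
```

So under Python 3 a bare except ConnectionError: would let the requests error through. Under Python 2.7, the version in the traceback, there is no built-in ConnectionError at all, so the except clause itself fails with a NameError unless the name was imported (for example with from requests.exceptions import ConnectionError). Writing requests.ConnectionError, as in the comment above, avoids both problems.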
