2012-10-19
0

Getting HTML content in Python 3

I am using this code to fetch a website's HTML content:

import urllib.request 
import lxml.html as lh 
req= urllib.request.Request("http://www.ip-adress.com/ip_tracer/157.123.22.11", 
headers={'User-Agent' : "Magic Browser"}) 
html = urllib.request.urlopen(req).read() 
doc = lh.fromstring(html) 
print (''.join(doc.xpath('.//*[@class="odd"]')[-1].text_content().split())) 

I want to extract the organization field (Organization: Zenith Data Systems), but the script fails with the following errors:

Traceback (most recent call last): 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1135, in do_open 
h.request(req.get_method(), req.selector, req.data, headers) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 967, in request 
self._send_request(method, url, body, headers) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 1005, in _send_request 
self.endheaders(body) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 963, in endheaders 
self._send_output(message_body) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 808, in _send_output 
self.send(msg) 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 746, in send 
self.connect() 
File "/usr/local/python3.2.3/lib/python3.2/http/client.py", line 724, in connect 
self.timeout, self.source_address) 
File "/usr/local/python3.2.3/lib/python3.2/socket.py", line 404, in create_connection 
raise err 
File "/usr/local/python3.2.3/lib/python3.2/socket.py", line 395, in create_connection 
sock.connect(sa) 
socket.error: [Errno 111] Connection refused 

During handling of the above exception, another exception occurred:

Traceback (most recent call last): 
File "ext.py", line 4, in <module> 
html = urllib.request.urlopen(req).read() 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 138, in urlopen 
return opener.open(url, data, timeout) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 369, in open 
response = self._open(req, data) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 387, in _open 
'_open', req) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 347, in _call_chain 
result = func(*args) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1155, in http_open 
return self.do_open(http.client.HTTPConnection, req) 
File "/usr/local/python3.2.3/lib/python3.2/urllib/request.py", line 1138, in do_open 
raise URLError(err) 
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>

How can I fix this? Thanks.

+1

'KeyboardInterrupt' means you pressed 'CTRL-C' and stopped the process. – Blender

+0

@Blender: Thanks, I have updated the error in the question. – AntiGMO

+1

Can you access the site from a browser, or through a proxy? Do you have a firewall? Your IP might be banned. – jfs

Answers

0

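Basically, "Connection refused" (Errno 111) means the TCP connection itself was rejected before any HTTP request was sent. Typical causes are that nothing is listening on that host and port, that the server is down (for example, for maintenance), or that a firewall or proxy between you and the server is rejecting the connection. It is not an HTTP-level response, so on its own it does not tell you that the page is restricted to registered users; that case would normally show up as an HTTP status such as 401 or 403.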
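To see whether the refusal happens at the network level, independent of urllib and lxml, one option is to attempt a bare TCP connection. The following is only a minimal sketch: the helper name `probe_tcp` and its return format are made up for illustration, and `ConnectionRefusedError` needs Python 3.3 or later (on 3.2 you would catch `socket.error` and compare `errno` against `errno.ECONNREFUSED` instead).

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Hypothetical helper: try to open a TCP connection.

    Returns a (success, message) tuple instead of raising.
    """
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # The remote end (or something in between) actively rejected us.
        return False, "connection refused"
    except socket.timeout:
        # No answer within `timeout` seconds.
        return False, "timed out"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return False, "failed: {}".format(exc)
```

Calling `probe_tcp("www.ip-adress.com", 80)` from the affected machine and again from another network (or through a proxy) would help tell a local firewall or IP ban apart from a server that is simply down.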

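Based on your code above, if you want to handle the error rather than crash, you can wrap the request in try and except, like the code below: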

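import urllib.error
import urllib.request
import lxml.html as lh

try:
    req = urllib.request.Request("http://www.ip-adress.com/ip_tracer/157.123.22.11",
                                 headers={'User-Agent': "Magic Browser"})
    html = urllib.request.urlopen(req).read()
    doc = lh.fromstring(html)
    # Print the text of the last element with class "odd", whitespace removed
    print(''.join(doc.xpath('.//*[@class="odd"]')[-1].text_content().split()))
except urllib.error.URLError as e:
    # e.reason holds the underlying cause, e.g. "[Errno 111] Connection refused"
    print(e.reason)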
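
The try/except above treats every failure the same way. As a rough sketch (the function name `fetch_html`, its return format, and the 10-second default timeout are assumptions for illustration, not part of the original answer), the fetch could be split out so that an HTTP error status is reported separately from a connection that never got established:

```python
import urllib.error
import urllib.request


def fetch_html(url, user_agent="Magic Browser", timeout=10.0):
    """Hypothetical helper: return (body, None) on success, (None, message) on failure."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read(), None
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        # The server did answer, but with an error status such as 403 or 404.
        return None, "HTTP {}: {}".format(e.code, e.reason)
    except urllib.error.URLError as e:
        # No HTTP response at all: refused connection, DNS failure, and so on.
        return None, "connection problem: {}".format(e.reason)
    except OSError as e:
        # For example, a socket timeout raised while reading the body.
        return None, "I/O problem: {}".format(e)
```

With this shape, a "connection problem" message points at the network (firewall, proxy, banned IP, server down), while an "HTTP ..." message means the server was reached and the request itself was rejected. The body can then be passed to `lh.fromstring` as in the original code.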