2016-07-30

Python 3.5 cannot open URL: HTTP Error 403

I want to open and parse the URL below in Python 3.5 to collect some comments for my task. Here is my code:

from urllib.request import Request, urlopen
req = Request("http://www.webmd.com/drugs/drugreview-35-Zoloft+oral.aspx?drugid=35&drugname=Zoloft+oral&conditionFilter=-500")
home_page = urlopen(req).read()
print(home_page)

This is the error:

Traceback (most recent call last):
  File "/Users/maryamzolnoori/Dropbox/Dissertation/Programming/Web-Crawl/Askapatient_collect_comments.py", line 12, in <module>
    home_page = urlopen(req).read()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I also tested it in Python 2.7, and it failed there too. The error was:

urllib2.HTTPError: HTTP Error 416: Requested Range Not Satisfiable 

Answer


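You are getting a 403 Forbidden, most likely because the request's User-Agent header identifies the client as Python. Try setting the User-Agent as if you were a browser.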
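To see what the server sees by default, you can inspect the headers that urllib attaches on its own. This is only a small sketch: `build_opener()` creates an opener of the same kind `urlopen()` uses, and the variable names are mine.

```python
from urllib.request import build_opener

# build_opener() returns an OpenerDirector configured like the default one.
opener = build_opener()

# addheaders is a list of (name, value) tuples sent with every request.
default_headers = dict(opener.addheaders)

# Typically something like "Python-urllib/3.5", depending on your version.
print(default_headers["User-agent"])
```

Some sites appear to reject requests carrying that value, which would explain the 403 here, although I cannot confirm what rule this particular server applies.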

For example:

from urllib.request import Request, urlopen

url = "http://www.webmd.com/drugs/drugreview-35-Zoloft+oral.aspx?drugid=35&drugname=Zoloft+oral&conditionFilter=-500"
req = Request(
    url,
    data=None,
    # Send a browser-like User-Agent instead of the default Python one
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)

home_page = urlopen(req)
# read() returns bytes, so decode them into text
print(home_page.read().decode('utf-8'))

It is also a good idea to decode the response with the appropriate encoding.
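If you would rather not hard-code `'utf-8'`, one option is to read the charset from the response's Content-Type header and fall back to UTF-8 when none is declared. The following is a rough sketch under my own assumptions: `decode_body`, `fetch_text`, and `BROWSER_UA` are hypothetical names, not part of urllib, and the User-Agent string is just a placeholder.

```python
from email.message import Message
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: an illustrative browser-like string; substitute your own.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def decode_body(body, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in Content-Type, if any."""
    charset = headers.get_content_charset() or fallback
    # errors="replace" substitutes undecodable bytes instead of raising.
    return body.decode(charset, errors="replace")


def fetch_text(url):
    """Hypothetical helper: fetch a page with a browser-like User-Agent."""
    req = Request(url, headers={"User-Agent": BROWSER_UA})
    try:
        with urlopen(req) as response:
            return decode_body(response.read(), response.headers)
    except HTTPError as err:
        # Report the status (for example 403) before re-raising.
        print("Server refused the request:", err.code, err.reason)
        raise


# Offline demonstration with a hand-built header object (no network needed).
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
demo_text = decode_body(b"caf\xe9", demo_headers)
```

You would then call `fetch_text(url)` with the URL from the question. Whether the server still answers 403 depends on its own rules, so treat this as a starting point.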