2017-06-22

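Below is the code I use to get Nike clothing data from the website with Python and Beautiful Soup, but I keep getting this error: urllib.error.HTTPError: HTTP Error 403: Forbidden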

import urllib.request 

#Base url for website 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 

# A lot of sites don't like the user agents of Python 3, so I specify one here 
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'}) 
html = urllib.request.urlopen(req).read() 

The error then looks like this:
urllib.error.HTTPError: HTTP Error 403: Forbidden
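urllib raises urllib.error.HTTPError for any 4xx/5xx status, and the exception itself carries the status code and the headers the server sent back, so catching it shows more than the one-line traceback. A minimal sketch; the fetch helper name and the 10-second timeout are illustrative assumptions, not part of the original code:

```python
import urllib.error
import urllib.request


def fetch(url, headers=None, timeout=10):
    """Return the response body as bytes; on an HTTP error, report it and re-raise."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # err.code is the status (e.g. 403) and err.reason its text ("Forbidden");
        # err.headers holds the headers the server returned with the error.
        print(f"HTTP {err.code} {err.reason} for {url}")
        print("Request headers sent:", req.header_items())
        print("Response headers:")
        print(err.headers)
        raise
```

Usage: html = fetch(url, headers={'User-Agent': 'Mozilla/5.0'}). If the site still answers 403, the printed request headers show exactly what the server received.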

How can I open and parse this HTML page?


'html = urllib.request.urlopen(url).read()' works fine.

Answers


Or try Selenium WebDriver:

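from selenium import webdriver
from bs4 import BeautifulSoup as bs

# Starts a real Firefox instance (Firefox must be installed)
browser = webdriver.Firefox()
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120'
browser.get(url)
# page_source returns the current page's HTML as the browser sees it
source = browser.page_source
browser.quit()  # close the browser once the HTML has been captured
soup = bs(source, "html.parser")
print(soup)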

This works for me, though I'm just a beginner :)


Try this:

import urllib.request

# Note: FancyURLopener has been deprecated since Python 3.3
class AppURLopener(urllib.request.FancyURLopener):
    # version is the User-Agent string this opener sends
    version = "Mozilla/5.0"

opener = AppURLopener()
response = opener.open('http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120')
print(response.read())

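AppURLopener inherits from urllib.request.FancyURLopener and overrides its version attribute, which is the User-Agent string the opener sends, so the request looks like it comes from a browser and gets past the 403: Forbidden error.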
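Because FancyURLopener is deprecated, here is a sketch of the same idea with urllib.request.build_opener, which is not deprecated. Setting the opener's addheaders makes every HTTP request it sends carry the chosen User-Agent; the "Mozilla/5.0" value is simply the one used in this thread, and the get helper is a hypothetical name for illustration:

```python
import urllib.request

# build_opener() returns an OpenerDirector with the default handlers;
# for HTTP(S) requests, the headers in addheaders are added to every request.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0")]


def get(url, timeout=10):
    """Fetch url with the custom opener and return the body as bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Usage: html = get('http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120'). If other code calls urllib.request.urlopen directly, urllib.request.install_opener(opener) makes it use this opener as well.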

Hope this helps!


Alternatively, you can try requests:

>>> import requests 
>>> page = requests.get('http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120').content 

Thanks, Bill!


The problem is the User-Agent. The site blocks the User-Agent you specified, but the request works fine if you don't specify any User-Agent in the headers:

import urllib.request 

#Base url for website 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 

# No headers are passed, so urllib sends its default User-Agent
req = urllib.request.Request(url) 
html = urllib.request.urlopen(req).read() 
print(html) 
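To see what "no User-Agent" actually means here: urllib then sends its own default, which an opener exposes in its addheaders attribute. A small sketch for inspecting it; the printed version suffix depends on your Python version. It also shows a urllib quirk: header names passed to Request are stored with str.capitalize() applied, so they must be looked up as 'User-agent':

```python
import urllib.request

# urlopen() builds its default opener the same way, so this shows
# the User-Agent sent when no header is specified.
default_opener = urllib.request.build_opener()
print(default_opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.11')]

# Request stores header names via str.capitalize(): 'User-Agent' -> 'User-agent'.
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120'
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent"))  # Mozilla/5.0
print(req.get_header("User-Agent"))  # None
```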

But if you want to add headers anyway, I would suggest using requests. First install the package with pip: pip install requests

import requests 

#Base url for website 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 

# A lot of sites don't like the user agents of Python 3, so I specify one here
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
print(response.text)

For more details, see the requests documentation.
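When fetching several pages, a requests.Session can hold the headers once and reuse the connection. A sketch, assuming requests is installed; get_html is a hypothetical helper name. raise_for_status() turns a 403 into a requests.HTTPError instead of silently handing back the error page's text:

```python
import requests

URL = "http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120"

# Headers set on the session are merged into every request it sends.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})


def get_html(url, timeout=10):
    """Return the page text, raising requests.HTTPError for 4xx/5xx responses."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Usage: html = get_html(URL). Session headers are case-insensitive, so session.headers["user-agent"] reads the same value.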


Thanks for your help!


@AbhikNag You're welcome. By the way, if the answer helped, why not accept it?