Using Python Beautiful Soup on a website, I keep getting this error: urllib.error.HTTPError: HTTP Error 403: Forbidden

Below is the code I am using to fetch Nike clothing data.

import urllib.request 

#Base url for website 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 

# A lot of sites don't like the user agents of Python 3, so I specify one here 
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'}) 
html = urllib.request.urlopen(req).read()

The error then looks like this:
urllib.error.HTTPError: HTTP Error 403: Forbidden

How can I open and parse this HTML page?

2017-06-22 Abhik Nag

'html = urllib.request.urlopen(url).read()' works fine.

Or try Selenium WebDriver.

from selenium import webdriver 
from bs4 import BeautifulSoup as bs 

browser = webdriver.Firefox() 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 
browser.get(url) 
source = browser.page_source 
soup = bs(source, "html.parser") 
print(soup)

This works for me, though I am only a beginner :)

2017-06-22 17:08:51 patrick

Try this:

import urllib.request 

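# Note: FancyURLopener has been deprecated since Python 3.3.
class AppURLopener(urllib.request.FancyURLopener): 
    # `version` is the string sent as the User-Agent header
    version = "Mozilla/5.0" 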

opener = AppURLopener() 
response = opener.open('http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120') 
print(response.read())

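AppURLopener inherits from urllib.request.FancyURLopener, and its version attribute sets the User-Agent header sent with each request. The request therefore looks like it comes from a browser, which can get around the 403: Forbidden error.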
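
Since FancyURLopener has been deprecated since Python 3.3, the same idea can be sketched with urllib.request.build_opener instead. This is only a minimal sketch: the helper names make_opener and download are made up for illustration, and whether the site accepts this User-Agent is an assumption, not something verified here.

```python
import urllib.request

URL = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120'


def make_opener(user_agent='Mozilla/5.0'):
    """Return an opener that sends `user_agent` with every request.

    Hypothetical helper, not part of the original answer.
    """
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs; replacing it overrides
    # urllib's default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


def download(url=URL, user_agent='Mozilla/5.0'):
    """Fetch `url` and return the raw body as bytes (network access)."""
    with make_opener(user_agent).open(url) as response:
        return response.read()
```

Calling download() would then play the role of opener.open(...).read() in the answer above.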

Hope this helps!

2017-06-22 16:51:04 cosinepenguin

Alternatively, you can try requests.

>>> import requests 
>>> page = requests.get('http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120').content

2017-06-22 17:01:40

Thanks, Bill!

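The problem is with the User-Agent. The site blocks requests that send the specified User-Agent ('Mozilla/5.0'), but works fine when no User-Agent is set in the headers.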

import urllib.request 

#Base url for website 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 

# No custom User-Agent is set here, so urllib sends its default one 
req = urllib.request.Request(url) 
html = urllib.request.urlopen(req).read() 
print(html)
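
To check which User-Agent the site rejects, it can help to catch the HTTPError and look at the status code rather than letting the script crash. The following is a rough sketch only: fetch_status and its opener parameter are hypothetical names introduced for illustration, and how the site responds today is not something this example can promise.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent=None, opener=urllib.request.urlopen):
    """Return (status_code, body) instead of raising on an HTTP error.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    # Only set the header when a User-Agent was requested; otherwise
    # urllib sends its default "Python-urllib/x.y" value.
    headers = {'User-Agent': user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener(req) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # A 403 Forbidden response ends up here; err.code is the status.
        return err.code, b''
```

Calling fetch_status(url) and fetch_status(url, 'Mozilla/5.0') and comparing the two status codes would show whether the header is what triggers the 403.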

But if you want to add headers anyway, I would suggest using requests. First install the package with pip: pip install requests

import requests 

#Base url for website 
url = 'http://store.nike.com/us/en_us/pw/mens-clothing/1mdZ7pu?ipp=120' 

# A lot of sites don't like the user agents of Python 3, so I specify one here 
html = requests.get(url, headers = {'User-Agent': 'Mozilla/5.0'}) 
print(html.text)

For detailed documentation on requests, see this page.

2017-06-22 17:02:29

Thanks for your help!

@AbhikNag You're welcome. By the way, if the answer helped, why not accept it?
